Current Projects

CROSS is organized around incubator and research projects.

Software related to CROSS projects can be found in the CROSS Software Portal.


    Incubator Projects

  • SkyhookDM: Programmable Storage for Databases

    Fellow: Jeff LeFevre

    Abstract: The cloud business model requires flexible resource usage, but traditional relational databases strongly couple data to physical resources, making it difficult to add and remove database nodes. While Skyhook is not a database itself, it is an enabling technology that takes some of the metadata management and data processing tasks normally handled by the DBMS and delegates them to the storage system. This approach is immediately useful for letting small or single-node databases grow to much larger sizes, and the project team identified this as a point of interest within the Postgres community, which is currently limited to storing database table files on local disk. Its current options are to replace local disk with, for example, RAID arrays, or to migrate entirely to the cloud and rent Postgres instances. However, both of these approaches still require the single-node Postgres instance to do all of the actual DBMS work. By pushing some of these capabilities from the DBMS into the storage, Skyhook enables a single-node Postgres instance to scale (in part) with the amount of storage added. These storage capabilities are the new focus of Skyhook (see also skyhookdm.com).

  • Tracery2 and Chancery

    Fellow: Kate Compton

    Abstract: Tracery is a generative-text library and language implemented in JavaScript. Its goal is to enable casual users (novice coders, but also those who do not ‘code’) to write simple JSON files that encode grammar rules, which in turn produce complex, recursively expanded text. It was initially created as a class project at UCSC, then open-sourced. Tracery has been one of the biggest success stories in using open source software to support artists and poets. After the initial version was released in 2014, a British artist made a website, CheapBotsDoneQuick, to host bots written in the language. CheapBotsDoneQuick in turn created an artbot boom, with more than ten thousand bots currently hosted (see also tracery.io).

  • Black Swan: The Popper Reproducibility Platform

    Fellow: Ivo Jimenez

    Abstract: Reproducibility is the cornerstone of the scientific method. Yet, in computational and data science domains, a gap exists between current practices and the ideal of having every new scientific discovery be easily reproducible. Advances in computer science (CS) and software engineering slowly and painfully make their way into these domains, even in (paradoxically) CS research. Popper is an experimentation protocol and CLI tool for implementing scientific exploration pipelines following a DevOps approach. The goal of Popper is to bring the same methods and tools used for the agile delivery of software to scientists and industry researchers.
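
The Tracery abstract above describes JSON files whose grammar rules expand recursively into text. The following is a minimal Python sketch of that idea, not Tracery's own implementation (which is JavaScript): the grammar contents, the `expand` and `flatten` helper names, and the depth limit are all assumptions made for illustration. It follows the `#symbol#` tag notation used in Tracery grammars and ignores features such as modifiers.

```python
import random
import re

# Hypothetical grammar in the JSON-like shape the abstract describes:
# each symbol maps to a list of alternative rules, and #name# marks a
# symbol that is expanded recursively.
GRAMMAR = {
    "origin": ["#greeting#, #animal#!"],
    "greeting": ["Hello", "Good morning"],
    "animal": ["#adjective# cat", "#adjective# owl"],
    "adjective": ["sleepy", "curious"],
}

TAG = re.compile(r"#(\w+)#")


def expand(grammar, text, rng, depth=0, max_depth=20):
    """Replace every #symbol# tag in text with a randomly chosen rule."""
    if depth > max_depth:
        # Guard against grammars that refer to themselves without end.
        raise RecursionError("grammar expansion exceeded max_depth")

    def replace(match):
        rule = rng.choice(grammar[match.group(1)])
        return expand(grammar, rule, rng, depth + 1, max_depth)

    return TAG.sub(replace, text)


def flatten(grammar, start="#origin#", seed=None):
    """Expand the start symbol into plain text; seed makes it repeatable."""
    return expand(grammar, start, random.Random(seed))


if __name__ == "__main__":
    print(flatten(GRAMMAR, seed=1))
```

Passing a fixed `seed` makes a run repeatable, which is convenient when checking a grammar; omitting it yields a different sentence on each call.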


    Research Projects

  • LGraph: Open Source Multi-Language Synthesis and Simulation Infrastructure

    Fellow: Sheng Hong Wang (advisor: Jose Renau)

    Abstract: There is a resurgence in hardware accelerators due to power and performance constraints. At the same time, there is a resurgence in new Hardware Description Languages (HDLs). Many researchers see Verilog as the equivalent of assembly in hardware specification, and they are creating new HDLs to raise the level of abstraction. The goal of this proposal is to build a Multi-Language Synthesis and Simulation Infrastructure (MLSSI). MLSSI is the equivalent of a compiler infrastructure, but for synthesizable languages like CHISEL, synthesizable Verilog, and Pyrope.

  • CAvSAT: A System for Query Answering over Inconsistent Databases

    Fellow: Akhil Dixit (advisor: Phokion Kolaitis)

    Abstract: Managing inconsistencies in databases is an old, but recurring, problem. An inconsistent database is a database that violates one or more integrity constraints, such as key constraints or inclusion dependencies. Inconsistent databases arise in several different contexts, including information integration, where dealing with inconsistency is regarded as a key challenge. Consistent Query Answering (CQA) is a principled and scientific approach for answering queries over inconsistent databases. The CAvSAT (Consistent Answers via Satisfiability) project aims to build a scalable and comprehensive consistent query answering system over inconsistent databases.

  • Eusocial Storage Devices

    Fellow: Jianshen Liu (advisor: Carlos Maltzahn)

    Abstract: As storage devices get faster, data management tasks rob the host of CPU cycles and main memory bandwidth. Eusocial storage is a new media device API definition that drives data management activities into the device and sets a course towards in-storage computing functionality. It takes into account today’s storage scale requirements and builds on top of them. Although there are many benefits of offloading data management to the storage device (e.g., software layer reduction, data translation reduction, and higher abstraction levels), the one liability is that the extra processing required in the storage device increases the cost of the device. However, an increase in the cost of that system component does not mean the overall system cost increases, because offloading data management tasks should reduce costs in other areas. The first project undertaken with Eusocial Storage is to reproducibly quantify the benefits of offloading to the overall system.

  • Mapping datasets to object storage

    Fellow: Aaron Chu (advisor: Carlos Maltzahn)

    Abstract: Access libraries such as HDF5 allow users to interact with datasets using a high-level abstraction. But the implementations of access libraries are based on outdated assumptions about storage system interfaces and generally do not scale. In this research project we explore distributed dataset mapping infrastructures that can integrate and scale out important existing access libraries using programmable storage abstractions available in Ceph, while avoiding reimplementation or even modification of these access libraries as much as possible. Such a distributed dataset mapping infrastructure will allow operations of access libraries to be offloaded to storage system servers (or devices) and fully leverage the load balancing, elasticity, and failure management of distributed storage systems like Ceph.
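
The CAvSAT abstract above refers to answering queries over a database that violates a key constraint. A small worked example may help make "consistent answer" concrete. The sketch below uses the standard definition from the CQA literature: a repair keeps exactly one tuple per key value, and an answer is consistent if the query returns it on every repair. It enumerates repairs by brute force rather than using satisfiability solving as the CAvSAT name indicates, so it illustrates the semantics only. The table contents and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def repairs(rows, key):
    """Yield every repair: one tuple kept for each distinct key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    for choice in product(*groups.values()):
        yield list(choice)


def consistent_answers(rows, key, query):
    """Return the answers that the query produces on every repair."""
    answers = None
    for repair in repairs(rows, key):
        result = set(query(repair))
        answers = result if answers is None else answers & result
    return answers if answers is not None else set()


# Hypothetical (name, department) table where name should be a key.
EMPLOYEES = [
    ("alice", "storage"),
    ("bob", "databases"),
    ("bob", "compilers"),  # second tuple for "bob" violates the key
    ("carol", "databases"),
]


def names_in_databases(rows):
    """Query: names of employees in the databases department."""
    return {name for name, dept in rows if dept == "databases"}


if __name__ == "__main__":
    # "bob" is in databases in only one of the two repairs, so only
    # "carol" is a consistent answer.
    print(consistent_answers(EMPLOYEES, lambda r: r[0], names_in_databases))
    # prints {'carol'}
```

The number of repairs grows exponentially with the number of conflicting key groups, which is why enumeration like this does not scale and why the abstract stresses scalability.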



CROSS issues a call for proposals twice a year (usually in late Fall/early Winter and in late Spring/early Summer).