Current Projects

CROSS is organized around incubator and research projects.

Software related to CROSS projects can be found in the CROSS Software Portal.


    Incubator Projects

  • SkyhookDM: Programmable Storage for Databases

    Fellow: Jeff LeFevre

    Abstract: The cloud business model requires flexible resource usage, but traditional relational databases strongly couple data to physical resources, making it difficult to add and remove database nodes. While SkyhookDM is not a database itself, it is an enabling technology that takes some of the metadata management and data processing tasks normally handled by the DBMS and delegates them to the storage system. This approach is immediately useful for letting small or single-node databases grow to much larger sizes, and the project team identified this as a point of interest within the Postgres community, which is currently limited to storing database table files on local disk. Their current options are to replace local disk with, for example, RAID arrays, or to migrate entirely to the cloud, where they can rent Postgres instances. However, both of these approaches still require the single-node Postgres instance to do all of the actual DBMS work. By pushing some of these capabilities from the DBMS into the storage, SkyhookDM enables a single-node Postgres instance to scale (in part) with the amount of storage added. These storage capabilities are the new focus of SkyhookDM (see also skyhookdm.com).

  • Black Swan: The Popper Reproducibility Platform

    Fellow: Ivo Jimenez

    Abstract: Reproducibility is the cornerstone of the scientific method. Yet, in computational and data science domains, a gap exists between current practices and the ideal of having every new scientific discovery be easily reproducible. Advances in computer science (CS) and software engineering make their way into these domains slowly and painfully, even, paradoxically, into CS research itself. Popper is an experimentation protocol and CLI tool for implementing scientific exploration pipelines following a DevOps approach. The goal of Popper is to bring the same methods and tools used for the agile delivery of software to scientists and industry researchers.


    Research Projects

  • Live Hardware Development (LiveHD): A productive infrastructure for Synthesis and Simulation

    Fellow: Sheng Hong Wang (advisor: Jose Renau)

    Abstract: There is a resurgence in hardware accelerators due to power and performance constraints. At the same time, there is a resurgence in new Hardware Description Languages (HDLs). Many researchers see Verilog as the hardware-specification equivalent of assembly language, and they are creating new HDLs to raise the level of abstraction. The goal of this project is to build a Multi-Language Synthesis and Simulation Infrastructure (MLSSI). MLSSI is the equivalent of a compiler infrastructure, but for synthesizable languages like CHISEL, synthesizable Verilog, and Pyrope.

  • CAvSAT: A System for Query Answering over Inconsistent Databases

    Fellow: Akhil Dixit (advisor: Phokion Kolaitis)

    Abstract: Managing inconsistencies in databases is an old, but recurring, problem. An inconsistent database is a database that violates one or more integrity constraints, such as key constraints or inclusion dependencies. Inconsistent databases arise in several different contexts, including information integration, where dealing with inconsistency is regarded as a key challenge. Consistent Query Answering (CQA) is a principled and scientific approach for answering queries over inconsistent databases. The CAvSAT (Consistent Answers via Satisfiability) project aims to build a scalable and comprehensive consistent query answering system over inconsistent databases.
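
    Example: The idea behind consistent query answering can be shown with a toy case. The sketch below assumes the standard definition, in which an answer is consistent if the query returns it on every repair of the database, and it handles only a single key constraint. It enumerates repairs by brute force, which grows exponentially and is not how CAvSAT works (its name points to a reduction to satisfiability instead). The table, the column names, and the helper functions are all hypothetical.

```python
from itertools import product


def repairs(rows, key):
    """Yield every repair under a key constraint: exactly one row per key value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    # Choosing one row from each group of conflicting rows gives one repair.
    for choice in product(*groups.values()):
        yield list(choice)


def consistent_answers(rows, key, query):
    """Return the answers that `query` produces on every repair."""
    answer_sets = [set(query(repair)) for repair in repairs(rows, key)]
    return set.intersection(*answer_sets) if answer_sets else set()


# Hypothetical table: "name" should be a key, but Bob appears twice.
employees = [
    {"name": "Alice", "dept": "Storage"},
    {"name": "Bob", "dept": "Storage"},
    {"name": "Bob", "dept": "Databases"},
]


def storage_staff(rows):
    """Query: names of employees in the Storage department."""
    return [r["name"] for r in rows if r["dept"] == "Storage"]


result = consistent_answers(employees, "name", storage_staff)
print(result)  # {'Alice'}
```

    Bob is dropped from the result because one of the two repairs places him in a different department, so only Alice is an answer in every repair.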

  • Eusocial Storage Devices

    Fellow: Jianshen Liu (advisor: Carlos Maltzahn)

    Abstract: As storage devices get faster, data management tasks rob the host of CPU cycles and main memory bandwidth. Eusocial storage is a new media device API definition that drives data management activities into the device and sets a course towards in-storage computing functionality. It takes into account today’s storage scale requirements and builds on top of them. Although there are many benefits of offloading data management to the storage device (e.g., software layer reduction, data translation reduction, and higher abstraction levels), the one liability is that the extra processing required in the storage device increases the cost of the device. However, an increase in the cost of that system component does not mean that the overall system cost increases, because offloading data management tasks should reduce costs in other areas. The first project undertaken with Eusocial Storage is to reproducibly quantify the benefits of offloading to the overall system.

  • Mapping datasets to object storage

    Fellow: Aaron Chu (advisor: Carlos Maltzahn)

    Abstract: Access libraries such as HDF5 allow users to interact with datasets using a high-level abstraction. However, the implementations of access libraries are based on outdated assumptions about storage system interfaces and generally do not scale. In this research project we explore distributed dataset mapping infrastructures that can integrate and scale out important existing access libraries using the programmable storage abstractions available in Ceph, while avoiding reimplementation or even modification of these access libraries as much as possible. Such a distributed dataset mapping infrastructure will allow operations of access libraries to be offloaded to storage system servers (or devices) and to fully leverage the load balancing, elasticity, and failure management of distributed storage systems like Ceph.

  • OSAVC: Open Source Autonomous Vehicle Controller

    Fellow: Aaron Hunter (advisor: Gabriel Elkaim)

    Abstract: Autonomous vehicles are a rich field for research in nearly every conceivable environment: aerial, marine, terrestrial, and even extraterrestrial. With the availability of more powerful processors, onboard intelligence capabilities have advanced, opening up new possibilities for decision making and sensing in autonomous vehicles. The OSAVC project is an open source hardware and software project that provides the link between real-time control and intelligent decision making.

  • Managing Bufferbloat in Storage Systems

    Fellow: Esmaeil Mirvakili (advisor: Carlos Maltzahn)

    Abstract: Scalable storage servers consist of multiple components that communicate asynchronously via queues. There is usually a frontend that queues access requests from storage clients and uses one or more threads to forward queued requests to a backend. The backend queues these forwarded requests and batches them to make efficient use of the storage devices it manages. Storage servers can have multiple kinds of backends with different design assumptions about their underlying storage device technologies. Requests are scheduled in the frontend to ensure different levels of service for different classes of requests. For example, requests generated by data scrubbers working in the background generally have a lower priority than requests from an application. Once a request has been forwarded to the backend queue, however, it is beyond the reach of the frontend scheduler, so a long backend queue (bufferbloat) can delay high-priority requests behind low-priority ones. A common solution to this problem is to move request scheduling from the frontend to the backend, but for various reasons that is not always practical. The scope of this project is to keep the scheduler in the frontend and to explore backend designs that dynamically control the admission of requests in response to continually changing workloads and storage device technologies.
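
    Example: The frontend/backend arrangement described in this abstract can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not the project's design: the class names, the `limit` parameter, and the request labels are hypothetical, and the admission limit is fixed rather than adjusted dynamically. It shows how a smaller backend admission limit leaves more requests under the control of the frontend priority scheduler, so a late high-priority request completes sooner.

```python
import heapq
from collections import deque


class Frontend:
    """Priority scheduler: a lower number means a higher priority."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker that keeps FIFO order within a priority

    def submit(self, priority, request):
        heapq.heappush(self._heap, (priority, self._seq, request))
        self._seq += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


class Backend:
    """FIFO backend that admits at most `limit` queued requests."""

    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()

    def can_admit(self):
        return len(self.queue) < self.limit

    def complete_one(self):
        return self.queue.popleft()


def dispatch(frontend, backend):
    """Forward requests only while the backend has room for them."""
    while len(frontend) and backend.can_admit():
        backend.queue.append(frontend.pop())


def completion_order(limit):
    """Four background scrub requests arrive first, then one client request."""
    frontend, backend = Frontend(), Backend(limit)
    for i in range(4):
        frontend.submit(1, f"scrub-{i}")
    dispatch(frontend, backend)
    frontend.submit(0, "client")  # high priority, but it arrives late
    order = []
    while len(frontend) or backend.queue:
        dispatch(frontend, backend)
        order.append(backend.complete_one())
    return order


print(completion_order(limit=100))  # client completes last
print(completion_order(limit=2))    # client completes third
```

    With a limit of 100 all four scrub requests are already in the backend queue when the client request arrives, so it waits behind them. With a limit of 2 only two scrub requests have been admitted, and the frontend forwards the client request ahead of the remaining two.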


CROSS issues a call for proposals twice a year (usually in late Fall/early Winter and in late Spring/early Summer).