Completed Projects


  • Black Swan: The Popper Reproducibility Platform

    Fellow: Ivo Jimenez

    Duration: April 2019 - March 2021

    Abstract: Reproducibility is the cornerstone of the scientific method. Yet, in computational and data science domains, a gap exists between current practices and the ideal of having every new scientific discovery be easily reproducible. Advances in computer science (CS) and software engineering slowly and painfully make their way into these domains, even in (paradoxically) CS research. Popper is an experimentation protocol and CLI tool for implementing scientific exploration pipelines following a DevOps approach. The goal of Popper is to bring the same methods and tools used for the agile delivery of software to scientists and industry researchers.

  • Tracery/Chancery

    Fellow: Kate Compton

    Duration: April 2019 - September 2020

    Abstract: Tracery is a generative-text library and language implemented in JavaScript. Its goal was to enable casual users (novice coders, but also those who do not ‘code’) to write simple JSON files that encode grammar rules, which produce complex, recursively expanded text. It was initially created as a class project at UCSC, then open-sourced. Tracery has been one of the biggest success stories in using open source software to support artists and poets. After the initial version was released in 2014, a British artist made a website, CheapBotsDoneQuick, to host bots written in the language. CheapBotsDoneQuick in turn created an artbot boom, with more than ten thousand bots currently hosted (see also the report on completion of this project).

  • The NIMBLE Environment for Statistical Computing

    Fellow: Claudia Wehrhahn

    Duration: June 2017 - June 2018

    Many challenges in data science benefit from increasingly sophisticated statistical models. NIMBLE is growing in popularity but needs open-source software leadership that drives its adaptation to parallel and distributed infrastructures as well as in-storage computing environments. The incubator team released NIMBLE with its first Bayesian nonparametric (BNP) modeling tools, which included more efficient algorithms for the four-level hierarchical model.

Research

  • Live Hardware Development (LiveHD): A productive infrastructure for Synthesis and Simulation

    Graduated 2022

    Fellow: Sheng Hong Wang (advisor: Jose Renau)

    Abstract: There is a resurgence in hardware accelerators due to power and performance constraints. At the same time, there is a resurgence in new Hardware Description Languages (HDLs). Many researchers see Verilog as the hardware-specification equivalent of assembly language, and they are creating new HDLs to raise the level of abstraction. The goal of this project is to build a Multi-Language Synthesis and Simulation Infrastructure (MLSSI). MLSSI is the equivalent of a compiler infrastructure, but for synthesizable languages like CHISEL, synthesizable Verilog, and Pyrope.

  • CAvSAT: A System for Query Answering over Inconsistent Databases

    Graduated 2021

    Fellow: Akhil Dixit (advisor: Phokion Kolaitis)

    Abstract: Managing inconsistencies in databases is an old, but recurring, problem. An inconsistent database is a database that violates one or more integrity constraints, such as key constraints or inclusion dependencies. Inconsistent databases arise in several different contexts, including information integration, where dealing with inconsistency is regarded as a key challenge. Consistent Query Answering (CQA) is a principled and scientific approach for answering queries over inconsistent databases. The CAvSAT (Consistent Answers via Satisfiability) project aims to build a scalable and comprehensive consistent query answering system over inconsistent databases.

  • Mantle: A Programmable Metadata Load Balancer for the Ceph File System

    Completed Fall 2018

    Fellow: Michael Sevilla (PI: Carlos Maltzahn)

    Mantle is a programmable metadata balancer that separates the metadata migration policies from their mechanisms. The features and APIs are implemented on CephFS. The project team used Mantle to study how to manage and migrate file system metadata to improve performance. To achieve better load balancing, the project focused on the overheads of file system metadata protocols. The project lead chose not to continue the work as an incubator project and instead took a position in industry. Although no longer working on the project full-time, he continues to contribute to it and to support the existing community of contributors.

  • ZLog & CruzDB: Distributed Shared-log for Software-defined Storage

    Completed Fall 2018

    Fellow: Noah Watkins (PI: Carlos Maltzahn)

    CORFU is a fast shared log approach that leverages flash devices. In this project the research team implemented CORFU on Ceph and investigated the benefits of including this log abstraction in software-defined storage, including the management of shared logs across multiple storage tiers. The project fellow graduated in June 2018 and began working in industry, having chosen not to transition this project into an incubator. However, he continued to support contributions to ZLog, including acting as the head mentor for CROSS's 2018 GSoC student Javier Ron. The ZLog team prototyped the dynamic storage in the reporting period, with full integration into ZLog in October 2018. Although the project fellow is no longer working full-time on the project, he intends to continue to contribute to ZLog and CruzDB, and to support and promote community development around the project.

  • Strong Consistency in Dynamic Wireless Networks to Enable Safe and Efficient Navigation of Autonomous Vehicle

    Completed Summer 2018

    Fellow: Brendan Short (PI: Ricardo Sanfelice)

    Collaboration with the Hybrid Systems Laboratory

    The objectives of this project were to determine the consistency requirements of distributed systems with smart storage devices over realistic networks, to design algorithms that assure the needed consistency, and to develop open source software that implements these algorithms to provide consistent data for the safe operation of autonomous vehicles. It is believed that future services like unmanned-traffic management will depend upon strong consistency for safe operation. These services will often need to be provided over lossy wireless networks with limited bandwidth, where partitions may be frequent. In particular, the distributed systems will implement algorithms that require large amounts of (dynamically changing) data that is available to all systems via a shared log. The problem of navigation of autonomous systems served as the prototype application of this research.

  • Memory and Storage Coordinative Lifetime Enhancement with Near-Data Computing

    Completed Summer 2017

    Fellow: Xiao Liu (PI: Jishen Zhao)

    This project aimed to address memory system scalability by designing a DRAM/NVRAM hybrid memory system that offers scalable performance and resiliency in data center servers. The design used several gigabytes of DRAM, with NVRAM as the main data storage component. As such, scalability issues associated with DRAM (the performance and energy overhead of refresh and ECC) can be substantially mitigated. The project also explored hybrid memory management mechanisms that provide a unified memory space for persistent and non-persistent data structures.

  • An Efficient C Library for Unum 2.0

    Work ended Summer 2017

    Fellow: Andrew Klofas (PI: Nic Brummell, Carlos Maltzahn)

    The universal number (unum) is a new digital numerical system for computers. Computation with unums enables higher-accuracy math by increasing information density. To make widespread adoption possible, this project created an open source C library that includes a framework for optimizing its efficiency on different architectures. The project team implemented basic arithmetic operations (addition, subtraction, multiplication, reciprocation, division) and began adding a unum 2.0 matrix library.