Completed Projects
Black Swan: The Popper Reproducibility Platform
Fellow: Ivo Jimenez
Duration: April 2019 - March 2021
Abstract: Reproducibility is the cornerstone of the scientific method. Yet in computational and data science domains, a gap exists between current practices and the ideal of having every new scientific discovery be easily reproducible. Advances in computer science (CS) and software engineering make their way into these domains slowly and painfully, even (paradoxically) into CS research itself. Popper is an experimentation protocol and CLI tool for implementing scientific exploration pipelines following a DevOps approach. The goal of Popper is to bring the same methods and tools used for the agile delivery of software to scientists and industry researchers.
Tracery/Chancery
Fellow: Kate Compton
Duration: April 2019 - September 2020
Abstract: Tracery is a generative-text library and language implemented in JavaScript. Its goal was to enable casual users (novice coders, but also those who do not ‘code’) to write simple JSON files that encode grammar rules, which produce complex, recursively expanded text. It was initially created as a class project at UCSC and then open-sourced. Tracery has been one of the biggest success stories in using open source software to support artists and poets. After the initial version was released in 2014, a British artist made a website, CheapBotsDoneQuick, to host bots written in the language. CheapBotsDoneQuick in turn sparked an artbot boom, with more than ten thousand bots currently hosted (see also tracery.io). A report on the completion of this project is available.
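Tracery itself is written in JavaScript, but its core idea, recursively expanding `#symbol#` references using rules from a JSON grammar, can be sketched in a few lines of Python. This is a minimal illustration, not Tracery's implementation or API: the `expand` function and the example grammar are hypothetical, and Tracery features such as modifiers and actions are omitted.

```python
import json
import random
import re

# Matches a Tracery-style symbol reference such as "#animal#".
SYMBOL = re.compile(r"#(\w+)#")

def expand(grammar, text, rng=random):
    """Replace every #symbol# in text with a randomly chosen rule for that
    symbol, expanding recursively until no symbols remain (hypothetical helper)."""
    def replace(match):
        options = grammar[match.group(1)]    # alternative expansions for this symbol
        choice = rng.choice(options)         # pick one alternative at random
        return expand(grammar, choice, rng)  # the choice may contain more symbols
    return SYMBOL.sub(replace, text)

# A hypothetical grammar in the JSON shape casual users would write.
grammar_json = """
{
  "origin": ["The #animal# #action#."],
  "animal": ["fox", "#color# owl"],
  "color":  ["silver", "midnight"],
  "action": ["sings to the moon", "guards the library"]
}
"""
grammar = json.loads(grammar_json)
print(expand(grammar, "#origin#", random.Random(0)))
```

Passing a seeded `random.Random` makes the output reproducible, which is convenient when testing a grammar; omitting it gives a fresh random expansion each time, as a bot would want.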
The NIMBLE Environment for Statistical Computing
Fellow: Claudia Wehrhahn
Duration: June 2017 - June 2018
Many challenges in data science benefit from increasingly sophisticated statistical models. NIMBLE is growing in popularity but needs open-source software leadership to drive its adaptation to parallel and distributed infrastructures as well as in-storage computing environments. The incubator team released NIMBLE with its first Bayesian nonparametric (BNP) modeling tools, which included more efficient algorithms for the four-level hierarchical model.
Eusocial Storage Devices
Graduated 2023
Fellow: Jianshen Liu (advisor: Carlos Maltzahn)
Abstract: As storage devices get faster, data management tasks rob the host of CPU cycles and main memory bandwidth. Eusocial storage is a new media device API definition that pushes data management activities into the device and sets a course toward in-storage computing functionality. It takes today’s storage scale requirements into account and builds on top of them. Although offloading data management to the storage device has many benefits (e.g., fewer software layers, less data translation, and higher levels of abstraction), its main liability is that the extra processing required in the device increases the device's cost. However, a higher cost for that one system component does not necessarily increase the overall system cost, because offloading data management tasks should reduce costs in other areas. The first project undertaken with Eusocial Storage is to reproducibly quantify the benefits of offloading for the overall system.
OSAVC: Open Source Autonomous Vehicle Controller
Graduated 2023
Fellow: Aaron Hunter (advisor: Gabriel Elkaim)
Abstract: Autonomous vehicles are a rich area of research in nearly every conceivable environment: aerial, marine, terrestrial, and even extraterrestrial. With the availability of more powerful processors, onboard intelligence has advanced, opening up new possibilities for decision making and sensing in autonomous vehicles. The OSAVC project is an open source hardware and software project that provides the link between real-time control and intelligent decision making.
Live Hardware Development (LiveHD): A productive infrastructure for Synthesis and Simulation
Graduated 2022
Fellow: Sheng Hong Wang (advisor: Jose Renau)
Abstract: Power and performance constraints have led to a resurgence of hardware accelerators and, at the same time, a resurgence of new Hardware Description Languages (HDLs). Many researchers see Verilog as the hardware-specification equivalent of assembly language, and they are creating new HDLs to raise the level of abstraction. The goal of this project is to build a Multi-Language Synthesis and Simulation Infrastructure (MLSSI). MLSSI is the equivalent of a compiler infrastructure, but for synthesizable languages such as CHISEL, synthesizable Verilog, and Pyrope.
CAvSAT: A System for Query Answering over Inconsistent Databases
Graduated 2021
Fellow: Akhil Dixit (advisor: Phokion Kolaitis)
Abstract: Managing inconsistencies in databases is an old but recurring problem. An inconsistent database is a database that violates one or more integrity constraints, such as key constraints or inclusion dependencies. Inconsistent databases arise in several different contexts, including information integration, where dealing with inconsistency is regarded as a key challenge. Consistent Query Answering (CQA) is a principled and scientific approach for answering queries over inconsistent databases. The CAvSAT (Consistent Answers via Satisfiability) project aims to build a scalable and comprehensive system for consistent query answering over inconsistent databases.
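The repair semantics behind CQA can be illustrated with a brute-force sketch. Under primary-key constraints, a repair keeps exactly one tuple from each group of tuples that share a key value, and a consistent answer is one the query returns in every repair. The Python below enumerates all repairs explicitly, which takes exponential time; it is not how CAvSAT works (as its name suggests, CAvSAT relies on satisfiability solving), and the function names and the example relation are hypothetical.

```python
from collections import defaultdict
from itertools import product

def repairs(rows, key):
    """Yield every repair of rows under a primary key:
    exactly one row is kept from each group of rows sharing a key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    for choice in product(*groups.values()):
        yield set(choice)

def consistent_answers(rows, key, query):
    """Return the answers the query produces in every repair (brute force)."""
    answers = None
    for repair in repairs(rows, key):
        result = query(repair)
        answers = result if answers is None else answers & result
    return answers if answers is not None else set()

# Hypothetical Employee(name, dept) relation with key `name`; "alice" violates the key.
employees = [("alice", "sales"), ("alice", "hr"), ("bob", "sales")]
in_sales = lambda db: {name for name, dept in db if dept == "sales"}
print(consistent_answers(employees, lambda row: row[0], in_sales))
```

Here "alice" is in sales in one repair but not the other, so only "bob" is a consistent answer to the query; a query asking for all employee names would return both, because both names appear in every repair.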
Mantle: A Programmable Metadata Load Balancer for the Ceph File System
Completed Fall 2018
Fellow: Michael Sevilla (PI: Carlos Maltzahn)
Mantle is a programmable metadata balancer that separates metadata migration policies from their mechanisms. Its features and APIs are implemented in CephFS. The project team used Mantle to study how to manage and migrate file system metadata to improve performance; to achieve better load balancing, the project focused on the overheads of file system metadata protocols. The project lead chose not to continue the work as an incubator project and instead took a position in industry. Although he no longer works on the project full-time, he continues to contribute to it and to support the existing community of contributors.
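The policy/mechanism split that Mantle introduces can be illustrated with a toy balancer. In the sketch below, a policy is any function that looks at per-server metadata load and returns a migration plan, while `migrate` is the fixed mechanism that applies whatever plan it receives. This is a hypothetical Python model, not Mantle's API: in CephFS the real mechanism migrates parts of the file system namespace between metadata servers, and the `spill_to_least_loaded` policy, server names, and load numbers are illustrative assumptions.

```python
from typing import Callable, Dict

# A migration plan: {source_server: {target_server: amount_of_load_to_move}}.
Plan = Dict[str, Dict[str, float]]
# A policy inspects per-server metadata load and returns a plan; it never moves anything itself.
Policy = Callable[[Dict[str, float]], Plan]

def spill_to_least_loaded(load: Dict[str, float]) -> Plan:
    """Hypothetical, deliberately naive policy: each server above the mean
    sends its excess load to the currently least-loaded server."""
    mean = sum(load.values()) / len(load)
    target = min(load, key=load.get)
    plan: Plan = {}
    for server, value in load.items():
        if value > mean:
            plan[server] = {target: value - mean}
    return plan

def migrate(load: Dict[str, float], plan: Plan) -> Dict[str, float]:
    """Mechanism: apply any plan, regardless of which policy produced it."""
    new_load = dict(load)
    for source, moves in plan.items():
        for target, amount in moves.items():
            new_load[source] -= amount
            new_load[target] += amount
    return new_load

def balance_round(load: Dict[str, float], policy: Policy) -> Dict[str, float]:
    """One balancing round: the policy decides, the mechanism executes."""
    return migrate(load, policy(load))
```

For example, `balance_round({"mds0": 90, "mds1": 10}, spill_to_least_loaded)` moves 40 units of load from `mds0` to `mds1`. Swapping in a different policy, such as `lambda load: {}` (never migrate), changes the behavior without touching `migrate`, which is the point of the separation.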
ZLog & CruzDB: Distributed Shared-log for Software-defined Storage
Completed Fall 2018
Fellow: Noah Watkins (PI: Carlos Maltzahn)
CORFU is a fast shared-log design that leverages flash devices. In this project, the research team implemented CORFU on Ceph and investigated the benefits of including this log abstraction in software-defined storage, including the management of shared logs across multiple storage tiers. The project fellow graduated in June 2018 and began working in industry, having chosen not to transition this project into an incubator. However, he continued to support contributions to ZLog, including acting as head mentor for CROSS's 2018 GSoC student Javier Ron. During the reporting period, the ZLog team prototyped the dynamic storage, with full integration into ZLog in October 2018. Although the project fellow is no longer working full-time on the project, he intends to continue contributing to ZLog and CruzDB and to support and promote community development around the project.
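The shared-log abstraction behind CORFU and ZLog can be sketched in memory: a sequencer hands out monotonically increasing log positions, a client writes its entry at the position it was given, and each position can be written only once, so every reader sees the same total order. The classes below are a hypothetical Python model of those semantics, not ZLog's API; the real systems store entries on flash devices (CORFU) or in Ceph (ZLog) and must also handle failures and unwritten holes, which this sketch ignores.

```python
import itertools
import threading

class WriteOnceError(Exception):
    """Raised when a client tries to overwrite an already-written log position."""

class Sequencer:
    """Hands out the next free log position; it stores no data itself."""
    def __init__(self):
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def next_position(self):
        with self._lock:
            return next(self._counter)

class SharedLog:
    """In-memory stand-in for the storage layer: each position is write-once."""
    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()
        self.sequencer = Sequencer()

    def write(self, position, data):
        with self._lock:
            if position in self._entries:
                raise WriteOnceError(position)
            self._entries[position] = data

    def append(self, data):
        """Client-side append: obtain a position from the sequencer, then write there."""
        position = self.sequencer.next_position()
        self.write(position, data)
        return position

    def read(self, position):
        """Return the entry at position, or None if it has not been written yet."""
        return self._entries.get(position)
```

Separating the sequencer from storage keeps the sequencer's job tiny (incrementing a counter), while the write-once rule at each position is what guarantees that two clients can never disagree about which entry occupies a given slot.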
Strong Consistency in Dynamic Wireless Networks to Enable Safe and Efficient Navigation of Autonomous Vehicles
Completed Summer 2018
Fellow: Brendan Short (PI: Ricardo Sanfelice)
Collaboration with the Hybrid Systems Laboratory
The objectives of this project were to determine the consistency requirements of distributed systems with smart storage devices over realistic networks, to design algorithms that assure the needed consistency, and to develop open source software that implements these algorithms to provide consistent data for the safe operation of autonomous vehicles. Future services such as unmanned traffic management are expected to depend on strong consistency for safe operation. These services will often need to be provided over lossy wireless networks with limited bandwidth, where partitions may be frequent. We studied these consistency requirements for distributed systems that implement algorithms requiring large amounts of (dynamically changing) data, made available to all systems via a shared log. Navigation of autonomous systems served as the prototype application of this research.
Memory and Storage Coordinative Lifetime Enhancement with Near-Data Computing
Completed Summer 2017
Fellow: Xiao Liu (PI: Jishen Zhao)
This project aimed to address memory system scalability in data center servers by designing a DRAM/NVRAM hybrid memory system that offers scalable performance and resiliency. The design used several gigabytes of DRAM, while NVRAM served as the main data storage component. As a result, the scalability issues associated with DRAM, namely the performance and energy overhead of refresh and ECC, can be substantially mitigated. The project also explored hybrid memory management mechanisms that provide a unified memory space for persistent and non-persistent data structures.
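One way a DRAM/NVRAM hybrid can extend NVRAM lifetime is to let DRAM absorb repeated writes, so that NVRAM is written only when a modified line is evicted or flushed. The Python sketch below assumes exactly that arrangement, a small write-back LRU cache of DRAM lines in front of NVRAM; the project's actual design is not described at this level of detail, and the `HybridMemory` class and its `nvram_writes` counter are hypothetical, meant only to show the write-absorption effect.

```python
from collections import OrderedDict

class HybridMemory:
    """Toy model (assumption): a small write-back DRAM cache in front of NVRAM."""
    def __init__(self, dram_lines):
        self.dram = OrderedDict()  # address -> (value, dirty), least recently used first
        self.nvram = {}            # address -> value; the main data store
        self.dram_lines = dram_lines
        self.nvram_writes = 0      # proxy for NVRAM wear

    def _evict_if_full(self):
        if len(self.dram) > self.dram_lines:
            addr, (value, dirty) = self.dram.popitem(last=False)  # evict the LRU line
            if dirty:  # only modified lines cost an NVRAM write
                self.nvram[addr] = value
                self.nvram_writes += 1

    def write(self, addr, value):
        self.dram[addr] = (value, True)  # writes land in DRAM and mark the line dirty
        self.dram.move_to_end(addr)
        self._evict_if_full()

    def read(self, addr):
        if addr in self.dram:
            self.dram.move_to_end(addr)
            return self.dram[addr][0]
        value = self.nvram.get(addr)
        self.dram[addr] = (value, False)  # cache a clean copy of the NVRAM data
        self._evict_if_full()
        return value

    def flush(self):
        """Write every dirty line back to NVRAM, e.g. before power-off."""
        for addr, (value, dirty) in self.dram.items():
            if dirty:
                self.nvram[addr] = value
                self.nvram_writes += 1
        self.dram = OrderedDict((a, (v, False)) for a, (v, _) in self.dram.items())
```

For example, writing the same address 100 times leaves `nvram_writes` at 0 until `flush()` is called, after which it is 1, whereas writing directly to NVRAM would have cost 100 writes.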
An Efficient C Library for Unum 2.0
Work ended Summer 2017
Fellow: Andrew Klofas (PI: Nic Brummell, Carlos Maltzahn)
The universal number (unum) is a new number format for computers. Computation with unums enables higher-accuracy math by increasing information density. To make widespread adoption possible, this project created an open source C library that includes a framework for optimizing its efficiency on different architectures. The project team implemented basic arithmetic operations (addition, subtraction, multiplication, reciprocation, and division) and began adding a unum 2.0 matrix library.