The NIMBLE Environment for Statistical Computing
Many challenges in data science benefit from increasingly sophisticated statistical models. NIMBLE is growing in popularity but needs open-source software leadership to drive its adaptation to parallel and distributed infrastructures as well as in-storage computing environments.
Start Date: Summer ’17
The cloud business model requires flexible resource usage, but traditional relational databases strongly couple data to physical resources, making it difficult to add and remove database nodes. The Skyhook project extends PostgreSQL with a data/resource decoupling that allows database clusters to expand and shrink dynamically and enables the query optimizer to leverage this functionality.
Start Date: Late Fall ’16
STARTING IN EARLY 2019
Tracery2 and Chancery
Black Swan: The Popper Reproducibility Platform
Synopsis: Reproducibility is the cornerstone of the scientific method. Yet in computational and data science domains, a gap exists between current practice and the ideal of making every new scientific discovery easily reproducible. Advances in computer science (CS) and software engineering make their way into these domains slowly and painfully, and, paradoxically, the same is true of CS research itself. Popper (http://falsifiable.us) is an experimentation protocol and CLI tool for implementing scientific exploration pipelines following a DevOps (https://en.wikipedia.org/wiki/Devops) approach. The goal of Popper is to bring the methods and tools used for the agile delivery of software (DevOps) to scientists and industry researchers.