CROSS Incubator SkyhookDM Now Part of Apache Arrow

On October 22, 2021, three reviewers formally approved the merge of the CROSS project SkyhookDM into the Apache Arrow mainline. SkyhookDM will be part of the Arrow 7.0.0 release.

October 25, 2021

By Jayjeet Chakraborty 


The CROSS-supported Skyhook Data Management (SkyhookDM) project is now officially part of the Apache Arrow mainline and is planned for inclusion in release 7.0.0.

SkyhookDM is a plugin that offloads data processing computations into the storage layer of distributed, programmable object storage systems. It is developed and maintained by CROSS researchers. The goal of Skyhook is to reduce client-side resource utilization (CPU, memory bandwidth, and network) by offloading data management and processing tasks to the storage layer. The project team uses Ceph, a petabyte-scale distributed object storage system, as the storage layer for Skyhook because its object class SDK provides an excellent object-store extension mechanism. On the client side, SkyhookDM uses the Arrow Dataset API to expose this functionality. The implementation lives within Ceph but is not Ceph-specific; it applies to any storage system with similar programmability features, such as user-defined object classes and partial reads and writes of objects.
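
To make the offloading idea concrete, here is a minimal toy model in plain Python. It is only a sketch of the general principle and uses neither the Arrow Dataset API nor the Ceph object class SDK: the ToyStorageNode class, its methods, and the sample data are all hypothetical names invented for this illustration. It compares how many values cross the "network" when the client filters the data itself versus when the filter and column selection run next to the data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class ToyStorageNode:
    """Hypothetical stand-in for a programmable object store (not a real Ceph API)."""
    rows: List[Row]
    values_sent: int = 0  # number of values shipped to the client so far

    def read_all(self) -> List[Row]:
        # Plain read: every column of every row travels to the client.
        self.values_sent += sum(len(r) for r in self.rows)
        return [dict(r) for r in self.rows]

    def scan(self, predicate: Callable[[Row], bool], columns: List[str]) -> List[Row]:
        # Offloaded scan: filter and project inside the storage layer,
        # then ship only the matching rows and requested columns.
        result = [{c: r[c] for c in columns} for r in self.rows if predicate(r)]
        self.values_sent += sum(len(r) for r in result)
        return result


def client_side_query(node: ToyStorageNode, predicate, columns) -> List[Row]:
    # The client pulls everything and does the filtering and projection itself.
    return [{c: r[c] for c in columns} for r in node.read_all() if predicate(r)]


def offloaded_query(node: ToyStorageNode, predicate, columns) -> List[Row]:
    # The client only describes the work; the storage node performs it.
    return node.scan(predicate, columns)


def high_fare(row: Row) -> bool:
    return row["fare"] > 2900


# Made-up sample data: 1000 rows with three columns each.
sample = [{"id": i, "fare": i * 3, "tip": i % 5} for i in range(1000)]

plain_node = ToyStorageNode(rows=sample)
offload_node = ToyStorageNode(rows=sample)

client_result = client_side_query(plain_node, high_fare, ["id", "fare"])
offloaded_result = offloaded_query(offload_node, high_fare, ["id", "fare"])

print("values sent, client-side filtering:", plain_node.values_sent)
print("values sent, offloaded filtering:  ", offload_node.values_sent)
```

Both queries return the same rows, but the offloaded version ships only the filtered, projected result. The toy model counts values only; it does not model the CPU and memory-bandwidth savings on the client that the paragraph above also mentions.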


For more on this story, please see the full blog post.