Tuesday 10:05 AM–10:45 AM in Winter Garden (5412)

Improve the efficiency of your Big Data application

Francesc Alted, Christian Steiner

Audience level:
Intermediate

Description

If you're using NumPy and your data uses too much memory or requires too many computational resources, this talk is for you! We'll introduce Caterva and IronArray, two libraries that, when used together, can greatly improve the efficiency and reduce the cost of your big data applications.

Abstract

Our libraries feature a novel approach to storing and processing data compressed in-memory, achieving low memory consumption while maintaining high performance.

Caterva: Built on top of C-Blosc2, implements a simple multidimensional container for compressed data. It adds the capability to store, extract, and transform compressed data in these containers, either in-memory or on-disk.
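The idea of storing data in compressed chunks and extracting only what is needed can be sketched in a few lines. This is a minimal illustration, not Caterva's API: it is one-dimensional, it uses zlib from the Python standard library as a stand-in for C-Blosc2, and the names `compress_chunks`, `read_slice` and `CHUNK_ITEMS` are hypothetical.

```python
import zlib
from array import array

CHUNK_ITEMS = 1024  # hypothetical chunk length, in elements


def compress_chunks(values, chunk_items=CHUNK_ITEMS):
    """Split a sequence of floats into fixed-size chunks and compress each one."""
    data = array("d", values)
    return [
        zlib.compress(data[start:start + chunk_items].tobytes())
        for start in range(0, len(data), chunk_items)
    ]


def read_slice(chunks, start, stop, chunk_items=CHUNK_ITEMS):
    """Return elements [start, stop), decompressing only the overlapping chunks."""
    out = array("d")
    if stop <= start:
        return out
    first = start // chunk_items
    last = (stop - 1) // chunk_items
    for idx in range(first, last + 1):
        chunk = array("d")
        chunk.frombytes(zlib.decompress(chunks[idx]))
        base = idx * chunk_items          # global index of this chunk's first item
        lo = max(start - base, 0)         # slice bounds local to the chunk
        hi = min(stop - base, len(chunk))
        out.extend(chunk[lo:hi])
    return out


# Repetitive data compresses well, so the container is much smaller than raw.
values = [float(i % 10) for i in range(10_000)]
chunks = compress_chunks(values)
compressed_size = sum(len(c) for c in chunks)
raw_size = len(values) * 8  # 8 bytes per double

# Only three of the ten chunks are decompressed to serve this request.
window = read_slice(chunks, 1000, 3000)
```

Because each chunk is compressed independently, a slice request touches only the chunks it overlaps, which is what keeps extraction cheap even when the full dataset stays compressed.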

IronArray: Built on top of Caterva and C-Blosc2, adds type-safety as well as a computational engine, so that matrix and vector calculations are performed efficiently on top of compressed and multidimensional containers.

Caterva

While there are several existing solutions for storing compressed data (HDF5 is one of the best-known examples), Caterva brings the following novel features that set it apart from its competitors:

IronArray

IronArray implements a computational engine that is optimized to deal with compressed data. IronArray adds type definitions to Caterva containers and takes every measure to reduce the compression overhead so that calculations on them run seamlessly; its ultimate goal is to perform computations on compressed containers at the same speed as on uncompressed containers.
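A rough sketch of what computing on compressed containers involves: operands are decompressed one chunk at a time, the expression is evaluated on that chunk, and the result is compressed again, so the full uncompressed arrays never need to exist in memory at once. This is not IronArray's engine or API; zlib stands in for C-Blosc2, and `compress`, `decompress` and `evaluate_chunked` are hypothetical helper names.

```python
import zlib
from array import array


def compress(values):
    """Pack an iterable of floats as doubles and compress the bytes."""
    return zlib.compress(array("d", values).tobytes())


def decompress(blob):
    """Inverse of compress(): return an array of doubles."""
    out = array("d")
    out.frombytes(zlib.decompress(blob))
    return out


def evaluate_chunked(chunks_x, chunks_y, func):
    """Apply func element-wise to two chunked operands, one chunk pair at a time."""
    result = []
    for cx, cy in zip(chunks_x, chunks_y):
        x = decompress(cx)
        y = decompress(cy)
        # Only one chunk of each operand is uncompressed at any moment.
        result.append(compress(func(a, b) for a, b in zip(x, y)))
    return result


n, chunk = 4096, 512
xs = [float(i) for i in range(n)]
ys = [2.0] * n
chunks_x = [compress(xs[i:i + chunk]) for i in range(0, n, chunk)]
chunks_y = [compress(ys[i:i + chunk]) for i in range(0, n, chunk)]

# z = x * y + 1, computed without materializing x, y, or z in full.
chunks_z = evaluate_chunked(chunks_x, chunks_y, lambda a, b: a * b + 1.0)
z = [v for blob in chunks_z for v in decompress(blob)]
```

The per-chunk decompress and compress steps are the overhead the abstract refers to; keeping chunks small enough to stay in CPU cache is one common way to shrink it.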

During our talk, we will introduce the features of Caterva and IronArray using cat4py, a Python wrapper for Caterva, and IronArray for Python.
