Thursday October 28 11:30 AM – Thursday October 28 12:00 PM in Talks II

Introducing Blosc2, the next generation of the Blosc compressor

Francesc Alted

Prior knowledge:
No previous knowledge expected

Summary

Blosc2 (https://www.blosc.org) is a high-performance compressor library and format meant for binary data. The core is written in C for speed, and a Python wrapper is available. This talk will introduce you to the main features of Blosc2, such as (persistent) 64-bit frames, the filter pipeline, SIMD support for ARM and PowerPC, metalayers, plugin capabilities for filters and codecs, and more.

Description

Blosc2 is the successor to the well-known Blosc compressor and, after 5 years in the works, it reached production stage this past June. This talk will serve as a gentle but informative introduction to the library for a general audience.

Blosc2 brings in many new features, but perhaps the most outstanding one is the new frame format. The frame overcomes the 31-bit limitation on buffer sizes in the previous Blosc version, and it also provides support for persistence and, most notably, the ability to add metalayers that complement existing data with different sets of metadata. This makes it easy to define different formats on top of Blosc2 for a variety of purposes; I will describe some of those formats later on.
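
To get a feel for the 31-bit limit mentioned above, here is a small sketch in plain Python. It does not call Blosc2 at all; it only does the arithmetic. The helper name `chunks_needed`, the 64 MiB chunk size, the 10 TiB dataset, and the reading of "64-bit" as a signed 64-bit length are all assumptions chosen for illustration.

```python
# Back-of-the-envelope arithmetic only; no Blosc2 calls are made here.

# A length stored in 31 bits tops out just under 2 GiB.
MAX_31BIT = 2**31 - 1

# Assumption: a signed 64-bit length, which leaves 63 bits for the size.
MAX_63BIT = 2**63 - 1


def chunks_needed(nbytes: int, chunk_size: int) -> int:
    """Return how many chunks of at most `chunk_size` bytes hold `nbytes` bytes.

    Hypothetical helper: it illustrates splitting a large dataset into
    pieces that each fit within the 31-bit limit.
    """
    if not 0 < chunk_size <= MAX_31BIT:
        raise ValueError("chunk_size must be positive and fit in 31 bits")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-nbytes // chunk_size)


dataset_size = 10 * 2**40   # 10 TiB, an arbitrary example size
chunk_size = 64 * 2**20     # 64 MiB, an arbitrary example chunk size

fits_in_31_bits = dataset_size <= MAX_31BIT
fits_in_63_bits = dataset_size <= MAX_63BIT
n_chunks = chunks_needed(dataset_size, chunk_size)

print(f"fits in 31 bits: {fits_in_31_bits}")
print(f"fits in 63 bits: {fits_in_63_bits}")
print(f"chunks needed:   {n_chunks}")
```

The point of the sketch is simply that a dataset of this size cannot be addressed with a single 31-bit length, while a 64-bit length has ample room for it.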

The talk will start by describing what a frame is and the advantages that it brings. It will then cover other important new features of Blosc2 using, where possible, the official python-blosc2 wrapper. I will finish by showing different applications that have been implemented on top of Blosc2, for example Caterva, a compressed multidimensional array. Another example will be ironArray Community, a thin layer on top of Caterva that adds data types.