Industry
Business & Industry Applications
Language
Python
Java
C
Features
High Performance Computing
Big Data
Numerical Computing
Data Mining
Blosc is a high-performance compression library written in C that uses the blocking technique to make compression operations easier, faster, and more flexible. Blosc splits datasets into blocks, then transparently packs them into compressed containers. Blosc extends standard compression by letting the user condition the data in every block with filters before compression; these filters can be selected according to the properties of the dataset being compressed. In addition, Blosc provides a variety of codecs to cover different needs (better compression ratio, higher speed, or a balance between the two). Finally, Blosc runs its operations in parallel to take advantage of the many cores in modern CPUs.
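The pipeline described above (split into blocks, filter each block, compress, run blocks in parallel) can be illustrated with a small sketch. This is not Blosc's implementation or API: it uses only the Python standard library, with `zlib` standing in for a codec and a simple byte-shuffle standing in for a filter. The function names (`shuffle`, `unshuffle`, `compress_blocks`, `decompress_blocks`) and the default block size are assumptions made for illustration.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def shuffle(block: bytes, typesize: int) -> bytes:
    """Illustrative filter: group the i-th byte of every item together."""
    n = len(block) - len(block) % typesize  # trailing partial item is left as-is
    body = b"".join(block[i:n:typesize] for i in range(typesize))
    return body + block[n:]


def unshuffle(block: bytes, typesize: int) -> bytes:
    """Inverse of shuffle(): restore the original byte order."""
    n = len(block) - len(block) % typesize
    nitems = n // typesize
    out = bytearray(n)
    for i in range(typesize):
        out[i:n:typesize] = block[i * nitems:(i + 1) * nitems]
    return bytes(out) + block[n:]


def compress_blocks(data: bytes, typesize: int = 8,
                    blocksize: int = 1 << 16, nthreads: int = 4) -> list:
    """Split data into blocks, filter each one, and compress them in parallel."""
    blocks = [data[i:i + blocksize] for i in range(0, len(data), blocksize)]

    def pack(block: bytes) -> bytes:
        # zlib is a stand-in codec chosen only because it ships with Python.
        return zlib.compress(shuffle(block, typesize), 6)

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return list(pool.map(pack, blocks))


def decompress_blocks(chunks: list, typesize: int = 8) -> bytes:
    """Decompress every block, undo the filter, and reassemble the data."""
    return b"".join(unshuffle(zlib.decompress(c), typesize) for c in chunks)


# Usage: 100,000 little-endian 64-bit integers round-trip through the pipeline.
data = struct.pack("<100000q", *range(100000))
chunks = compress_blocks(data)
restored = decompress_blocks(chunks)
```

Because each block is filtered and compressed independently, blocks can be handed to separate workers and decompressed without touching their neighbours, which is the property the blocking technique relies on.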
Blosc is designed to help in any application where (binary) data needs to be compressed as fast as possible, so that the overhead of handling compressed data stays minimal and transparent to the application.
Blosc is currently used in many projects, among them PyTables (there is also a stand-alone hdf5-blosc plugin for general HDF5 applications), bcolz, and zarr (via its numcodecs subproject), and it is widely appreciated for its speed and stability. Through these projects, Blosc is used in Earth data science, databases, the financial industry, and many other fields. Hundreds of visitors access the different Blosc projects on GitHub on a daily basis.