PyTables
NumFOCUS Sponsored Project since 2016
PyTables is an efficient way to store and query both numerical and textual data. It provides seamless access to the HDF5 library, a popular container for datasets that can grow to terabytes and beyond. With its support for the ultra-fast Blosc compressor, PyTables makes economical use of memory and disk, so that data takes up far less space than it would uncompressed, without compression slowing down your data management.
Industry: Business & Industry Applications
Language: Python
Features: High Performance Computing, Big Data, Data Mining
PyTables is a Python package for storing and querying large tabular datasets efficiently. It is built on top of the HDF5 library and the NumPy and numexpr packages, which provide the foundations for very compact storage and high-performance data management. PyTables also comes with OPSI, an indexing engine designed to work with datasets that exceed RAM capacity while keeping query times competitive with those of relational database engines. Finally, PyTables ships with the high-speed Blosc compressor, which typically makes the overhead of compression negligible (and often even beneficial) when dealing with large datasets, even when they are in memory.
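As a rough illustration of how these pieces fit together, the following minimal sketch writes a small Blosc-compressed table, indexes one column, and runs a query whose condition is evaluated by numexpr. The `Reading` row layout, the `build_and_query` helper, the file name, and the data are all hypothetical and chosen only for this example; the compression level and default index settings are assumptions, not recommendations.

```python
import os
import tempfile

import numpy as np
import tables as tb


class Reading(tb.IsDescription):
    """Hypothetical row layout: one sensor reading per row."""
    sensor_id = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


def build_and_query(path, n_rows=10_000):
    """Write a compressed, indexed table and return the rows matching a query."""
    # Blosc compression applied by default to every dataset in this file.
    filters = tb.Filters(complevel=5, complib="blosc")
    with tb.open_file(path, mode="w", filters=filters) as h5:
        table = h5.create_table("/", "readings", Reading, title="Sensor readings")

        # Build the rows as a NumPy structured array and append them in one call.
        data = np.empty(n_rows, dtype=table.dtype)
        data["sensor_id"] = np.arange(n_rows) % 100
        data["value"] = np.linspace(0.0, 1.0, n_rows)
        table.append(data)
        table.flush()

        # Index the column used in the query condition.
        table.cols.sensor_id.create_index()

        # The condition string is evaluated against the stored columns,
        # and only the matching rows are returned as a structured array.
        hits = table.read_where("(sensor_id == 7) & (value > 0.5)")
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rows = build_and_query(os.path.join(tmp, "readings.h5"))
        print(len(rows), "matching rows")
```

On a table this small the index and the compressor make little practical difference; the sketch is only meant to show where each feature is switched on.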
PyTables has been used in a variety of academic and industry settings, including Caltech, the NASA Jet Propulsion Laboratory at Caltech, Universitat Politècnica de València, University of Southampton School of Engineering Sciences, Max Planck Gesellschaft, SLAC, ACUSIM Software, NOAA, General Dynamics, Germanischer Lloyd, SarVision, Cellzome, and TeraView.