Thursday 1:20 PM–2:00 PM in Central Park East (#6501a)

Intake - taking the pain out of data access

Martin Durant

Audience level:
Novice

Description

Intake is a simple library providing a single interface for cataloging, describing, and reading any kind of data. Catalogs give end-users an easy way to find data, whether it is stored locally, in a cloud service, or on an Intake server. Intake thus separates the definition of data sources from their use and analysis, so that data engineers and data scientists can get on with their respective jobs.

Abstract

Defining and loading data-sets costs time and effort. The data scientist needs to know what data are available, and the characteristics of each data-set, before going to the effort of loading and analyzing a specific one. Furthermore, they might need to learn the API of some Python package specific to the target format. The code for such data loading often makes up the first block of every notebook or script, propagated by copy and paste.

Intake has been designed as a simple layer over other Python libraries to provide:

For a simple design and relatively small code-base, there are lots of features. We will demonstrate the main ones and show typical work-flows from two points of view: that of the data user and that of the catalog maintainer.

Thus, Intake provides a very simple yet useful division between the users of data and the maintainers of data source catalogs. Intake has approachable code and is extensible in many places, so we hope it can grow into an all-inclusive data ecosystem for numerical Python.
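
The division between catalog maintainers and data users can be illustrated with a minimal sketch. This is not Intake's actual API: the `CATALOG` dictionary, the `list_sources` and `read` helpers, and the toy "cities" data are all hypothetical, and use only the Python standard library to show the idea of defining a source once and loading it by name.

```python
import csv
import io


def _read_csv_text(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


# Maintainer's side (hypothetical stand-in for a catalog, NOT Intake's API):
# each entry records a description and how to load the data.
CATALOG = {
    "cities": {
        "description": "Toy table of city populations (made-up values)",
        "loader": _read_csv_text,
        "args": {"text": "name,population\nA,100\nB,250\n"},
    },
}


def list_sources(catalog):
    """Let a user discover what is available without loading anything."""
    return {name: entry["description"] for name, entry in catalog.items()}


def read(catalog, name):
    """Load a source by name; the user never sees the format-specific code."""
    entry = catalog[name]
    return entry["loader"](**entry["args"])


# User's side: browse the catalog, then load a source by name.
available = list_sources(CATALOG)
rows = read(CATALOG, "cities")
total = sum(int(row["population"]) for row in rows)
```

In this sketch the user's code refers only to the name "cities", so the maintainer could change the storage location or format of the source without any notebook or script needing to change.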
