Thursday 17:10–17:40 in Track 2

The Python Ecosystem for Data Science: A Guided Tour

Christian Staudt

Audience level: Novice

Description

Pythonistas have access to an extensive collection of tools for data analysis. The space of tools is best understood as an ecosystem: Libraries build upon each other, and a good library fills an ecological niche by doing certain jobs well. This is a guided tour of the Python data science ecosystem, aiming to help us select the right stack for our next data-driven project.

Abstract

Python is on its way to becoming the lingua franca of data science, and Pythonistas have access to an impressive and extensive collection of tools for data analysis. Here, a data scientist needs to see the forest for the trees: The space of tools is best understood as an ecosystem, where libraries build upon each other, and where a good library fills an ecological niche by doing certain jobs well. This talk is a guided tour of the Python data science ecosystem. More than a list of libraries, it aims to provide some structure, classifying tools by type of data, size of data, and type of analysis. In our tour, we visit a number of areas, including working with tabular data (numpy, pandas, dask, ...) and graph data (e.g. networkx), statistics (e.g. statsmodels), machine learning (scikit-learn, ...), and data visualization (matplotlib, seaborn, bokeh, ...). Aspiring data scientists, and everyone else working with data, should find this useful for selecting the right tools for their next data-driven project.
