Friday 9:00–10:30 in Tower Suite 3

Data Analysis in Parallel

Filip Ter

Audience level:
Intermediate

Description

This tutorial will demonstrate how to use IPyParallel efficiently to benefit from parallelism in the early stages of data analysis and model development. The aim is to demystify parallelism for analysts and researchers so that they can start using it early in their workflow. Examples of common tasks will be shown in Jupyter, along with how they can be run in parallel without major disruption.

Abstract

Researchers, data scientists, and others often encounter problems that are parallel in nature. Despite this, many find it difficult to take advantage of the potential speedup afforded by parallelism, especially in the early stages of model development and data analysis. In such cases, one may write sequential code that deals with only one chunk of the data at a time, with parallelism implemented only once the code is moved into production. This approach is not ideal: the researcher's original code will be slower and may not map well onto the resulting production code. Thanks to technologies like IPyParallel, it has become much easier to use parallelism in research, and this tutorial will demonstrate how that can be done.

Several examples of data analysis and model fitting tasks will be demonstrated in a Jupyter notebook. Each will then be reimplemented in parallel with relatively small changes to the logic of the code.
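As a rough illustration of what "relatively small changes" can look like (this sketch is not taken from the tutorial materials), the bootstrap example below keeps the analysis code independent of how the map step is executed. The names `bootstrap_mean`, `bootstrap_means`, `run_with_ipyparallel`, and the sample `data` are hypothetical. The parallel variant assumes IPyParallel 7 or later, where `ipp.Cluster` can be used as a context manager, and it is only defined here, not called.

```python
def bootstrap_mean(task):
    """Resample the data with replacement and return the resample's mean."""
    # Imports live inside the function so that it still works when the
    # function is shipped to a remote engine with its own namespace.
    import random
    import statistics

    seed, data = task
    rng = random.Random(seed)  # one seed per task keeps results reproducible
    resample = [rng.choice(data) for _ in data]
    return statistics.fmean(resample)


def bootstrap_means(data, n_resamples, mapper=map):
    """Run all bootstrap tasks through `mapper`, which defaults to the builtin map."""
    tasks = [(seed, data) for seed in range(n_resamples)]
    return list(mapper(bootstrap_mean, tasks))


def run_with_ipyparallel(data, n_resamples, n_engines=2):
    """Same analysis, with the map step sent to IPyParallel engines.

    Assumes ipyparallel >= 7 is installed; starts a temporary local cluster.
    """
    import ipyparallel as ipp

    with ipp.Cluster(n=n_engines) as rc:
        view = rc.load_balanced_view()
        # The only change from the sequential call is the mapper argument.
        return bootstrap_means(data, n_resamples, mapper=view.map_sync)


# Hypothetical sample data, used only to exercise the sequential path.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]
sequential = bootstrap_means(data, n_resamples=200)
print(len(sequential), "bootstrap means computed sequentially")
```

With a working IPyParallel installation, `run_with_ipyparallel(data, 200)` should return the same values as the sequential call, since each task carries its own seed. When run from a script or notebook, the function is sent to the engines by value, which is why its imports are placed inside the function body.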

After the tutorial, participants should have an idea of how to parallelize their analysis and feel confident enough to try it for themselves. They should see how using IPyParallel in their research can benefit both them and those who will have to put their code into production later on. They should also understand the limitations of this approach and be able to distinguish between problems where it can and cannot be used.

The materials will all be uploaded to GitHub in the days prior to the tutorial so that participants can use them to follow along.

Target audience: The audience must be very comfortable programming in Python to follow the examples. They must also have basic knowledge of data analysis and model fitting to grasp the context of the tutorial.
