Friday 12:00–14:00 in Intermediate

High Performance Python for Data Analysis.

Guillem Borrell Nogueras

Audience level:
Intermediate

Description

Writing fast and memory-efficient code in Python requires some experience, and some best practices are counter-intuitive if you come from static languages like C++ and Java. This live coding session covers the reasons why Python may be slow and how to write fast, efficient Python applications, with special emphasis on profiling, efficient use of NumPy, and extensions with Numba and Cython.

Abstract

This tutorial covers some of the performance issues found while introducing Python to a fairly large development team with roots in Java. All the examples are based on real-world performance problems found during the last year. Some of them are fairly trivial, but they are useful for explaining why Python is slow, at least compared to other programming languages.

It's the same old story. An engineer builds a feature for a client using test data, but once the real-world data arrives, the implementation turns out to be too slow. The code has to be refactored, with the changes kept as isolated as possible. In addition, the client may impose constraints, such as not being able to compile extensions. The engineer calls someone more experienced in Python and the research begins. This workshop will be useful if you want to learn some shortcuts.
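The research in that story usually starts with a profiler. The following is a minimal sketch, using only the standard library, of how a suspect function might be measured before anything is refactored. The names slow_mean and profile_call are hypothetical and are not taken from the tutorial material.

```python
import cProfile
import io
import pstats


def slow_mean(values):
    # Deliberately naive (hypothetical example): an index-based loop that
    # goes through the interpreter once per element.
    total = 0.0
    for i in range(len(values)):
        total = total + values[i]
    return total / len(values)


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


data = [float(i) for i in range(100_000)]
mean, report = profile_call(slow_mean, data)
print(report)
```

The report lists where the time goes per function, which helps keep the refactoring confined to the code that actually needs it.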

Even Pandas, which performs fairly well, can be sped up substantially with some profiling, advanced NumPy tricks, Numexpr, the Numba JIT compiler, or Cython extensions. In many of the examples the speedup is two orders of magnitude.
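As an illustration of the kind of NumPy rewrite mentioned above, here is a small sketch that replaces an element-by-element Python loop with a single array expression. It assumes NumPy is installed. The function names and the test data are made up for this example, and the actual speedup depends on the array size and the machine.

```python
import numpy as np


def distance_loop(x, y):
    # Pure-Python loop: each element is handled by the interpreter one at a time.
    out = []
    for xi, yi in zip(x, y):
        out.append((xi * xi + yi * yi) ** 0.5)
    return out


def distance_vectorized(x, y):
    # Same computation written as one array expression, evaluated in compiled code.
    return np.sqrt(x * x + y * y)


# Hypothetical input data: two arrays of random coordinates.
rng = np.random.default_rng(0)
x = rng.random(10_000)
y = rng.random(10_000)

loop_result = np.array(distance_loop(x, y))
vector_result = distance_vectorized(x, y)
```

Both functions return the same values, so the two versions can be timed against each other, for example with timeit, to see the difference on your own data.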

The syllabus will be the following:
