PyData Barcelona 2017 - Presentation: High Performance Python for Data Analysis.

Writing fast, memory-efficient code in Python requires some experience, and some best practices are counter-intuitive if you come from static languages like C++ and Java. This live coding session covers the reasons why Python may be slow and how to write fast, efficient Python applications, with special emphasis on profiling, efficient use of NumPy, and extensions with Numba and Cython.

This tutorial covers some of the performance issues found while introducing Python to a fairly large development team with roots in Java. All the examples are based on real-world performance problems found during the last year. Some of them are fairly trivial, but they are useful to explain why Python is slow, at least compared to other programming languages.

It's the same old story. An engineer designs a feature for a client using test data, but once the real-world data arrives, the implementation is too slow. The code has to be refactored, with the changes kept as isolated as possible. In addition, the client may have constraints, such as not being able to compile extensions. The engineer calls someone more experienced in Python and the research begins. This workshop will be useful if you want to know some shortcuts.

Even Pandas, which performs fairly well, can be sped up considerably with some profiling, advanced NumPy tricks, Numexpr, the Numba JIT compiler, or Cython extensions. In many of the examples the speedup is two orders of magnitude.
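Profiling is the usual first step before reaching for any of those tools. The following is a minimal sketch using the standard library's cProfile and pstats modules; the function slow_mean and the helper profile_call are hypothetical names chosen for illustration and are not taken from the session material.

```python
import cProfile
import io
import pstats


def slow_mean(values):
    """Hypothetical hot spot: a mean computed with an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # Sort by cumulative time and keep only the top five entries.
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_mean, list(range(100_000)))
print(report)
```

The report lists each function with its call count and cumulative time, which is normally enough to decide which part of the code deserves optimization.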

The syllabus will be the following:

Why (stock) Python is slow. Deal with it.
Why NumPy is fast, if you use it properly. Numexpr may help sometimes.
Why Cython and Numba are fast, and how to keep them fast.
Why dates are hard and slow, but there are ways to deal with them.
Why O(n^2) computations in Pandas are slow, although they are not a big deal if you are careful with allocations.
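As an illustration of the first two syllabus items, the sketch below computes the same quantity with an interpreted Python loop and with a single NumPy call. The function names and the test array are assumptions made for this example, not code from the session; timings are left out because they depend on the machine.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Interpreted loop: every iteration pays Python object overhead."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Vectorized version: the loop runs in compiled code inside NumPy."""
    return float(np.dot(values, values))


data = np.arange(1_000, dtype=np.float64)
loop_result = sum_of_squares_loop(data)
numpy_result = sum_of_squares_numpy(data)
print(loop_result, numpy_result)
```

Both functions return the same value; wrapping each call with the standard library's timeit module on a larger array is a simple way to see the gap on your own hardware.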

Friday 12:00–14:00 in Intermediate

Guillem Borrell Nogueras
