Tuesday 1:10 p.m.–1:45 p.m.

Beating Python's GIL to Max Out Your CPUs

Andrew Montalenti

Audience level:
Intermediate

Description

One of the most common complaints about Python in a data analysis context is the Global Interpreter Lock, or GIL. In practice, it means that a single Python process cannot easily use more than one core of a multi-core machine to compute in parallel. However, fear not! To beat the GIL, you just need to be willing to adopt a little magic -- and this talk will tell you how.

Abstract

Beating Python's Global Interpreter Lock starts with recognizing a searing reality: no matter how many cores a single machine has, most CPU-heavy computation tasks will max out all the cores available on even a large box. Once you come to terms with this fact, you realize that what you actually want isn't multi-core computation, but multi-core / multi-node computation. That is, cluster-scale computing.

To illustrate multi-core vs multi-node, we'll contrast Python's standard library concurrent.futures module with the IPython.parallel framework. The former lets you go multi-core to beat the GIL, with some caveats. The latter lets you go multi-node.
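As a rough illustration of the multi-core half of that contrast, here is a minimal sketch that spreads a CPU-bound function across worker processes with concurrent.futures.ProcessPoolExecutor. The function names (count_primes, count_primes_parallel) and the workload are assumptions chosen for illustration, not material from the talk.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, max_workers=None):
    """Run count_primes once per limit, each call in a separate process."""
    # Each worker is its own interpreter with its own GIL, so the calls
    # can run on different cores at the same time.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # One task per core; the guard keeps worker processes from
    # re-running this block when they import the module.
    limits = [20_000] * (os.cpu_count() or 2)
    print(count_primes_parallel(limits))
```

One caveat this sketch makes visible: arguments and return values are pickled and sent between processes, so the approach pays off only when the computation outweighs that transfer cost.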

We'll then explore what makes multi-node computation difficult, and illustrate it with a small Python program that reads a fast-moving data stream and processes it in parallel, using pykafka and Apache Kafka to provide the data stream.
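The shape of such a program can be sketched on a single machine, under heavy assumptions. In the sketch below, fake_stream is a hypothetical stand-in for a Kafka consumer (no pykafka calls are shown), and batches, process_batch, and run are illustrative names rather than the talk's actual code.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def fake_stream(n_messages):
    """Hypothetical stand-in for a message stream; yields byte payloads."""
    for i in range(n_messages):
        yield f"event-{i}".encode()


def batches(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Placeholder CPU-side work for one batch: total payload bytes."""
    return sum(len(msg) for msg in batch)


def run(n_messages=1000, batch_size=100, max_workers=2):
    """Read the stand-in stream in batches and process them in parallel."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map submits every batch up front, so this only suits
        # a bounded stream; an unbounded one needs explicit backpressure.
        stream = batches(fake_stream(n_messages), batch_size)
        return sum(pool.map(process_batch, stream))


if __name__ == "__main__":
    print(run())
```

Batching amortizes the cost of sending messages to worker processes. What the sketch leaves out (tracking which messages were processed, recovering from a crashed worker, spreading work over several machines) is the kind of problem that makes multi-node computation difficult.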

Finally, we'll explore the open source frameworks that have "defeated" the cluster computing challenge for Python. These are Apache Storm and Apache Spark. They each have different designs -- and different Python integration options -- but their architectures are fascinating. The good news is, as of 2015, each of these frameworks has a high-quality, production-ready Python API, including one written by the presenter and his team!

You'll leave this talk confident that whether you need to use 2 cores, 8, 32, or even 10,000 cores across hundreds of machines, you'll have a technology available and the understanding necessary to make it happen.

Never let being CPU-bound be a bottleneck for your next great data exploration or scientific computing challenge! Attend this talk to beat Python's GIL not with a CPython fork, not with a PyPy STM implementation, but instead with old-fashioned distributed computation!
