We will present a practical comparison of several ways to write fast numerical code, focusing on the features and limitations of each approach. We will compare Numba, Cython, PyPy, C++ bindings, Julia bindings, and more, giving practical advice on when each of them may be the best option.
When working with Python, one often encounters situations in which interpreter overhead becomes a major bottleneck, especially in numerical computing.
When no built-in NumPy ufunc can come to the rescue, and when vectorization seems impossible or at least highly inconvenient -- what can we do?
There are several ways to speed up "hot loops" in Python: there is the Numba JIT compiler,
there is Cython, which generates C code for us, and there are multiple ways to create bindings to C or C++.
All of them can give you fast-running code.
But what are the tradeoffs between them? When should you choose one over another? What limitations and headaches come with each approach?
We will try to answer these questions to help you decide between them more easily in practice.
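
To make the notion of a "hot loop" that resists vectorization concrete, here is a minimal pure-Python sketch. The function name `clipped_running_sum` and its parameters are made up for illustration and are not taken from the talk. Each step depends on the clamped result of the previous step, so a plain cumulative sum does not reproduce it.

```python
import time


def clipped_running_sum(values, lo, hi):
    """Running sum that is clamped to [lo, hi] after every step.

    Hypothetical example: the clamp makes step i depend on the clamped
    result of step i - 1, so a plain cumulative sum gives a different answer.
    """
    out = []
    total = 0.0
    for v in values:
        total += v
        # Clamp the accumulator before moving on to the next element.
        if total < lo:
            total = lo
        elif total > hi:
            total = hi
        out.append(total)
    return out


if __name__ == "__main__":
    # Deterministic toy input; the size is arbitrary.
    data = [((i * 37) % 11) - 5.0 for i in range(200_000)]
    start = time.perf_counter()
    result = clipped_running_sum(data, -10.0, 10.0)
    elapsed = time.perf_counter() - start
    print(f"{len(result)} steps in {elapsed:.3f} s (pure Python)")
```

A loop of this shape, with per-element work done by the interpreter, is the kind of code the tools above are meant to speed up. How much each one helps, and at what cost, is the question the comparison addresses.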