Have you ever wanted to run your NumPy based code on multiple cores, or on a distributed system, or on your GPU? Wouldn't it be nice to do this without changing your code? We will discuss how NumPy's array protocols work, and provide a practical guide on how to start using them. We will also discuss how array libraries in Python may evolve over the next few years.
Have you ever wanted to run your NumPy based code on multiple cores, or on a distributed system, or on your GPU? Wouldn't it be nice to do this without changing your code?
For a decade, NumPy has been the library for array computing in Python. In the last couple of years however, there has been an explosion of new libraries providing NumPy-like n-dimensional array data structures: Dask, CuPy, PyTorch, Tensorflow, and more. The __array_function__
(introduced in NumPy 1.16) and __array_ufunc__
protocols attempt to provide a bridge between those different array implementations. We will discuss how these protocols work, and then provide a practical guide on how to try using it on your own code.
These array protocols are one important step in the evolution of NumPy, in response to new libraries and hardware. What other steps will we see in response to other user needs (there are many, from performance to missing data support to custom dtypes)? We will have a look at what may be in store for NumPy, and for array computing in Python more broadly, in the next few years.