Saturday 3:00 PM–3:45 PM in Track 1

uarray: Separating interface from implementation

Travis E Oliphant

Audience level:
Experienced

Description

NumPy is an array implementation that is slowly also becoming an interface/API. This leaves users with no way to state whether their code depends on NumPy's implementation or only on its interface, which increases technical debt and makes fundamental improvements difficult. Separating interface from implementation benefits the community. We discuss the efforts within NumPy itself, why they are insufficient, and how uarray proposes to solve the problem.

Abstract

NumPy started out by unifying the Numeric and Numarray array libraries. It provided a specific implementation of an n-dimensional array, or tensor, as a pointer to strided memory with CPU-based computations. Many libraries have been written with a dependency on NumPy; some depend on implementation details, but most depend only on the interface. As NumPy's popularity grew, alternative implementations of the NumPy API emerged, adding features such as automatic differentiation, arrays made of multiple chunks, or GPU-based computation. Libraries like CuPy, Dask, PyData/Sparse, PyTorch, MXNet, and ChainerX have become popular among users. Downstream libraries and users want a way to make their NumPy-based code work with these alternative array implementations, and efforts to make NumPy an interface as well have emerged.

These efforts have so far focused on making the numpy namespace serve as both the interface and the implementation. We maintain that the __array_ufunc__ and __array_function__ protocols, while valiant efforts, are insufficient for the task.

We believe it is critical that the interface be decoupled from the implementation, and thus that an array interface should reside in a separate namespace. Such a namespace would make it easy for end users and library authors to switch between implementations. We provide and describe working code that accomplishes this separation: uarray, a general-purpose multiple-dispatch mechanism, and unumpy, a namespace that uses this mechanism to provide the NumPy API on top of any backend.
