Dask-Gateway provides a secure, multi-tenant server for managing Dask clusters. It allows users to launch and use Dask clusters in a shared, centrally managed environment, and supports a wide variety of backends (e.g. Kubernetes, Hadoop, HPC systems, etc.). In this talk we'll discuss the use and design of Dask-Gateway, as well as some of the issues we encountered while developing this tool.
Dask has become a standard tool for parallelizing computational Python work, scaling from laptops to distributed clusters. Its compatibility with a wide variety of computing environments has been a major strength, allowing users to easily deploy on everything from Kubernetes to traditional HPC systems. New backends can be added by implementing a standard cluster interface, which then plays well with the rest of the Dask ecosystem.
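As a rough illustration of that shared cluster interface (the pairing of classes below is just one possible example), switching deployment targets is largely a matter of swapping the cluster class; the rest of the code stays the same:

```python
from dask.distributed import Client, LocalCluster
# from dask_jobqueue import SLURMCluster   # same interface on an HPC job queue
# from dask_kubernetes import KubeCluster  # same interface on Kubernetes

# Any of these cluster classes can be handed to Client unchanged.
cluster = LocalCluster(n_workers=4)
cluster.scale(8)            # request more workers
client = Client(cluster)    # run Dask computations against the cluster

print(client.dashboard_link)
```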
While this design has served us well, a few pain points have come up when using Dask at larger institutions. Some of these issues could be remedied by changes to the existing deployment design, but many of them required something new. We believe Dask-Gateway is that something.
Dask-Gateway is:
Centrally Managed: Administrators do the heavy lifting of configuring the Gateway; users only have to connect to get a new cluster (see the sketch after this list). This eases deployment and lets administrators enforce consistent configuration across all users.
Secure by Default: Cluster communication is automatically encrypted with TLS. All operations are authenticated with a configurable protocol, allowing you to use what makes sense for your organization.
Flexible: The gateway is designed to support multiple backends, and runs equally well in the cloud and on-premise. It natively supports Kubernetes, Hadoop/YARN, and HPC job queueing systems.
Robust to Failure: The gateway can be restarted or fail over without losing existing clusters, allowing for seamless upgrades and restarts without disrupting users.
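To give a feel for the user-facing side, here is a minimal sketch of connecting to a gateway and launching a cluster (the gateway address is a placeholder, and authentication details depend on how your administrator configured the deployment):

```python
from dask_gateway import Gateway

# Connect to a centrally managed gateway (address is a placeholder).
gateway = Gateway("http://gateway.example.com")

# Launch a new cluster; available options and limits are configured server-side.
cluster = gateway.new_cluster()
cluster.scale(4)

# Get a Dask client; traffic to the scheduler is proxied through the gateway.
client = cluster.get_client()
```

Because clusters outlive any single connection, they can also be listed and reconnected to later (e.g. via the gateway's list/connect methods), which is what makes the restart and upgrade story above possible.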
In this talk we'll discuss the use and design of Dask-Gateway, as well as some of the issues we encountered while developing this tool.