Hyperparameter optimization on Spark is commonly applied to memory-bound problems, where the model is trained on data that doesn’t fit on a single machine. We introduce Fugue-tune, an intuitive interface for compute-bound hyperparameter tuning that scales Hyperopt and Optuna by letting them leverage Spark and Dask without code changes.
Hyperparameter tuning is used in model development to search for optimal model parameters. On Spark, it has generally been applied to memory-bound problems, where one dataset is split across different machines and multiple models are trained sequentially. In this talk, we’ll explore how to use Apache Spark as an engine for parallelizing compute-bound tuning problems, where hundreds or thousands of smaller models are trained in parallel.
There are multiple approaches to hyperparameter tuning. Grid search explores a finite set of value combinations, while Bayesian optimization uses the results of previous trials to propose a better hyperparameter combination. Approaches like grid search are trivially parallelizable, while Bayesian optimization has a sequential dependency. However, the two ideas can be combined: we can parallelize a grid of Bayesian optimization runs over Spark. This is done through Fugue-tune, a general interface that abstracts existing hyperparameter optimization frameworks such as Optuna and Hyperopt and provides a scalable layer on top of them.
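To make the idea of a grid of sequential searches concrete, here is a minimal sketch using only the Python standard library. It is not Fugue-tune's API. The objective, the parameter names (`kind`, `depth`, `lr`), and the helper functions are all hypothetical. A simple local search stands in for Bayesian optimization, and a thread pool stands in for Spark. What it illustrates is the structure: each grid cell fixes the discrete parameters, a sequential search tunes the remaining continuous parameter inside that cell, and the cells run independently of each other.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def objective(kind, depth, lr):
    """Toy loss (hypothetical): for each cell, minimized at lr = 0.1 * depth."""
    offset = 0.0 if kind == "a" else 0.5
    return (lr - 0.1 * depth) ** 2 + offset


def sequential_search(cell, n_trials=50, seed=0):
    """Tune the continuous parameter `lr` for one fixed grid cell.

    Each proposal is drawn near the best point found so far, so every trial
    depends on earlier trials (the sequential dependency). This is a simple
    stand-in for Bayesian optimization, not an implementation of it.
    """
    kind, depth = cell
    rng = random.Random(seed)
    best_lr = rng.uniform(0.0, 1.0)
    best_loss = objective(kind, depth, best_lr)
    for _ in range(n_trials):
        # Propose a point close to the current best, clipped to [0, 1].
        candidate = min(1.0, max(0.0, best_lr + rng.gauss(0.0, 0.1)))
        loss = objective(kind, depth, candidate)
        if loss < best_loss:
            best_lr, best_loss = candidate, loss
    return {"kind": kind, "depth": depth, "lr": best_lr, "loss": best_loss}


def tune_grid(grid, max_workers=4):
    """Run one independent sequential search per grid cell, in parallel."""
    cells = list(product(*grid.values()))
    # The thread pool is a local stand-in for a distributed engine.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(sequential_search, cells))
    return min(results, key=lambda r: r["loss"]), results


grid = {"kind": ["a", "b"], "depth": [1, 2, 3]}
best, results = tune_grid(grid)
print(best)
```

Because the six cells share no state, the outer loop parallelizes as easily as plain grid search, while each inner search keeps its sequential character.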
In this talk, we'll explore how to tune a general ML objective on a hybrid search space, where model search, grid search, random search, and Bayesian optimization are combined intuitively using Fugue-tune's simple interface. Using Greykite as an example, we will demo tuning a forecasting model in a distributed fashion and monitoring the best result in real time.