Modeling user preferences over a feature space and efficiently identifying the highest-valued regions is a common problem in machine learning. This talk discusses a solution to this problem that leverages Bayesian optimization. In doing so, we offer a way to model users’ preferences when data is expensive to obtain and comes in the form of pairwise comparisons.
Abstract
Optimizing a user’s valuation of a product is a common problem in machine learning (ML), and an attractive solution to this kind of blackbox optimization problem is Bayesian optimization (BO). BO offers a principled way to quickly identify a user’s highest valuation by designing queries that balance the trade-off between exploration and exploitation. However, a limitation of existing solutions in this framework is that they only work with data on real-valued ratings (e.g., a score of 4 on a 5-point scale asking users to rate a product), and yet in many situations we only have access to information about the user’s preference for a product in one setting over another setting, i.e., pairwise comparison data.
To learn users’ valuation using these pairwise comparisions, we can leverage a technique called preference learning (PL). This talk presents how BO may be extended to PL, including using a Gaussian Process (GP) for modeling and designing optimization strategies. We will learn how to build an optimization loop of this kind and gain practical skills to do so , such as how to decide which strategy is effective in low- vs. high-dimensional spaces. This talk will benefit ML practitioners who work on human-in-the-loop, A/B testing, product recommendation, and user preference optimization problems. While the talk will cover most background knowledge necessary to implement this optimization technique, the audience should be familiar with common concepts in ML such as training data, predictive models, and multivariate normal distributions.
Description
The following is the tentative plan for the talk.
We set up the main problem that this talk addresses: optimizing a user’s preference, which is treated as an expensive-to-query, blackbox function. Real-world applications include machine learning hyperparameter tuning, optimizing a physical process, and finding the product that a user is most likely to be interested in. This talk offers a way to model users’ preferences using observations that come in the form of pairwise comparisons and are expensive to obtain in large amount. As such, it complements and extends existing approach that are designed to use real-valued and inexpensive-to-query observations.
The workflow consists of multiple iterations where we repeatedly query and observe the function’s value at a location, train a model on observed data, and use that model to inform our next query. The goal is to identify the global optimum of the function as quickly as possible. A toy example we will be using throughout the talk is finding the best amount of sugar to use when making coffee.
We will discuss the background for BO as a framework of optimizing a blackbox function. This will be quite brief as it has been the topic of many talks in the past and isn't the focus of this talk.
A GP defines a distribution over functions such that the function values at any given array of locations follow a multivariate Normal distribution, which is still the case even when a GP is conditioned on some observed data. GPs are almost exclusively used as the predictive model in BO for their ability to quantify uncertainty about the function value in regions where we don't have a lot of observations.
We will include plots showing how our belief about a function represented by a GP changes as new datapoints are observed, emphasizing on the level of uncertainty in that belief.
What is the most informative location to query the function value at, so that we could identify the global optimum quickly? A BO policy answers this question by taking into account our belief and uncertainty about the function (represented by the GP). We focus on the expected improvement (EI) policy, arguably the most popular in practice. This policy computes the expected value of the increase in the current best-seen point for each potential query location, which is straightforward to do given the multivariate normal belief about the function values.
This policy naturally balances exploration and exploitation in its scoring, favoring locations with either high function values or high uncertainty. We will include plots to demonstrate this point. Making a batch of queries using this EI score is also possible.
We now discuss our PL framework, where observations come in the form of pairwise comparisons, which is the main focus of this talk.
The GP conditioned on pairwise computation observations cannot be derived in closed form, but may be approximated using Laplace approximation. We won't go into detail about how this algorithm works, but will instead look at plots showing the output of this algorithm for different datasets.
These plots will show that the learned GP successfully reflects the information contained in the pairwise comparisons, while still offering reasonable measures of uncertainty.
Our task is to choose a pair of locations whose output comparison from the user will help us identify the global optimum the fastest. We will consider two extensions of EI to this setting: (1) choose a batch of two queries using the EI score and (2) compare the single EI candidate against the current best observation. We will see the difference in querying behavior of these two policies in different scenarios.
We will run these two policies on a suite of benchmark functions and examine their performance. We will observe that policy (1) performs better on one-dimensional functions, while policy (2) is better on two- and three-dimensional ones. This insight will help practitioners choose the appropriate policy in their own use cases.
We will end the talk with some high-level discussions.
We will first talk about two Python packages that implement this preference-based optimization loop: GPro and BoTorch. The former is built on top of Scikit-learn and more lightweight, while the latter belongs to the PyTorch + GPyTorch + BoTorch ecosystem and comes with extensive support.
Overall, this discussion will help practitioners navigate the space of available software.
Not all preference optimization applications are appropriate for this workflow. We will end this talk by outlining when this specialized optimization routine might not offer the best solution: such as when real-valued evaluations/ratings are more natural; when evaluation is cheap; or when there are only a small, discrete set of choices.
"What questions do you have?"