Predicting user actions is a challenging and important part of any online advertising business. The rarity of some user actions introduces problems of class skew, overfitting, and data censoring to modeling efforts. This presentation will delve into the business problem and walk through an example of fitting a predictive model to online advertising data provided by Maxpoint.
How much does exposure to online advertising influence a consumer to purchase a product? Predicting the behaviors of potential consumers is a challenging task in the online advertising business. These valuable user actions can be quite rare, and there may be a dearth of information available for modeling each user. Having an accurate model for prediction of user behaviors allows a business to minimize the cost of serving ads to users who would not be influenced by them while maximizing the occurrence of user actions for which the business gets paid.
In practice, there may be several ways to model these behaviors, and ensembling the resulting models may yield the best predictions. One useful model attempts to predict whether users who have previously visited a web site will return and make a purchase if they are subsequently served an ad. This so-called retargeting model can be framed as a supervised classification problem with both time-varying and time-invariant features.
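To make the framing concrete, here is a minimal sketch of a retargeting-style classifier with one time-invariant and one time-varying feature, using inverse-frequency class weights to counter class skew. Everything in it is an assumption made for illustration: the data is synthetic rather than Maxpoint's, the feature names (`is_mobile`, `days_since_visit`) are hypothetical, and the hand-rolled logistic regression uses only the standard library to stay dependency-free. In practice a community library would likely replace it.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_users(n, seed=0):
    """Generate synthetic (features, label) pairs with a rare positive class."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Time-invariant feature (hypothetical): first visit came from a mobile device.
        is_mobile = 1.0 if rng.random() < 0.5 else 0.0
        # Time-varying feature (hypothetical): days since the last site visit.
        days_since_visit = rng.uniform(0, 30)
        # Synthetic ground truth: recent visitors purchase more often, but rarely overall.
        logit = -2.0 - 0.15 * days_since_visit + 0.3 * is_mobile
        label = 1 if rng.random() < sigmoid(logit) else 0
        # Features: intercept term, mobile flag, recency scaled to [0, 1].
        rows.append(([1.0, is_mobile, days_since_visit / 30.0], label))
    return rows


def fit_weighted_logistic(rows, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent with class weights."""
    n_pos = sum(y for _, y in rows)
    n_neg = len(rows) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one example of each class")
    # Inverse-frequency weights so the rare class is not drowned out.
    w_pos = len(rows) / (2.0 * n_pos)
    w_neg = len(rows) / (2.0 * n_neg)
    weights = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in rows:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            class_weight = w_pos if y == 1 else w_neg
            for j, xi in enumerate(x):
                grad[j] += class_weight * (p - y) * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


users = make_users(2000)
positive_rate = sum(y for _, y in users) / len(users)
weights = fit_weighted_logistic(users)
print(f"positive rate: {positive_rate:.3f}")
print("coefficients (intercept, is_mobile, recency):", [round(w, 2) for w in weights])
```

Because the class weights rebalance the loss, the predicted probabilities are no longer calibrated to the true purchase rate. They are better read as a ranking score unless they are recalibrated afterwards.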
This presentation will illustrate some of the concepts behind creating a retargeting model, from conceptualization to deployment, using data provided by Maxpoint. Python and Jupyter notebooks provide an environment where prototypes can be rapidly implemented, evaluated, and iterated on. Powerful community libraries offer functionality to deal with problems like class skew, overfitting, and data censoring. Python's performance and the availability of a multiprocessing solution offer avenues for efficiently turning a prototype into a job that runs at scale with minimal modification. Concepts and code will be presented in detail.
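As a rough illustration of moving from a prototype to a parallel job, the sketch below scores users in chunks across worker processes. It assumes the standard library's `multiprocessing.Pool` as the multiprocessing solution, which the abstract does not specify. The coefficients in `WEIGHTS` and the helper names (`score_user`, `score_chunk`, `chunked`, `score_all`) are hypothetical.

```python
import math
from multiprocessing import Pool

# Hypothetical fitted coefficients: intercept, mobile flag, scaled recency.
WEIGHTS = (-1.0, 0.3, -4.0)


def score_user(features):
    """Return a purchase score for one user's feature tuple."""
    z = sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_chunk(chunk):
    """Score a list of users; this is the unit of work sent to each process."""
    return [score_user(features) for features in chunk]


def chunked(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def score_all(users, workers=2, chunk_size=1000):
    """Score all users in parallel, preserving the input order."""
    chunks = chunked(users, chunk_size)
    with Pool(processes=workers) as pool:
        scored = pool.map(score_chunk, chunks)
    return [score for chunk in scored for score in chunk]


if __name__ == "__main__":
    # Synthetic users: (intercept, mobile flag, scaled days since last visit).
    users = [(1.0, float(i % 2), (i % 30) / 30.0) for i in range(5000)]
    scores = score_all(users)
    print(len(scores), "users scored")
```

The prototype's scoring function is reused unchanged. Only the chunking and the pool are new, which is one way to read "minimal modification". The `__main__` guard matters on platforms that start worker processes by re-importing the script.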