How many newspapers should be distributed to each store for sale every day? The data science group at The New York Times addresses this optimization problem using custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. I'll describe our modeling and data engineering approaches, which are implemented in Python and hosted on Google Cloud Platform.
The New York Times integrates data science not only into its digital business, but also into its print operations. Sending an optimal number of newspapers to each of our sales locations is a long-standing problem that we are newly addressing with a modeling and experimentation platform deployed on Google Cloud. Our models combine custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. In particular, we probabilistically account for censored data (demand is unknown when the paper sells out) and perform a constrained optimization to maximize profit while minimizing any decrease in circulation. The algorithms are tested using paired treatment and control stores, in which we can directly compare profits and sales. This "single copy" modeling must be executed regularly and robustly, as it drives our weekly sales in many stores throughout the country; these concerns have informed our design as we scale up our prediction and reporting systems. This is one of the group's longest-running projects, and I will share some surprising lessons we've learned along the way.
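To make the censoring idea concrete, here is a minimal sketch under assumptions that are not from the talk itself: daily demand at one store is treated as Poisson, a sellout day contributes the probability that demand was at least the number of copies delivered, and the delivery quantity is then chosen with the textbook newsvendor rule. The function names, the Poisson choice, and the price and cost figures are all hypothetical, and the circulation constraint mentioned above is left out for brevity.

```python
import math


def poisson_log_pmf(k, lam):
    """Log of P(D = k) for Poisson demand with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def poisson_cdf(k, lam):
    """P(D <= k) for Poisson demand with rate lam."""
    if k < 0:
        return 0.0
    return sum(math.exp(poisson_log_pmf(i, lam)) for i in range(k + 1))


def censored_log_likelihood(lam, records):
    """Log-likelihood of (sold, delivered) pairs.

    A day with sold < delivered reveals demand exactly. A sellout day
    (sold == delivered) only reveals that demand was >= delivered.
    """
    total = 0.0
    for sold, delivered in records:
        if sold < delivered:
            total += poisson_log_pmf(sold, lam)
        else:
            tail = 1.0 - poisson_cdf(delivered - 1, lam)
            total += math.log(max(tail, 1e-300))  # guard against log(0)
    return total


def fit_demand_rate(records, lo=1e-3, hi=200.0, tol=1e-6):
    """Maximize the censored log-likelihood with a golden-section search.

    Assumes the maximizer lies in [lo, hi]; both bounds are illustrative.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if censored_log_likelihood(c, records) > censored_log_likelihood(d, records):
            b = d
        else:
            a = c
    return (a + b) / 2


def profit_maximizing_draw(lam, price, unit_cost):
    """Newsvendor rule: smallest q with P(D <= q) >= (price - cost) / price.

    Assumes unsold copies have no salvage value and 0 < unit_cost < price.
    """
    if not 0 < unit_cost < price:
        raise ValueError("expected 0 < unit_cost < price")
    critical_ratio = (price - unit_cost) / price
    q = 0
    cumulative = math.exp(poisson_log_pmf(0, lam))
    while cumulative < critical_ratio and q < 10_000:  # cap is a safety net
        q += 1
        cumulative += math.exp(poisson_log_pmf(q, lam))
    return q


# Hypothetical history for one store: (copies sold, copies delivered).
history = [(3, 10), (5, 5), (4, 10)]
rate = fit_demand_rate(history)
draw = profit_maximizing_draw(rate, price=3.0, unit_cost=1.0)
```

In this toy history the second day is a sellout, so the fitted rate comes out above the naive average of observed sales (4 copies); ignoring censoring would bias the demand estimate, and hence the delivery quantity, downward.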