PyData New York City 2018 - Presentation: Deploying Data Science for Distribution of The New York Times

Deploying Data Science for Distribution of The New York Times

Audience level:

Intermediate

Description

How many newspapers should be distributed to each store for sale every day? The data science group at The New York Times addresses this optimization problem using custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. I'll describe our modeling and data engineering approaches, written in Python and hosted on Google Cloud Platform.

Abstract

The New York Times integrates data science not only into its digital business, but also into its print operations. Sending an optimal number of newspapers to each of our sales locations is a long-standing problem that we are newly addressing with a modeling and experimentation platform deployed on Google Cloud. Our models combine custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. In particular, we probabilistically account for censored data (as demand is unknown when the paper sells out) and perform a constrained optimization to maximize profit while minimizing any decrease in circulation. The algorithms are tested using paired treatment and control stores in which we can directly compare profits and sales. This "single copy" modeling must be executed regularly in a robust manner, as it drives our weekly sales in many stores throughout the country; these concerns have informed our design as we scale up our prediction and reporting systems. This is one of the group's longest-running projects, and I will share some surprising lessons we've learned along the way.
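
The two ideas named in the abstract, censored demand and a constrained profit optimization, can be illustrated with a minimal Python sketch. This is not the model used at The Times, whose details the abstract does not give. It assumes stationary Poisson demand at a single store, fits the rate by a simple grid search over a censored likelihood, and then applies the textbook newsvendor rule with a service-level floor standing in for the circulation constraint. All function names, the price, the cost, and the floor are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of P(D = k) for Poisson demand with rate lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def poisson_sf(n, lam):
    """P(D >= n): the probability that demand met or exceeded the supply n."""
    return 1.0 - sum(math.exp(poisson_logpmf(i, lam)) for i in range(n))


def censored_log_likelihood(lam, sales, supply):
    """Log-likelihood in which sold-out days only tell us demand >= supply."""
    ll = 0.0
    for sold, stocked in zip(sales, supply):
        if sold < stocked:
            # Copies were left over, so demand was observed exactly.
            ll += poisson_logpmf(sold, lam)
        else:
            # Sold out: demand is censored at the number of copies stocked.
            ll += math.log(max(poisson_sf(stocked, lam), 1e-300))
    return ll


def fit_demand_rate(sales, supply, step=0.05, max_rate=None):
    """Grid-search maximum-likelihood estimate of the demand rate (assumed approach)."""
    if max_rate is None:
        max_rate = 3.0 * max(supply) + 1.0
    candidates = [i * step for i in range(1, int(max_rate / step) + 1)]
    return max(candidates, key=lambda lam: censored_log_likelihood(lam, sales, supply))


def order_quantity(lam, price, cost, min_service_level=0.0, max_q=10_000):
    """Smallest q with P(D <= q) >= max(critical ratio, service-level floor).

    Assumes unsold copies are worth nothing, so the newsvendor critical
    ratio is (price - cost) / price. The floor is a hypothetical stand-in
    for a limit on lost circulation.
    """
    if not 0 < cost < price:
        raise ValueError("expected 0 < cost < price")
    if not 0 <= min_service_level < 1:
        raise ValueError("expected 0 <= min_service_level < 1")
    target = max((price - cost) / price, min_service_level)
    q, cdf = 0, math.exp(poisson_logpmf(0, lam))
    while cdf < target and q < max_q:
        q += 1
        cdf += math.exp(poisson_logpmf(q, lam))
    return q


if __name__ == "__main__":
    # Hypothetical store: 10 copies delivered daily, sold out on 4 of 8 days.
    supply = [10] * 8
    sales = [10, 7, 10, 9, 10, 8, 10, 6]
    lam_hat = fit_demand_rate(sales, supply)
    print("naive mean of sales:", sum(sales) / len(sales))
    print("censoring-aware rate:", lam_hat)
    print("order quantity:", order_quantity(lam_hat, price=3.0, cost=1.0))
```

Because sold-out days contribute only a lower bound on demand, the censoring-aware estimate is higher than the plain average of sales, which is the bias the abstract alludes to. Raising `min_service_level` trades some expected profit for fewer sell-outs.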

Wednesday 4:20 PM–5:00 PM in Radio City (#6604)

Anne Bauer
