As many as 105 million Americans suffer from foodborne illness annually. In 2014, the City of Chicago began forecasting these outbreaks to target limited health inspection resources toward likely sites, showing a 7-day improvement in locating critical violations at food establishments. This talk provides an end-to-end walkthrough of predicting critical violations in Washington, DC using Python.
In 2014, data scientists at the Department of Innovation and Technology for the City of Chicago built an algorithm that uses publicly available data to predict likely health code violations at restaurants, in an attempt to reduce foodborne illness. They released it as a freely available open source project, written in R, on GitHub. However, in spite of the prevalence of foodborne illness and its associated costs (as much as $2–$4 billion annually¹), so far only one other location in the country has used Chicago's work to implement this model:² Montgomery County, MD, which, with the assistance of Open Data Nation, is successfully adapting the model to the local environment.
This talk provides an end-to-end demonstration of how to replicate the process using Python and open data from Washington, DC. The content is targeted toward the novice data scientist and will discuss the practical aspects of planning and executing the project. Learn how you can combine Python libraries like Requests, Beautiful Soup, SQLite, NumPy, and scikit-learn to build your own machine learning model to predict health code violations!
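To give a feel for how these libraries fit together, here is a minimal sketch of the storage and modeling half of such a pipeline. It is not the talk's actual code: the table schema, the two features, and the labeling rule are hypothetical, and the data is randomly generated rather than scraped from DC inspection reports (the Requests and Beautiful Soup scraping step is left out so the example runs offline). Logistic regression is chosen only for simplicity.

```python
import sqlite3

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for scraped inspection records (NOT real DC data).
rng = np.random.default_rng(0)
n = 400
days_since_last = rng.integers(30, 720, size=n)  # hypothetical feature
past_critical = rng.integers(0, 5, size=n)       # hypothetical feature

# Hypothetical labeling rule plus noise, so the model has a signal to learn.
score = 0.004 * days_since_last + 0.8 * past_critical + rng.normal(0, 1, size=n)
violation = (score > np.median(score)).astype(int)

# Store the rows in an in-memory SQLite database, as a scraper might.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspections ("
    "days_since_last INTEGER, past_critical INTEGER, violation INTEGER)"
)
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    zip(days_since_last.tolist(), past_critical.tolist(), violation.tolist()),
)
conn.commit()

# Load the table back into a NumPy array for modeling.
rows = conn.execute(
    "SELECT days_since_last, past_critical, violation FROM inspections"
).fetchall()
conn.close()
data = np.array(rows, dtype=float)
X, y = data[:, :2], data[:, 2].astype(int)

# Hold out a quarter of the records to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Rank held-out establishments from highest to lowest predicted risk.
risk = model.predict_proba(X_test)[:, 1]
ranked = np.argsort(-risk)
print(f"held-out accuracy: {accuracy:.2f}")
```

The ranking in the last step reflects the idea behind the Chicago project: rather than only classifying establishments, order them by predicted risk so that inspectors can visit the likeliest sites first.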
Introduction/Problem statement
Data Science Pipeline
Lessons learned