Have you ever wanted to work with maps, but didn't know where to start? This talk will focus on feature engineering for a real-world project: pulling maps from OpenStreetMap, creating road segments, and adding features from other data sources. The features are used by Insight Lane, a project that builds tools to help traffic departments and advocacy groups better understand traffic crash risk.
How do you work with map data? Where do you get maps? What counts as a meaningful feature? How do you combine different maps? These are some of the problems we faced when building Insight Lane, an open-source, volunteer-run civic project with Data for Democracy that helps cities and advocacy groups better understand traffic crash risk on their roads.
Insight Lane can take any city’s geocoded crash data and, using OpenStreetMap and other city-specific data sources, build a machine learning model to predict the risk of crashes on road segments. You can see the output of the model for select cities at insightlane.org.
In this presentation, I will focus on the feature generation step of our pipeline. I will discuss how we process the city maps from OpenStreetMap, how we split roads into segments, and how we include features from city-specific maps and other data sources in our resulting maps. You will see examples using the osmnx, shapely, and pyproj libraries. I will conclude by discussing some interesting features, limitations in the data cities generally make publicly available, and concerns around equity in looking at this sort of data.
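To give a flavor of the road-splitting step, here is a minimal sketch of cutting one road's geometry into pieces of bounded length. It is not Insight Lane's actual code: the function name `split_polyline`, the `max_len` parameter, and the 40-meter limit are hypothetical, and plain Python stands in for shapely so the example has no dependencies. It assumes the coordinates have already been projected to a planar system in meters (the kind of conversion pyproj is used for), so Euclidean distance is meaningful.

```python
import math


def split_polyline(coords, max_len):
    """Split a polyline into consecutive pieces, each at most max_len long.

    coords  -- list of (x, y) tuples in a projected CRS (assumed meters)
    max_len -- maximum piece length, in the same units (hypothetical parameter)
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    eps = 1e-9
    pieces = []
    current = [coords[0]]   # points of the piece being built
    remaining = max_len     # length still available in the current piece

    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        edge_len = math.hypot(x1 - x0, y1 - y0)
        if edge_len == 0:
            continue        # skip duplicate vertices
        pos = 0.0           # distance already consumed along this edge

        # Cut the edge every time the current piece fills up.
        while edge_len - pos > remaining + eps:
            pos += remaining
            t = pos / edge_len
            cut = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            current.append(cut)
            pieces.append(current)
            current = [cut]
            remaining = max_len

        # The rest of the edge fits in the current piece.
        remaining -= edge_len - pos
        current.append((x1, y1))
        if remaining <= eps:    # piece ends exactly on this vertex
            pieces.append(current)
            current = [(x1, y1)]
            remaining = max_len

    if len(current) > 1:
        pieces.append(current)
    return pieces


def polyline_length(coords):
    """Total Euclidean length of a polyline."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(coords, coords[1:]))


# An L-shaped road, 150 m long, cut into pieces of at most 40 m.
road = [(0, 0), (100, 0), (100, 50)]
segments = split_polyline(road, 40)
```

A fixed maximum length is only one possible rule. A real pipeline might instead split at intersections, or combine both, depending on which unit the crash features should attach to.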