Having extensive data mining experience but limited Python knowledge, I fired up a notebook and started digging into COVID case data from the EU CDC. A couple of months later, after visualization, curve fitting, modeling and mapping, I had developed a coronaradar that shows and projects the spread of infections across the world. Let's have a look together at what I did: no slides, just live code.
How do you get from raw reported COVID case data to an animated map of projected disease spread? It takes visualization, curve fitting, modeling and mapping, with Python to tie it all together. Using open-source libraries such as NumPy, Pandas, PyEarth and Folium, and a bit of high school mathematics, you will see how to build this yourself. Logarithms and regression splines, anyone?
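Why logarithms? While an outbreak spreads unchecked, case counts grow roughly exponentially, so their logarithm grows roughly linearly, and a straight-line fit yields a daily growth rate and a doubling time. Here is a minimal NumPy sketch on synthetic numbers (invented for illustration, not EU CDC data); it shows the idea and is not necessarily how the notebooks do it.

```python
import numpy as np

# Synthetic cumulative case counts: 100 cases growing 20% per day for 15 days.
# These numbers are made up for illustration; they are not real COVID data.
days = np.arange(15)
cases = 100 * 1.2 ** days

# On a log scale exponential growth is a straight line: log(c) = log(c0) + r * t
slope, intercept = np.polyfit(days, np.log(cases), deg=1)

growth_per_day = np.exp(slope) - 1   # fractional growth per day
doubling_time = np.log(2) / slope    # days needed for the count to double
start_cases = np.exp(intercept)      # recovered starting value c0

print(f"daily growth: {growth_per_day:.1%}, doubling time: {doubling_time:.1f} days")
```

Real data is rarely this clean, which is why the later steps smooth the series and allow the growth rate to change over time.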
The EU CDC publishes a daily update of reported cases per country. Even though the data is easy to obtain, ingesting it always has pitfalls: never assume! You will always run into data quality and process issues before you can properly start analyzing, so the first step is to create an environment in which you can start answering questions. Jupyter fits the bill nicely, and Pandas plus a bit of NumPy can get you going quickly.
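To make "never assume" concrete, here is a minimal Pandas sketch that loads a small, hypothetical sample and counts typical problems before any analysis starts. The column names (dateRep, cases, deaths, countriesAndTerritories) and the day-first date format are assumptions modelled on the EU CDC CSV download; check them against the actual file.

```python
import io

import pandas as pd

# Hypothetical sample in a layout similar to the EU CDC CSV download;
# the column names are assumptions, so verify them against the real file.
raw = io.StringIO(
    """dateRep,cases,deaths,countriesAndTerritories
01/04/2020,1000,50,Netherlands
02/04/2020,1100,60,Netherlands
02/04/2020,1100,60,Netherlands
04/04/2020,-20,0,Netherlands
"""
)

df = pd.read_csv(raw)
# Dates are day-first strings; parse them explicitly instead of letting pandas guess.
df["dateRep"] = pd.to_datetime(df["dateRep"], format="%d/%m/%Y")

# Never assume: count the problems before analyzing anything.
duplicates = df.duplicated().sum()
negatives = (df["cases"] < 0).sum()

df = df.drop_duplicates()
series = df.set_index("dateRep")["cases"].sort_index()
# Reindex to a full daily range so that missing report days show up as NaN.
full_range = pd.date_range(series.index.min(), series.index.max(), freq="D")
missing_days = series.reindex(full_range).isna().sum()

print(f"duplicates: {duplicates}, negative counts: {negatives}, missing days: {missing_days}")
```

With many countries in one file, the same checks would run per country, for example after grouping on the country column.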
Next is making sense of what you're looking at. It's easy to get a plot, but what exactly does it show? Your view is always distorted by the process that delivers the data: is that spike real, or caused by reporting? How can there be a negative number of cases? And even once you have solved those problems, what does a disease outbreak actually look like? Here we dive a little into modeling and curve fitting to find out what is really happening and how that is reflected in the data. This takes a bit of mathematics, statistical distributions and regression, and we resort to PyEarth, a Python implementation of multivariate adaptive regression splines (MARS), for a more advanced analysis.
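A sketch of two of these steps under simple assumptions. First, reporting artifacts: negative counts are treated as missing (one option among several) and a centred 7-day rolling mean evens out weekly reporting cycles. Second, regression splines: MARS, as implemented in PyEarth, builds a curve from hinge functions such as max(0, t - knot) and searches for the knots automatically. The NumPy code below is a deliberately simplified stand-in with a single knot found by grid search; it is not PyEarth's API. All numbers are synthetic, and fit_hinge is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# --- 1. Reporting artifacts (hypothetical numbers) ---
# Daily counts with a weekend dip and a negative correction near the end.
daily = pd.Series(
    [120, 130, 125, 140, 60, 55, 150, 160, -30, 170],
    index=pd.date_range("2020-04-01", periods=10, freq="D"),
)
# A negative count is usually a correction of earlier over-reporting;
# here it is simply treated as missing rather than as real "negative cases".
cleaned = daily.mask(daily < 0)
# A centred 7-day mean evens out weekly reporting cycles; NaNs are skipped.
smoothed = cleaned.rolling(7, center=True, min_periods=1).mean()

# --- 2. Hinge-function regression on log cases (simplified MARS idea) ---
# Synthetic log daily cases: growth of 0.2/day until day 20, decline of 0.05/day after.
t = np.arange(40, dtype=float)
true_knot = 20
log_cases = np.log(50) + 0.2 * np.minimum(t, true_knot) - 0.05 * np.maximum(0, t - true_knot)


def fit_hinge(t, y, knot):
    """Hypothetical helper: least-squares fit of y ~ a + b*t + c*max(0, t - knot).

    Returns the coefficients (a, b, c) and the sum of squared errors.
    """
    X = np.column_stack([np.ones_like(t), t, np.maximum(0, t - knot)])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((X @ coefs - y) ** 2))
    return coefs, sse


# Try each interior day as the knot and keep the one with the smallest error.
candidates = t[2:-2]
best_knot = min(candidates, key=lambda k: fit_hinge(t, log_cases, k)[1])
coefs, _ = fit_hinge(t, log_cases, best_knot)
slope_before = coefs[1]             # log-growth per day before the knot
slope_after = coefs[1] + coefs[2]   # log-growth per day after the knot

print(f"knot at day {best_knot:.0f}: slope {slope_before:.3f} before, {slope_after:.3f} after")
```

MARS goes further than this sketch: it adds many hinge pairs in a forward pass and prunes them again in a backward pass, so the number and position of the knots come out of the data.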
Once we understand what is going on, we can try to extrapolate by applying our findings to future dates. Prediction is tricky, and we will see whether and how we can do it. Perhaps even more important: how do you present and communicate your predictions? Maps are always striking, and using Folium we will see what it takes to create an interactive, animated map of what we found and how we think it will develop: a coronaradar.
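A minimal sketch of the projection and mapping steps, assuming each country's recent trend has already been reduced to a current daily case count and a fitted slope on the log scale. Projection simply extends that straight line, which is exactly where prediction gets tricky: a small error in the slope compounds every day. The projected values are then packed into a GeoJSON FeatureCollection with one timestamped point per country per day, the kind of input a Folium time-slider map can consume. The country values are hypothetical, the coordinates approximate, and the times, icon and iconstyle property names are assumptions based on Folium's TimestampedGeoJson examples.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical per-country inputs: latest smoothed daily cases and fitted slope
# of log(daily cases) per day. Coordinates are approximate country centroids.
countries = {
    "Netherlands": {"latlon": (52.1, 5.3), "cases": 1000.0, "log_slope": -0.05},
    "Belgium": {"latlon": (50.6, 4.5), "cases": 800.0, "log_slope": 0.02},
}
start = pd.Timestamp("2020-05-01")
horizon = 14  # number of days to project ahead

features = []
for name, info in countries.items():
    days = np.arange(horizon + 1)
    # Extend the straight line on the log scale: cases(t) = cases(0) * exp(slope * t).
    # Any error in the slope compounds daily, so treat far-ahead values with care.
    projected = info["cases"] * np.exp(info["log_slope"] * days)
    lat, lon = info["latlon"]
    for day, value in zip(days, projected):
        date = (start + pd.Timedelta(days=int(day))).strftime("%Y-%m-%d")
        features.append({
            "type": "Feature",
            # GeoJSON coordinates are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "times": [date],
                "popup": f"{name}: {value:.0f} projected daily cases",
                # Assumed styling keys; radius ~ sqrt so circle area scales with cases.
                "icon": "circle",
                "iconstyle": {"radius": float(np.sqrt(value)) / 3},
            },
        })

geojson = {"type": "FeatureCollection", "features": features}
print(f"{len(features)} timestamped features")
```

Assuming Folium is installed, a collection like this could be passed to folium.plugins.TimestampedGeoJson with period="P1D" and added to a folium.Map to get a day-by-day slider; check the Folium documentation for the current options and styling keys.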
This session will be presented live from my Jupyter notebooks. Interaction is welcome, although time may limit it. Everything presented is freely available at gitlab.com/dzwietering/corona.