The US is amid its largest measles outbreak since 1992, with 1,250 cases as of Oct 3, 2019. Most cases (649) were in NYC, where the outbreak was declared over on Sep 3, 2019. This tutorial creates data visualizations to help understand the measles outbreak in NYC. It builds bubble maps and bar charts in Python (bokeh, matplotlib), following the design principles of clarity and context.
The tutorial is available both as a Jupyter notebook and as a static HTML page.
1. Introduction
1.1. Motivation
- The large majority of NYC measles cases are in my neighborhood (Williamsburg, Brooklyn).
- Opportunity to learn and practice both fundamental and advanced data visualization skills.
- Example of a small data project that can help people understand an important issue.
1.2. What is measles?
- Measles is a highly contagious infectious disease that can cause serious health complications.
- Two doses of MMR vaccine provide the best protection against measles.
1.3. A brief history of measles in the US
- Measles was declared eliminated from the US in 2000, thanks to an effective vaccination program.
- The US is amid its largest measles outbreak since 1992, with 1,250 (preliminarily) confirmed cases as of Oct 3, 2019 [CDC]. Most of those cases (649) were in NYC, where the outbreak was declared over on Sep 3, 2019 [NYC Health].
2. Data
2.1. Data Sources
- Of all the affected areas, NYC provides the best data about the 2019 measles outbreak.
- NYC provides raw data about the number of measles cases by date, age, vaccination status, and neighborhood on its NYC Health Measles webpage.
2.2. Data Collection
- The data is collected and updated by hand and stored in CSV and XLSX files, because the dataset is small and changes infrequently (about once a week).
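
Reading one of these small CSV files could look like the sketch below. The column names (`month`, `new_cases`) and the sample rows are placeholders chosen for illustration; they are not the project's actual file layout or the NYC Health figures.

```python
import csv
import io

# Placeholder rows standing in for a manually maintained CSV file.
# Column names and counts are illustrative, not the NYC Health data.
SAMPLE_CSV = """month,new_cases
2019-01,10
2019-02,20
2019-03,15
"""


def read_monthly_cases(text_stream):
    """Return a list of (month, new_cases) tuples from a CSV text stream."""
    reader = csv.DictReader(text_stream)
    return [(row["month"], int(row["new_cases"])) for row in reader]


rows = read_monthly_cases(io.StringIO(SAMPLE_CSV))
print(rows)
```

For a file on disk, the same function accepts the handle returned by `open(path, newline="")`.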
3. Visualizations
All data visualizations are shown on the project homepage: https://carlos-afonso.github.io/measles
3.1. NYC new measles cases by month
- Example of how to create a vertical bar chart to display temporal data.
- Show how to adjust the bar chart properties to provide context and clarity.
- For context: use title and annotations to provide the necessary information.
- For clarity: remove unnecessary chart elements, format month names, show labels with the number of cases.
- Insights: The bar chart shows that new measles cases peaked in Apr 2019 and then declined steadily, reaching 0 in Aug 2019. This suggests that the additional MMR vaccination efforts the NYC Health department began in April 2019 helped control the outbreak.
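
The context and clarity adjustments listed above can be sketched with matplotlib as follows. This is a minimal illustration, not the notebook's actual code: the function names, styling choices, and monthly counts are assumptions made for the example.

```python
from datetime import datetime

# Placeholder monthly counts (illustrative only, not the NYC Health data).
monthly_cases = {"2019-01": 10, "2019-02": 20, "2019-03": 15}


def month_label(iso_month):
    """Format '2019-01' as 'Jan 2019' for a compact axis label."""
    return datetime.strptime(iso_month, "%Y-%m").strftime("%b %Y")


def plot_monthly_cases(cases_by_month, title, note, out_path):
    """Save a vertical bar chart of new cases per month to out_path."""
    # Imported here so month_label can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    labels = [month_label(m) for m in cases_by_month]
    values = list(cases_by_month.values())

    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.bar(labels, values, color="tab:blue")

    # Context: a descriptive title plus a small note (e.g. source and date).
    ax.set_title(title, loc="left")
    fig.text(0.01, 0.01, note, fontsize=8, ha="left", va="bottom")

    # Clarity: label each bar with its count, then drop the now-redundant
    # y-axis, the frame lines, and the x tick marks.
    for bar, value in zip(bars, values):
        ax.annotate(
            str(value),
            (bar.get_x() + bar.get_width() / 2, bar.get_height()),
            xytext=(0, 3),
            textcoords="offset points",
            ha="center",
            va="bottom",
        )
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    ax.set_yticks([])
    ax.tick_params(axis="x", length=0)

    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```

Calling `plot_monthly_cases(monthly_cases, "New measles cases by month", "Placeholder data", "cases_by_month.png")` writes the chart to a PNG file.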
3.2. NYC measles cases by neighborhood
- Example of how to use bokeh to create a bubble map visualization.
- Show and discuss the design decisions made to provide context and clarity.
- Explain how in this case it is better to use a static rather than an interactive map.
- Explain the decision to show labels with the neighborhood names and their respective numbers of measles cases. Although the labels “clutter” the map, they are important because they help identify the neighborhoods.
- Insights: The bubble map clearly shows all the NYC neighborhoods with measles cases, using the bubble size to represent the number of cases.
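
A minimal sketch of a static, labeled bubble map in bokeh is shown below. It makes several assumptions: the neighborhood names, coordinates, and counts are placeholders; longitude and latitude are plotted directly rather than projected; and no neighborhood boundaries are drawn. Bubble diameters scale with the square root of the case count so that bubble area, which is what readers tend to compare, stays proportional to the number of cases.

```python
import math

# Placeholder neighborhoods: made-up names, coordinates, and case counts.
neighborhoods = [
    {"name": "Neighborhood A", "lon": -73.95, "lat": 40.71, "cases": 100},
    {"name": "Neighborhood B", "lon": -73.99, "lat": 40.63, "cases": 25},
    {"name": "Neighborhood C", "lon": -73.83, "lat": 40.76, "cases": 4},
]


def bubble_diameters(case_counts, max_diameter=60):
    """Return diameters (screen units) with AREA proportional to cases.

    Assumes at least one count is positive.
    """
    largest = max(case_counts)
    return [max_diameter * math.sqrt(c / largest) for c in case_counts]


def build_bubble_map(rows, title):
    """Return a bokeh figure with one labeled bubble per neighborhood."""
    # Imported here so bubble_diameters can be used without bokeh installed.
    from bokeh.models import ColumnDataSource, LabelSet
    from bokeh.plotting import figure

    source = ColumnDataSource(data={
        "lon": [r["lon"] for r in rows],
        "lat": [r["lat"] for r in rows],
        "size": bubble_diameters([r["cases"] for r in rows]),
        "label": [f"{r['name']} ({r['cases']})" for r in rows],
    })

    # Static map: no tools and no toolbar, so nothing can be panned or zoomed.
    p = figure(title=title, tools="", toolbar_location=None, match_aspect=True)
    p.scatter(x="lon", y="lat", size="size", source=source,
              fill_alpha=0.5, line_color="white")

    # Labels show the neighborhood name and its case count next to each bubble.
    p.add_layout(LabelSet(x="lon", y="lat", text="label", source=source,
                          text_font_size="8pt", x_offset=5, y_offset=5))

    # Clarity: hide axes and grid lines, which carry no information here.
    p.axis.visible = False
    p.grid.visible = False
    return p
```

The returned figure can be passed to `show` or `save` from `bokeh.plotting`.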
3.3. NYC measles cases by age
- Example of how to create a horizontal bar chart to display categorical data.
- Example of a case where a horizontal bar chart works better than a vertical one.
- Show how to adjust the bar chart properties to provide context and clarity.
- For context: use title and annotation to provide the necessary information.
- For clarity: remove unnecessary chart elements, show labels with the number and percentages of cases.
- Insights: The bar chart shows that most of the NYC measles cases are in young children.
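
The horizontal variant can be sketched in the same way. Horizontal bars suit categorical data because longer category names read left to right without rotation. The age groups and counts below are placeholders, and the function names are assumptions made for this example.

```python
# Placeholder age groups and counts (illustrative only, not the NYC Health data).
cases_by_age = {
    "Under 1 year": 10,
    "1 to 4 years": 30,
    "5 to 17 years": 40,
    "18 years and over": 20,
}


def count_and_percent_labels(counts):
    """Return labels such as '30 (30%)' giving each count and its share."""
    total = sum(counts)
    return [f"{c} ({c / total:.0%})" for c in counts]


def plot_cases_by_category(counts_by_category, title, note, out_path):
    """Save a horizontal bar chart of cases per category to out_path."""
    # Imported here so the label helper can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    categories = list(counts_by_category)
    counts = list(counts_by_category.values())
    labels = count_and_percent_labels(counts)

    fig, ax = plt.subplots(figsize=(7, 3))
    bars = ax.barh(categories, counts, color="tab:blue")
    ax.invert_yaxis()  # keep the first category at the top

    # Context: a descriptive title plus a small note (e.g. source and date).
    ax.set_title(title, loc="left")
    fig.text(0.01, 0.01, note, fontsize=8, ha="left", va="bottom")

    # Clarity: put count and percentage at the end of each bar, then drop
    # the x-axis, the frame lines, and the y tick marks.
    for bar, label in zip(bars, labels):
        ax.annotate(
            label,
            (bar.get_width(), bar.get_y() + bar.get_height() / 2),
            xytext=(4, 0),
            textcoords="offset points",
            ha="left",
            va="center",
        )
    for side in ("top", "right", "bottom"):
        ax.spines[side].set_visible(False)
    ax.set_xticks([])
    ax.tick_params(axis="y", length=0)

    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```

Because the function takes any mapping of category to count, it would also serve for other categorical breakdowns, such as cases by vaccination status.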
3.4. NYC measles cases by vaccination status
- Technically this is a horizontal bar chart similar to the one in the previous section (3.3).
- Insights: This bar chart clearly shows that the large majority of the people who got measles were unvaccinated.