Data for Democracy is a civic engagement hub for volunteer data scientists to carry out work with social impact. This talk will focus on the work of one team, which has built a web scraper and natural language processing pipeline to track and analyse online reports of people displaced by conflict and disaster. It will also reflect on the challenges of using data in the humanitarian sector.
Data for Democracy is a community of civic-minded volunteer technologists, programmers and data scientists working on everything from understanding propaganda, to reducing urban traffic fatalities, to making election data available to the public. One of its main projects focuses on tracking online reports of internally displaced persons (IDPs): people forced to flee from conflict or disaster but who remain within their original country of residence. The team has been working in response to a call by the Internal Displacement Monitoring Centre for solutions to the problem of collecting and analysing data from different news sources about situations involving IDPs.
The group has developed a Python back-end that scrapes web pages, extracts content, tags and filters articles by topic, and retrieves key information such as the number of people displaced. The solution draws on a range of packages from the Python ecosystem, including newspaper to parse online articles, gensim for efficient topic modelling, scikit-learn for article classification, and sqlalchemy for database handling. This talk will provide an overview of the technical approach used by the multidisciplinary and international team to tame the messy unstructured data and provide a prototype product that humanitarian analysts can use to monitor displacement crisis information.
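To give a flavour of the final extraction step, here is a minimal sketch of pulling a displacement figure out of article text. It is an illustration only and not the team's implementation: the function name extract_displacement_counts, the keyword list and the regular expressions are all assumptions made for this example, and it uses only the Python standard library in place of the packages named above.

```python
import re

# Hypothetical keyword list: words that suggest a sentence reports displacement.
DISPLACEMENT_TERMS = re.compile(
    r"\b(displaced|evacuated|fled|flee|uprooted)\b", re.IGNORECASE
)

# Hypothetical pattern: a number (with optional thousands separators)
# followed by a reporting unit such as "people" or "families".
COUNT = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(people|persons|residents|families|households)\b",
    re.IGNORECASE,
)


def extract_displacement_counts(text):
    """Return (count, unit) pairs from sentences that mention displacement."""
    results = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not DISPLACEMENT_TERMS.search(sentence):
            continue  # skip sentences that do not mention displacement
        for number, unit in COUNT.findall(sentence):
            results.append((int(number.replace(",", "")), unit.lower()))
    return results


if __name__ == "__main__":
    article = (
        "Flooding hit the region on Monday. "
        "More than 12,500 people were displaced, officials said. "
        "The town has 40 schools."
    )
    print(extract_displacement_counts(article))
```

A rule-based pass like this misses figures written as words ("thousands", "two hundred") and cannot tell a displacement count from another number in the same sentence, which is part of why classification and topic modelling are needed alongside it.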
The presentation will also highlight the challenges and successes that come with working in a group of volunteers spread across multiple timezones, disciplines, and experience levels, to create a data product for a sector that has traditionally been slow to make use of technology. Through the story of this project, the motivations and wider efforts of the volunteer-led Data for Democracy community will also be highlighted, showing the power that data practitioners have to make a difference.