Do you like Nate Silver's FiveThirtyEight datablog? Ever dreamed of becoming a psephologist? Come along to this political data hackathon and try your hand at forecasting a general election! There are myriad interesting datasets to look at, some extremely complex geospatial and time series problems to solve, and a world of "open" data that isn't nearly open enough. Run by @SixFiftyData.
When Theresa May announced plans in April 2017 for the UK to hold a general election, the public may have despaired, but we formed SixFifty: a collaboration of data scientists, software engineers, data journalists and political operatives. We wanted to understand why forecasting UK elections from open data is notoriously difficult, and to see how far good statistical practice and modern machine learning methods could take us. We also wanted to make political and demographic data more open and accessible by showcasing and releasing cleaned versions of the datasets we're using.
Now we invite you to join us in digging into all the datasets we came across (results from previous elections, polling data, and data from multiple censuses) and to see if you can build an even better election prediction model. Or perhaps you'd like to tackle some of the hardest challenges we faced, such as turning PDF polling tables into usable information (for an example, see https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/d8zsb99eyd/TimesResults_FINAL%20CALL_GB_June2017_W.pdf).
Prize categories
- Machine learning competition. There will be a Kaggle-inspired machine learning competition for predicting the outcome of previous UK general elections. SixFifty has been working hard to source and produce model-ready datasets for solving this problem. All that remains is for someone to solve it! Specifically, we're looking for the best predictive model for both the 2010 and the 2015 General Elections¹.
- Voter engagement. For the hack most likely to get more people to turn out and vote.
- Use of open data. Any project with a goal of making data more open (e.g. improved accessibility, or improved machine-readability).
- Fanciest AI hack.
- Most entertaining demo.
- Audience favourite.
- Best in show.
Datasets
You don’t have to use these, but they’re a good start.
- GE2017 Tech Initiatives Handbook: http://bit.ly/GE2017TechHandbook – Collection of resources, datasets, volunteers, existing projects, proposed projects. Initiated by Newspeak House.
- UK Politics Datasets: http://bit.ly/UKPoliticsDatasets – Crowdsourced document of links to useful datasets & munging tools. Candidates, polling stations, constituencies, parliament voting records, parliament speeches, Hansard, previous election/referendum results, registered financial interests, boundary maps, shapefiles, campaign expenses, registration rates, candidates’ CVs, constituency stats, GE2017 manifestos…
- Democracy Club: https://democracyclub.org.uk/data/ – Election identifiers, candidates’ info since 2010 (name, email, photos, social media), polling stations, all CC-BY-SA.
- mySociety: mySociety have built a range of tools and datasets, including parliamentary monitoring, structured data on every national politician in the world, candidate data, tools for contacting elected representatives, the constituency/postcode matching tool MapIt, and published transcripts from all levels of government.
- mySociety Geographic data: When it comes to building predictive models, geocoded data is quite handy. Official “open” data portals can be broken, or may offer the data only via mail-order CD, so mySociety’s cache of OS, ONS, and OSNI open geographic data going back to 2010 is a gold mine.
- SixFifty Datasets: https://github.com/six50/pipeline – Model-ready datasets for 2010/2015 elections, EU referendum, opinion polls at national/regional levels, all available in CSV, JSON and Feather.
Intellectual Property
All Hackathon submissions remain the intellectual property of the individuals or organisations that developed them. We encourage participants to open source their projects to both share their hacks with the greater community and promote innovation in this space.
¹ ML competition conditions: Models must use prior data only (the results are a matter of historical record!). Code must be made openly available for review via GitHub. With their permission, SixFifty will work with the competition winners to publish the winning models with commentary on SixFifty.org.uk.