Saturday 10:00–10:45 in LG7

AlzHack: Data Driven Diagnosis of Alzheimer's Disease

Frank Kelly, Giles Weaver


Alzheimer's disease is a form of dementia that affects over 44 million people globally. Unfortunately, the condition is very hard to detect in its early stages. It is usually diagnosed with a simple questionnaire, an approach that can only detect Alzheimer's disease many years after its onset. The challenge set in this project was earlier detection using Python and data science.


AlzHack is a collaborative citizen science project undertaken by a small but diverse group of data scientists. We will discuss the challenges encountered in discovering and acquiring suitable data, describe how we cleaned and merged multiple data sources, and explain how we extracted meaningful features from them.

We will cover textual feature extraction, examining, amongst other methods, part-of-speech tagging, readability calculations, locality-sensitive hashing and sentiment analysis, all in Python 3.
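As a rough illustration of what textual feature extraction can look like, the sketch below computes three simple lexical measures from a writing sample using only the standard library. The function name `text_features`, the choice of features and the sample sentence are assumptions made for this example; they are not taken from the AlzHack code, and a real readability calculation would be more involved.

```python
import re

def text_features(text):
    """Return a few simple lexical features for one writing sample.

    Hypothetical helper for illustration only.
    """
    # Split into sentences on terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens made of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        return {
            "words_per_sentence": 0.0,
            "mean_word_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        # Longer sentences tend to score as harder to read.
        "words_per_sentence": len(words) / len(sentences),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Share of distinct words: a crude measure of vocabulary richness.
        "type_token_ratio": len(set(words)) / len(words),
    }

sample = "The cat sat. The cat ran away!"
features = text_features(sample)
print(features)
```

Each sample is reduced to a small numeric vector, which is the form that clustering and classification methods usually expect.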

In addition, we will show how a variety of machine learning techniques (including text clustering and classification) were used with the aim of distinguishing diagnosed Alzheimer's sufferers from their healthy peers based solely on samples of their written correspondence.
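To make the classification step concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid rule over feature vectors. It is not necessarily one of the techniques used in the project. The helper names, the neutral labels `group_a` and `group_b`, and the numbers are arbitrary placeholders chosen for illustration, not project data or results.

```python
from statistics import mean

def fit_centroids(samples, labels):
    """Compute the mean feature vector of each class."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        # zip(*rows) iterates over feature columns.
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, sample):
    """Return the label whose centroid is closest to `sample`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))

# Arbitrary toy vectors (two features per sample), for illustration only.
train = [[0.80, 18.0], [0.75, 17.0], [0.50, 9.0], [0.55, 10.0]]
labels = ["group_a", "group_a", "group_b", "group_b"]

centroids = fit_centroids(train, labels)
prediction = predict(centroids, [0.52, 9.5])
print(prediction)
```

In practice the features would need scaling to comparable ranges before distances are meaningful, and any classifier would have to be evaluated on held-out samples.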

This will be followed by a look at changepoint and ramp detection on noisy time series data, deployed to identify subtle changes in signals obtained from individuals' correspondence over time, thus allowing a form of non-medical, 'early warning' style detection of Alzheimer's disease.
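One basic form of changepoint detection, sketched below, looks for the single split that best divides a series into two segments with different means, by minimising the total squared error around each segment's mean. This is an illustrative stand-in, not necessarily the method presented in the talk; the function name `best_mean_shift` and the toy signal are assumptions, and ramp detection is not covered here.

```python
def best_mean_shift(series):
    """Return the index that best splits `series` into two constant-mean segments.

    Returns None if the series has fewer than two points.
    """
    def sse(segment):
        # Sum of squared deviations from the segment mean.
        m = sum(segment) / len(segment)
        return sum((x - m) ** 2 for x in segment)

    best_index, best_cost = None, float("inf")
    # Try every split that leaves at least one point on each side.
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index

# Toy noisy signal whose mean steps up after the fifth value.
signal = [1.0, 1.2, 0.9, 1.1, 1.0, 2.1, 1.9, 2.0, 2.2, 2.0]
change_at = best_mean_shift(signal)
print(change_at)
```

The returned index is the first point of the second segment. This brute-force search is quadratic in the series length and assumes exactly one abrupt shift, so gradual drifts or multiple changes would need a different approach.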

Finally, we will address the tough task of scaling up a small, collaborative data science project to become an extremely powerful, widely available self-diagnosis tool.