Saturday 16:15–17:00 in LG7

Mining smartphone sensor data with Python

Neal Lathia

Audience level:


Data from smartphone sensors can be used to learn about and analyse our daily behaviours. In this talk, I'll discuss processing and learning from sensor data with Python. I'll focus on the accelerometer, a triaxial sensor that measures motion, starting with an overview of pre-processing the data and ending with supervised and unsupervised learning applications and visualisations.


Our smartphones are increasingly being built with sensors that can measure everything from where we are (GPS, Wi-Fi) to how we move (accelerometers), as well as other aspects of our environments (e.g., temperature, humidity). Many apps are now being designed to collect and leverage this data in order to provide interesting context-aware services and quantify our daily routines.

In this talk, I'll give an overview of collecting sensor data from an Android app and processing the data with Python. I'll focus on the accelerometer, a triaxial sensor that measures the device's motion and is now being used in apps that detect what you are doing (cycling, running, riding a train). If we have enough time, I'll also briefly cover a similar example with Wi-Fi/location data. Using an open-sourced Android app and an IPython notebook, I'll discuss the following questions:

  • What does the raw data look like? There are a number of trade-offs when collecting sensor data: most notably, data collection needs to be balanced against battery consumption. Plotting the raw data gives a view of how the data was sampled and how it changes across activities.
  • How can I pre-process and extract features from this data? Three kinds of features can be extracted from accelerometer data: statistical, time-series, and signal-based. Most of these are readily available in well-known Python libraries (scipy, numpy, statsmodels).
  • How can these features be used to analyse behaviours? I'll show an example of using accelerometer data to cluster users into groups, based on how active they are.
  • How can these features be used to detect behaviours? I'll show an example of training a supervised learning algorithm (using scikit-learn) to detect walking vs. running vs. standing.
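To make the feature-extraction step above concrete, here is a minimal sketch, not the code from the talk's notebook. It assumes the accelerometer stream has already been loaded as an (n, 3) array of x/y/z readings sampled at a constant rate. The helper names (`split_windows`, `window_features`), the window length, the specific features chosen, and the synthetic "walking-like" signal are all illustrative assumptions; only numpy is used.

```python
import numpy as np


def split_windows(samples, window_size):
    """Split an (n, 3) array into non-overlapping windows, dropping any remainder."""
    samples = np.asarray(samples, dtype=float)
    n_windows = len(samples) // window_size
    return samples[: n_windows * window_size].reshape(n_windows, window_size, 3)


def window_features(xyz, rate_hz):
    """Compute one example of each feature kind for a single window.

    xyz: array of shape (window_size, 3); rate_hz: assumed constant sampling rate.
    """
    # Magnitude of the acceleration vector, independent of phone orientation.
    mag = np.linalg.norm(xyz, axis=1)
    centred = mag - mag.mean()

    # Time-series feature: lag-1 autocorrelation of the centred magnitude.
    energy = np.dot(centred, centred)
    autocorr = np.dot(centred[:-1], centred[1:]) / energy if energy > 0 else 0.0

    # Signal-based feature: frequency with the largest FFT amplitude.
    spectrum = np.abs(np.fft.rfft(centred))
    freqs = np.fft.rfftfreq(len(centred), d=1.0 / rate_hz)

    return {
        "mean": float(mag.mean()),            # statistical
        "std": float(mag.std()),              # statistical
        "autocorr_lag1": float(autocorr),     # time-series
        "dominant_hz": float(freqs[np.argmax(spectrum)]),  # signal-based
    }


# Synthetic stand-in for real data: gravity plus a 2 Hz oscillation on the z axis,
# sampled at 50 Hz for 8 seconds (all values are made up for illustration).
rate_hz = 50
t = np.arange(8 * rate_hz) / rate_hz
z = 9.81 + 3.0 * np.sin(2 * np.pi * 2.0 * t)
stream = np.column_stack([np.zeros_like(t), np.zeros_like(t), z])

windows = split_windows(stream, window_size=4 * rate_hz)
features = [window_features(w, rate_hz) for w in windows]
```

Each dictionary in `features` is one row of a feature matrix. Rows like these could then be passed to a clustering algorithm to group users by activity level, or paired with activity labels to train a classifier such as those in scikit-learn.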

I'll close by discussing how these techniques are being applied in novel smartphone apps for health monitoring.