Thursday 2:50 PM–3:35 PM in Track 1 - McKinley

Python and IoT: From Chips and Bits to Data Science

Jeff Fischer

Audience level:


This talk will take you through the design of a smart lighting system, including sensor hardware and software (based around MicroPython), data analysis (using NumPy, Pandas, and Jupyter), and lighting control (using Hidden Markov Models via Hmmlearn). From the talk, you should get a sense of how the hardware, software, and math fit together to create a solution.


Ever want to know what is behind the "Internet of Things" hype? I wanted to as well, so I embarked on a side project to learn more. This talk is the story of my journey, using, of course, my favorite programming language, Python.

In this talk, I will take you through my project, a lighting replay system. The application monitors the light levels in several rooms of a residence and then replays a similar pattern when the house is unoccupied. The goal is to make the house look occupied, with a lighting pattern that is different every day but still looks realistic. It accounts for the different patterns found in each individual room as well as seasonal factors (e.g. changing sunrise/sunset times). The full source code for the application is available on GitHub.


A basic knowledge of Python is assumed, and exposure to the PyData ecosystem is helpful, but no special knowledge of hardware or data science is needed to follow this talk.


The application was built in three phases: 1) data capture, 2) data analysis, and 3) the lighting player.

Data Capture

Light sensor data is gathered via ESP8266-based microcontrollers. These chips have only 96 kilobytes of data RAM, but have built-in WiFi and can run MicroPython, a lightweight implementation of Python 3 that works without an operating system. Data is sent to a Raspberry Pi via the MQTT messaging protocol. There, it is saved into CSV files for offline processing.
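As a rough illustration of the storage step on the Raspberry Pi, the sketch below appends one incoming reading to a CSV file. The topic layout ("sensors/<room>"), the payload format ("<unix_timestamp>,<lux>"), and the function name append_reading are assumptions made for this example, not details taken from the project. In practice the function would be called from an MQTT client's message callback; here an in-memory buffer stands in for the file.

```python
import csv
import io


def append_reading(csv_file, topic, payload):
    """Parse one sensor message and append it to csv_file as a row.

    Assumed (hypothetical) formats:
      topic   -> "sensors/<room>"
      payload -> b"<unix_timestamp>,<lux>"
    """
    timestamp_text, lux_text = payload.decode("utf-8").split(",")
    room = topic.rsplit("/", 1)[-1]
    row = [int(timestamp_text), room, float(lux_text)]
    csv.writer(csv_file).writerow(row)
    return row


# An in-memory buffer stands in for an open CSV file.
buffer = io.StringIO()
append_reading(buffer, "sensors/living-room", b"1500000000,312.5")
append_reading(buffer, "sensors/living-room", b"1500000060,298.0")
print(buffer.getvalue())
```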

Data Analysis

Next, the light sensor data is analyzed. The CSV data files are parsed, post-processed, and read into Pandas Series data structures. The raw and processed data can be visualized in a Jupyter notebook.

The light readings are then grouped into four levels via K-Means clustering. These four levels are mapped to on-off values, depending on the particulars of each room (e.g. how much ambient light is present). We divide each day into four "zones", based on the absolute time of day and sunrise/sunset times. The samples are grouped into subsequences separated by zone and by gaps in the data readings.
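To make the clustering step concrete, here is a minimal one-dimensional K-Means (Lloyd's algorithm) written in plain Python. It is a stand-in for whatever library implementation the project uses, and the sample readings, the function name kmeans_1d, and the per-room min_on_level threshold are all made up for illustration.

```python
def kmeans_1d(values, k=4, iterations=50):
    """Cluster scalar light readings into k levels (Lloyd's algorithm).

    Starting centers are taken in ascending order from the sorted data,
    so level 0 is the darkest cluster and level k-1 the brightest.
    """
    ordered = sorted(values)
    # Deterministic start: pick k evenly spaced quantiles of the data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each reading joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


# Hypothetical lux readings from one room.
readings = [2, 3, 4, 40, 45, 50, 200, 210, 220, 800, 820, 790]
centers, labels = kmeans_1d(readings)

# Hypothetical per-room rule: levels 2 and 3 count as "light on".
min_on_level = 2
on_off = [1 if lab >= min_on_level else 0 for lab in labels]
print(labels)
print(on_off)
```

A room with a lot of ambient daylight might use a higher min_on_level, so that only the brightest cluster counts as a lamp being on.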

These subsequences are then used to train Hidden Markov Models using Hmmlearn. Hmmlearn can infer a state machine that will emit a similar pattern of on/off samples. A total of four models are created per room, with one for each zone.
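The sketch below shows what "emitting a similar pattern" means for a two-state Hidden Markov Model. It does not use Hmmlearn and does no training: the start, transition, and emission probabilities are hand-picked, hypothetical values standing in for parameters that a trained model would supply, and sample_hmm is a name invented for this example.

```python
import random


def sample_hmm(start_prob, trans_prob, emit_prob, n_samples, seed=None):
    """Generate on/off (1/0) observations from a discrete HMM.

    start_prob[i]    -> probability of starting in hidden state i
    trans_prob[i][j] -> probability of moving from state i to state j
    emit_prob[i][o]  -> probability that state i emits observation o
    """
    rng = random.Random(seed)
    states = list(range(len(start_prob)))
    state = rng.choices(states, weights=start_prob)[0]
    observations = []
    for _ in range(n_samples):
        # Emit an observation from the current hidden state...
        observations.append(rng.choices([0, 1], weights=emit_prob[state])[0])
        # ...then move to the next hidden state.
        state = rng.choices(states, weights=trans_prob[state])[0]
    return observations


# Hypothetical parameters: state 0 is "mostly off", state 1 is "mostly on".
start = [0.9, 0.1]
trans = [[0.95, 0.05],
         [0.10, 0.90]]
emit = [[0.98, 0.02],
        [0.05, 0.95]]

pattern = sample_hmm(start, trans, emit, n_samples=30, seed=1)
print(pattern)
```

Because the hidden states tend to persist, the output comes in runs of on and off samples rather than independent coin flips, and a different seed gives a different but statistically similar sequence.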

Lighting Player

The light controller application runs off the Hidden Markov Models created in the analysis phase. It controls Philips Hue smart lights, which are accessible via a REST API. We use the phue library to abstract the details of the control protocol. Since both the light sensors and the lights are wireless, this application can be easily deployed without replacing light switches or stringing wires through the house.
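A player loop along these lines could look like the following sketch. The replay function, its parameters, and the sample pattern are hypothetical. The light is driven through an injected set_light callable so the example runs without a Hue bridge; with phue, that callable could wrap a call such as Bridge.set_light(light_name, 'on', value).

```python
import time


def replay(pattern, set_light, interval_seconds=60.0, sleep=time.sleep):
    """Play an on/off pattern, one sample per interval.

    Commands are sent only when the state changes, to avoid flooding
    the bridge with redundant requests. Returns the number of commands sent.
    """
    commands = 0
    previous = None
    for sample in pattern:
        is_on = bool(sample)
        if is_on != previous:
            set_light(is_on)
            commands += 1
            previous = is_on
        sleep(interval_seconds)
    return commands


# Dry run: record the commands instead of talking to a real light,
# and skip the waiting between samples.
sent = []
count = replay([0, 0, 1, 1, 1, 0], sent.append, sleep=lambda seconds: None)
print(sent)   # [False, True, False]
print(count)  # 3
```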
