In this talk I will present how we use Python, Spark and AWS as our preferred data science stack for the Internet of Things, which allows us to efficiently develop and deploy smart data applications on top of IoT sensor data. We use these technologies to analyse and model IoT time-series data, as well as to build automated and scalable data pipelines for smart IoT data products.
The Internet of Things and Industry 4.0 are here, bringing with them a vast number of connected devices and sensors that produce even more data. To build smart applications on top of IoT sensor data, we need to deal with the challenges that come with time-series data from a large number of devices.
At WATTx we build data application prototypes in the fields of smart homes, smart buildings, and smart climate. This involves working with data from a large number of IoT sensors that measure, among other things, temperature, humidity, motion, and luminance.
The purpose of this talk is to present how we use Python and Spark to effectively analyse and model IoT data. In particular, I will show how we use Python to process and model data from multiple IoT sensors and build machine learning models on top of it, and how we use Spark to scale these models and deploy them in automated data pipelines in the cloud as smart IoT applications. I will use the development of predictive models for smart building applications as a real-world example to demonstrate this setup.
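To give a feel for the first step, processing data from multiple IoT sensors, here is a minimal sketch in plain Python. It is not the pipeline presented in the talk: the sensor names, the readings, the 5-minute window, and the helpers `resample_mean` and `align_sensors` are all hypothetical, and averaging per window with forward-filling is just one common way to put irregular sensor readings onto a shared time grid before modelling.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def resample_mean(readings, start, step, n_buckets):
    """Average (timestamp, value) readings into fixed windows of width `step`.

    Windows without readings are forward-filled with the last known mean
    (None if no reading has been seen yet). Hypothetical helper.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        idx = (ts - start) // step  # timedelta // timedelta gives a floored int
        if 0 <= idx < n_buckets:
            sums[idx] += value
            counts[idx] += 1

    series, last = [], None
    for i in range(n_buckets):
        if counts[i]:
            last = sums[i] / counts[i]
        series.append(last)
    return series


def align_sensors(raw, start, step, n_buckets):
    """Return one row per time window with one column per sensor."""
    columns = {
        name: resample_mean(readings, start, step, n_buckets)
        for name, readings in raw.items()
    }
    return [
        {"time": start + i * step, **{name: col[i] for name, col in columns.items()}}
        for i in range(n_buckets)
    ]


# Made-up readings arriving at irregular times from two sensors.
start = datetime(2017, 1, 1, 12, 0)
raw = {
    "temperature": [
        (start + timedelta(minutes=m), v) for m, v in [(1, 21.0), (4, 22.0), (12, 23.0)]
    ],
    "humidity": [
        (start + timedelta(minutes=m), v) for m, v in [(2, 40.0), (7, 42.0)]
    ],
}

rows = align_sensors(raw, start, timedelta(minutes=5), n_buckets=3)
for row in rows:
    print(row)
```

Each resulting row holds one value per sensor for the same time window, which is the shape most machine learning libraries expect as input. At larger scale the same idea would typically be expressed with a DataFrame library or Spark rather than plain dictionaries.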
We hope that this talk will give valuable insights into how Python and PySpark, in conjunction with AWS, serve as powerful tools for working with time-series sensor data from the Internet of Things and for building data applications on top of it.