Friday 10:45–12:15 in GoDataDriven

Structured Streaming with PySpark and Azure Databricks

Andrei Varanovich

Audience level:


IoT adoption is rising, along with the number of other scenarios in which high-performance streaming analytics is a critical component. In this tutorial we focus on the capabilities of Structured Streaming in Apache Spark for building so-called continuous applications in the cloud. Microsoft Azure is used as the running platform.


In this tutorial we use two components: Azure Event Hubs, a highly scalable publish-subscribe service, as the data source, and Azure Databricks, a scalable PaaS Spark offering on Azure, as the consumer. Together they form an end-to-end streaming analytics pipeline built with PySpark.
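
To give a feel for what such a pipeline might look like, here is a minimal sketch, not the tutorial's actual material. It assumes the open-source Azure Event Hubs connector for Spark is attached to the cluster (source format "eventhubs"). The helper names, the one-minute window, and the five-minute watermark are illustrative choices only. Recent connector versions may also expect the connection string to be encrypted with the connector's EventHubsUtils.encrypt helper before it is passed in.

```python
def eventhubs_options(connection_string, consumer_group="$Default"):
    """Build reader options for the Event Hubs Spark connector.

    Hypothetical helper; the option keys are the ones the connector documents.
    """
    return {
        "eventhubs.connectionString": connection_string,
        "eventhubs.consumerGroup": consumer_group,
    }


def count_events_per_minute(spark, options):
    """Return a streaming DataFrame that counts events per one-minute window.

    Assumes `spark` is an active SparkSession (predefined in a Databricks
    notebook) and that the Event Hubs connector library is installed.
    """
    # Imported lazily so the module can be loaded without PySpark present.
    from pyspark.sql.functions import col, window

    events = spark.readStream.format("eventhubs").options(**options).load()

    return (
        events
        # `body` arrives as binary; `enqueuedTime` is the service-side timestamp.
        .select(
            col("enqueuedTime").alias("ts"),
            col("body").cast("string").alias("payload"),
        )
        # Tolerate events arriving up to five minutes late (illustrative value).
        .withWatermark("ts", "5 minutes")
        .groupBy(window(col("ts"), "1 minute"))
        .count()
    )
```

In a notebook, the result could be started with something like counts.writeStream.outputMode("update").format("console").start(), where counts is the DataFrame returned by count_events_per_minute.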

Tutorial outline:

No prior knowledge of Azure is required; all necessary components will be made available to participants.
