Thursday 2:50 PM–3:30 PM in Room 1

Introduction to Zeppelin Notebooks and PySpark 2.0

Kevin Prybol

Audience level:
Novice

Description

Apache Zeppelin is an interactive data analytics environment for distributed data processing systems. This talk will give a brief overview of what Zeppelin is and where it fits into the larger data science/big data ecosystem, discuss how it differs from Jupyter, and cover several of Zeppelin's key features via a live demo using the integrated (and just released) PySpark 2.0 interpreter.

Abstract

Apache Zeppelin is an interactive, multi-purpose data analytics environment for distributed data processing systems. It provides a beautiful interactive web-based interface, data visualization, a collaborative work environment, and many other nice features to make your data analytics more fun and enjoyable. This talk will provide a brief overview (via live demo) of some of Zeppelin's key features, such as its pluggable architecture for backend integration, drag-and-drop visualizations, dynamic forms, notebook persistence, Shiro and notebook authorization, and its ability to share variables between contexts (e.g. the results of a Flink paragraph can be passed to a Spark paragraph; the best tool for the job can be used at each step in the analytics pipeline, and a data scientist who loves Scala Flink can easily work with a data scientist who loves PySpark).

The live demo will use Zeppelin's built-in PySpark interpreter with the just-released Spark 2.0 API. This talk will also explore where Zeppelin fits into the larger data science/big data ecosystem, discuss similarities and differences with Jupyter Notebooks, and explain why Zeppelin has a bright future despite being a late entrant into an already crowded notebook/interactive analytics space.