In this workshop we will quickly introduce you to the Apache Spark stack and then get into the meat of performing a full-featured geospatial analysis. Using OpenStreetMap data as our base, our end goal will be to find the most cultural city in Western Europe!
That's right! We will develop our own Cultural Weight Algorithm (TM) ;) and apply it to a set of major cities in Europe. The data will be analyzed using Apache Spark, and in the process we will learn about the typical phases of Big Data projects.
Here's a summary of the workshop as a sketch.
I hope you will join us on this journey of exploring one of the most exciting technology stacks to come out of the good folks at UC Berkeley.
Spark has quickly overtaken Hadoop as the front-runner among big data analysis technologies. There are a number of reasons for this, such as its developer-friendly interactive mode, its polyglot interface in Scala, Java, Python, and R, and the full stack of algorithmic libraries that these language ecosystems offer.
Out of the box, Spark includes a powerful set of tools, such as the ability to write SQL queries, perform streaming analytics, run machine learning algorithms, and even tackle graph-parallel computations, but what really stands out is its usability.
With its interactive shells (in both Scala and Python), it makes prototyping big data applications a breeze.
PySpark provides integrated API bindings around Spark. It uses Python's pickle serialization to make the Python ecosystem fully usable on every node of the Spark cluster and, more importantly, gives access to Python's rich machine learning libraries such as Scikit-Learn and data processing libraries such as Pandas.
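To make the pickle remark a little more concrete, here is a minimal sketch using only Python's standard library. It shows the basic round trip of serializing an object to bytes and restoring it, which is the kind of step needed to move Python data between processes. The record below is a made-up example, and this is not a picture of PySpark's actual internals, which involve more than a single pickle call.

```python
import pickle

# Hypothetical record: a point of interest loosely modelled on OpenStreetMap-style data.
poi = {
    "name": "Example Museum",
    "tags": {"tourism": "museum"},
    "lat": 48.86,
    "lon": 2.34,
}

# Serialize the object to bytes so it could be sent to another process.
payload = pickle.dumps(poi)

# Deserialize on the "receiving side" and check that nothing was lost.
restored = pickle.loads(payload)
print(type(payload).__name__, restored == poi)
```

The restored object is an equal but separate copy of the original, which is why the same Python data structures can be used on a worker as on the driver.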
During the workshop we are going to use a Docker container with the relevant libraries. Please try to have the latest Docker running on your machine for the hands-on work!