Saturday 1:30 PM–2:15 PM in Production DS, Data - Auditorium

Skein: a simpler way to deploy applications on Hadoop Clusters

Jim Crist

Audience level:
Novice

Description

Apache YARN is the resource manager native to Hadoop. Its JVM-only nature, complicated security model, and myriad options have historically made it difficult to deploy non-Java applications on it. In this talk we present Skein, a tool and library written to ease Python deployment on YARN, as well as some of the issues (and solutions!) we encountered while developing and testing it.

Abstract

Apache YARN is the resource manager native to Hadoop clusters. It is responsible for scheduling applications on the cluster (deciding where and when an application gets the resources it requested) and provisioning these resources in a secure and robust way.

Historically, deploying applications on YARN has been complicated for several reasons, including its JVM-only nature, its complicated security model, and its myriad options.

In this talk we present Skein, a tool and library to simplify deployment on YARN. Building on ideas from Docker and Kubernetes, Skein uses a declarative interface for defining and deploying applications, and provides Python access to this previously difficult deployment environment.
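To illustrate what a declarative interface means in practice, here is a minimal sketch. It assumes the `skein` package is installed and a YARN cluster is reachable when submitting. The spec is built as a plain Python dict whose field names (`name`, `queue`, `services`, `resources`, `script`) follow Skein's application specification format. The helper names `build_spec_dict` and `submit` and the service name `"main"` are hypothetical, chosen only for this example. Nothing in the sketch describes how Skein itself works internally.

```python
def build_spec_dict(name, script, memory="512 MiB", vcores=1, queue="default"):
    """Return a dict shaped like a Skein application specification.

    This helper and the service name "main" are hypothetical; the field
    names mirror Skein's declarative spec format.
    """
    return {
        "name": name,
        "queue": queue,
        "services": {
            "main": {
                # Resources requested from YARN for each container of this service
                "resources": {"memory": memory, "vcores": vcores},
                # Shell commands executed inside the container
                "script": script,
            }
        },
    }


def submit(spec_dict):
    """Submit the spec to YARN (assumes skein is installed and a cluster is reachable)."""
    import skein  # imported lazily so specs can be built without skein installed

    spec = skein.ApplicationSpec.from_dict(spec_dict)
    with skein.Client() as client:
        # client.submit returns the YARN application ID as a string
        return client.submit(spec)
```

On a machine with cluster access, `submit(build_spec_dict("hello-world", "echo 'Hello World!'"))` would return an application ID. The design point is that the application is described as data: what to run and which resources it needs. Submitting it is a single call, with no Java code required.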

Attendees of this talk will learn:
