PyData New York City 2017 - Presentation: Getting Scikit-Learn To Run On Top Of Pandas

Getting Scikit-Learn To Run On Top Of Pandas

Audience level:

Intermediate

Description

Scikit-Learn is built directly on NumPy, Python's numerical array library. Pandas adds metadata and higher-level munging capabilities to NumPy. This talk describes how to intelligently auto-wrap Scikit-Learn to create a version that can leverage Pandas's added features.

Abstract

Scikit-Learn is the de facto standard Python library for general-purpose machine learning. It operates over NumPy, an efficient but low-level, homogeneous array library. Pandas adds metadata, heterogeneity, and higher-level munging capabilities to NumPy.

In the field of visualization, newer-generation libraries, e.g., Seaborn and Bokeh, provide safer, more readable, and higher-level functionality by operating over Pandas data structures. Some of these are implemented using Matplotlib, a lower-level NumPy-based plotting library.

This talk describes a library that provides a Pandas-based version of Scikit-Learn. Here, too, giving a machine-learning library a Pandas interface yields code that is safer to use, more readable, and directly integrated with Pandas's higher-level munging capabilities.
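As a rough illustration of the safety argument, the sketch below wraps a single Scikit-Learn estimator by hand so that it remembers the column names it was fitted on and keeps the row index on its predictions. The class name `FrameLinearRegression` and the toy data are assumptions made for this example; they are not the interface of the library presented in the talk.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


class FrameLinearRegression:
    """Hypothetical pandas-aware wrapper; not the API of the talk's library."""

    def __init__(self):
        self._est = LinearRegression()
        self._columns = None

    def fit(self, X: pd.DataFrame, y: pd.Series):
        # Remember the feature names seen during fitting.
        self._columns = list(X.columns)
        self._est.fit(X.to_numpy(), y.to_numpy())
        return self

    def predict(self, X: pd.DataFrame) -> pd.Series:
        # Refuse frames whose columns differ in name or order.
        if list(X.columns) != self._columns:
            raise ValueError(
                f"expected columns {self._columns}, got {list(X.columns)}"
            )
        # Keep the row index so predictions stay aligned with the input rows.
        return pd.Series(self._est.predict(X.to_numpy()), index=X.index)


# Toy data, invented for this example.
train = pd.DataFrame(
    {"height": [1.0, 2.0, 3.0, 4.0], "weight": [10.0, 10.0, 30.0, 20.0]},
    index=["a", "b", "c", "d"],
)
target = 2.0 * train["height"] + 0.5 * train["weight"]

model = FrameLinearRegression().fit(train, target)
predictions = model.predict(train)

# A bare NumPy array carries no column names, so a swapped column order
# cannot be detected there. Here the name check catches it.
swapped = train[["weight", "height"]]
try:
    model.predict(swapped)
    swap_detected = False
except ValueError:
    swap_detected = True
```

The wrapper passes plain arrays to the underlying estimator, so the column and index bookkeeping lives entirely in the Pandas layer.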

Because Scikit-Learn's codebase is large and evolving, wrapping it manually is infeasible. Except for a small number of intentional deviations from Scikit-Learn, the library wraps Scikit-Learn modules lazily, through module and class introspection and dynamic module loading.
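One way such lazy wrapping could look is sketched below. It is a minimal sketch under stated assumptions, not the talk's actual implementation: the names `LazyFrameModule` and `_wrap_estimator` are hypothetical, only `fit` and `predict` are handled, and any class that has a `fit` attribute is treated as an estimator.

```python
import functools
import importlib
import inspect
import types

import pandas as pd


def _wrap_estimator(cls):
    """Build a subclass of `cls` whose fit/predict take and return pandas objects."""

    @functools.wraps(cls.fit)
    def fit(self, X, y=None, **kwargs):
        self.columns_ = list(X.columns)
        y_values = None if y is None else y.to_numpy()
        cls.fit(self, X.to_numpy(), y_values, **kwargs)
        return self

    namespace = {"fit": fit}
    if hasattr(cls, "predict"):

        @functools.wraps(cls.predict)
        def predict(self, X):
            if list(X.columns) != self.columns_:
                raise ValueError("columns differ from those seen in fit")
            return pd.Series(cls.predict(self, X.to_numpy()), index=X.index)

        namespace["predict"] = predict
    # Create the wrapped class dynamically, keeping the original name.
    return type(cls.__name__, (cls,), namespace)


class LazyFrameModule(types.ModuleType):
    """Proxy for a module; imports and wraps its contents on first access."""

    def __init__(self, target_name):
        super().__init__("frame_" + target_name)
        self._target_name = target_name

    def __getattr__(self, name):
        # Only reached when `name` is not already cached on this proxy.
        if name.startswith("_"):
            raise AttributeError(name)
        target = importlib.import_module(self._target_name)
        if hasattr(target, name):
            value = getattr(target, name)
        else:
            # Not an attribute yet, so try it as a submodule.
            try:
                value = importlib.import_module(f"{self._target_name}.{name}")
            except ImportError as exc:
                raise AttributeError(name) from exc
        if inspect.ismodule(value):
            value = LazyFrameModule(value.__name__)
        elif inspect.isclass(value) and hasattr(value, "fit"):
            value = _wrap_estimator(value)
        setattr(self, name, value)  # cache, so the work happens once
        return value


# Nothing from Scikit-Learn is imported until an attribute is requested.
frame_sklearn = LazyFrameModule("sklearn")
LinearRegression = frame_sklearn.linear_model.LinearRegression

X = pd.DataFrame(
    {"a": [0.0, 1.0, 2.0], "b": [1.0, 0.0, 1.0]}, index=["r0", "r1", "r2"]
)
y = X["a"] + 3.0 * X["b"]
predicted = LinearRegression().fit(X, y).predict(X)
```

Because each wrapped class is a dynamically created subclass, it still passes `isinstance` checks against the original Scikit-Learn class, and classes added to Scikit-Learn later are picked up without any hand-written wrapper.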

Following a short review of the relevant points of Pandas and Scikit-Learn, the talk is roughly divided into two aspects:

Scikit-Learn And Pandas User Perspective
1. Safety Advantages Of Pandas-Based Estimators
2. Using Metadata For Inter-Instance Aggregated Features And Cross-Validation
3. Using Metadata For Advanced Meta-Algorithms: Stacking, Nested Labeled And Stratified Cross-Validation
Python Developer Perspective
1. Unique Challenges Of Scikit-Learn Introspection And Decoration
2. Two Approaches For Wrapping Scikit-Learn Estimators
3. Lazy Dynamic Module Loading

Tuesday 10:45 AM–11:20 AM in Music Box 5411/Winter Garden 5412 (5th fl)

Ami Tavory
