Scikit-Learn is built directly over numpy, Python's numerical array library. Pandas adds to numpy metadata and higher-level munging capabilities. This talk describes how to intelligently auto-wrap Scikit-Learn for creating a version that can leverage Pandas's added features.
Scikit-Learn is the de-facto standard Python library for general-purpose machine learning. It operates over NumPy, an efficient, but low-level, homogeneic array library. Pandas adds to NumPy metadata, heterogeneity, and higher-leve munging capabilities.
In the field of visualization, newer generation libraries, e.g., Seaborn and Bokeh, are providing safer, more readable, and higher-level functionality, by operating over Pandas data structures. Some of these are implemented using Matplotlib, a lower-level NumPy-based plotting library.
This talk describes a library for a Pandas-based version of sickit-learn. Here, too, giving a Pandas interface to a machine-learning library, provides code which is safer to use, more readable, and allows direct integration with Pandas's higher-level munging capabilities.
Due to the large-scale, and evolving nature, of sicikit-learn's codebase, it is infeasible to manually wrap it. Except for a small number of intentional deviations from sickit-learn, the library wraps Scikit-Learn modules lazily through module and class introspection, and dynamic module loading.
Following a short review of the relevant points of Pandas and Scikit-Learn, the talk is roughly divided into two aspects: