Saturday 14:15–15:00 in Kursraum 3

Manifold Learning and Dimensionality Reduction for Data Visualization and Feature Engineering

Stefan Kühn

Audience level:
Intermediate

Description

Dimensionality Reduction methods like PCA (Principal Component Analysis) are widely used in Machine Learning for a variety of tasks. But beyond the well-known standard methods, many more tools are available, especially in the context of Manifold Learning. We will interactively explore these tools and present applications for Data Visualization and Feature Engineering using scikit-learn.

Abstract

Slides

https://de.slideshare.net/StefanKhn4/talk-at-pydata-berlin-about-manifold-learning-and-applications

Jupyter Notebooks

https://github.com/cc-skuehn/Manifold_Learning

At the end of the slide deck (and in the README of the GitHub repo) there are links to further resources, e.g. the relevant parts of the scikit-learn documentation.

Outline

Dimensionality reduction techniques are not only useful for denoising or for making data more accessible, they are also essential for any meaningful Exploratory Data Analysis (EDA), especially with respect to Data Visualization. Manifold Learning subsumes a collection of advanced methods from the field of Unsupervised Learning that capture different aspects of high-dimensional data in a low-dimensional manifold. Each method tries to preserve a particular quantity, such as distances between points, variance, or statistical and distributional properties. The variety of these different perspectives on the data is not only helpful for EDA and Data Visualization but also opens up new and interesting options for Feature Engineering and the ultimate task of Machine Learning and AI: "Learning from Data".
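As a minimal illustration of this idea (a sketch, not part of the original talk materials), the following scikit-learn snippet compares PCA with two manifold learning methods, Isomap and t-SNE, on the toy S-curve dataset; the dataset choice and parameter values are assumptions made purely for demonstration.

# Sketch: comparing PCA with two manifold learning methods from scikit-learn
# on the S-curve toy dataset (a 2-D manifold embedded in 3-D space).
# Dataset and parameters are illustrative assumptions, not from the talk.
import matplotlib.pyplot as plt
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE

# Generate the 3-D S-curve; `color` encodes position along the manifold
X, color = make_s_curve(n_samples=1000, random_state=0)

# Each method preserves a different quantity in the 2-D embedding
embeddings = {
    "PCA (preserves variance)": PCA(n_components=2).fit_transform(X),
    "Isomap (preserves geodesic distances)": Isomap(n_neighbors=10, n_components=2).fit_transform(X),
    "t-SNE (preserves local neighborhoods)": TSNE(n_components=2, random_state=0).fit_transform(X),
}

# Plot the three embeddings side by side for visual comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, Y) in zip(axes, embeddings.items()):
    ax.scatter(Y[:, 0], Y[:, 1], c=color, s=5)
    ax.set_title(name)
plt.tight_layout()
plt.show()

For Feature Engineering, the same fit_transform output can be appended to the original feature matrix as additional low-dimensional features before training a supervised model.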
