Wednesday 1:15 PM–2:45 PM in Winter Garden (5412)

Machine learning from scratch using the scientific Python stack

Lara Kattan

Audience level:
Intermediate

Description

You know that, under the hood, your favorite Python modeling libraries rely on numerical methods for optimization. Make your models more efficient by deepening your understanding of scientific Python (SciPy and NumPy). We'll review sparse matrices, matrix decompositions, gradient descent, and the Fourier Transform, then write an algorithm from scratch using only NumPy and SciPy!
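
For a taste of the tools involved, here is a minimal sketch (illustrative only, not the tutorial's actual material) of storing a small document-term matrix sparsely with SciPy and taking a truncated SVD, the decomposition behind LSA:

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import svds

    # A tiny document-term count matrix, stored sparsely (most entries are zero)
    counts = np.array([
        [2, 0, 1, 0, 0],
        [0, 1, 0, 3, 0],
        [1, 0, 0, 0, 2],
        [0, 2, 0, 1, 0],
    ], dtype=float)
    X = sparse.csr_matrix(counts)

    # Truncated SVD keeps only the k largest singular triplets -- the core of LSA
    U, s, Vt = svds(X, k=2)
    print(s)                   # the two largest singular values (ascending order)
    print(U.shape, Vt.shape)   # (4, 2) and (2, 5)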

Abstract

Level up your data science by diving deep into the innards of scientific Python

Building your first few scikit-learn models is gratifying, but where do you go from there? Gaining a deeper understanding of the numerical methods underlying your favorite modeling library is important for advancing in your data science career, as it allows you to make more informed decisions about efficiency and runtime. Dive deep into the innards of the scientific Python stack (SciPy and NumPy) in a way that is relevant to data science, statistics, and related numerical fields.

Tutorial structure: review math, write two algorithms from scratch

This tutorial will have two parts: we'll spend 45 minutes reviewing numerical solution methods by hand, then dedicate 45 minutes to rewriting popular machine learning algorithms from scratch using only NumPy and SciPy. In particular, we'll explore matrix decompositions for feature extraction and NLP, including topic modeling, plus gradient descent and the Fast Fourier Transform (FFT). We'll end by using NumPy and SciPy to code up PCA/LSA and gradient descent by hand! This should give you the confidence to dive deeper into the code base of Python machine learning libraries like scikit-learn, and the knowledge to start contributing to open-source machine learning software in Python.
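
To give a sense of what "from scratch" means here, a minimal sketch of batch gradient descent for least-squares regression using only NumPy (the learning rate, iteration count, and toy data are illustrative; the tutorial's own implementation may differ):

    import numpy as np

    def gradient_descent(X, y, lr=0.1, n_iters=500):
        """Minimize mean squared error ||Xw - y||^2 / n by batch gradient descent."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iters):
            grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the MSE loss
            w -= lr * grad
        return w

    # Toy data: y = 3*x0 - 2*x1 plus a little noise
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 2))
    y = X @ np.array([3.0, -2.0]) + 0.01 * rng.normal(size=200)
    print(gradient_descent(X, y))   # should be close to [3, -2]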

After this tutorial, you will:

- Understand the numerical methods (matrix decompositions, gradient descent, the FFT) that underlie common machine learning libraries
- Be able to implement PCA/LSA and gradient descent from scratch using only NumPy and SciPy
- Feel confident reading, and eventually contributing to, the source code of libraries like scikit-learn
