Saturday 11:15 AM–12:00 PM in C11

The Magic of Numpy

Sarah Masud

Audience level:
Intermediate

Description

It would not be wrong to say that Numpy is the biggest reason for the success of Machine Learning in Python. How did Numpy achieve this feat? How is Numpy able to handle both the scale and the dimensionality of data with ease? While many factors have gone into the design of this library, this talk will focus on three design decisions that make Numpy the magical, powerful library we know.

Abstract

Summary:

What do Pandas, Scipy, Sklearn, Matplotlib, and Keras have in common, apart from being famous Python libraries? All of them, along with 1.56K other packages [1], have Numpy as a dependency. This is a huge feat! It would not be wrong to say that Numpy is the biggest reason for the success of Machine Learning in Python. But how did Numpy achieve this position? How is Numpy able to handle both the scale and the dimensionality of data with ease? While many factors have gone into the design of this library, this talk will focus on three design decisions that make Numpy the magical, powerful library we know.

Intro:

What separates an N-dimensional Numpy array from a recursive list-of-lists data structure? What makes Numpy more memory- and speed-efficient than standard Python data structures? How can a typical reshape of an n x n matrix, which a brute-force copy would require O(n^2) time and space to perform, be done in constant time in Numpy, and dynamically at that? How does Numpy scale its operations to larger matrices? How has Numpy optimized vector-scalar mathematical operations? Numpy is able to load, store, and manipulate numerical data with ease because it is built to optimize for numerical data and mathematical operations in Python. Many design decisions have gone into this [2]. In this talk, however, we will discuss in detail three of the most basic, yet equally powerful, design decisions.

  1. Homogeneity and data types.
  2. Broadcasting and universal functions [3].
  3. Memory Modeling and data views.
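
To make the three points concrete, here is a minimal sketch of each decision in action (an illustration in this listing, not material from the talk itself; all names used are standard Numpy API):

```python
import numpy as np

# 1. Homogeneity and data types: every element shares one fixed-size
#    dtype, so the array is one contiguous buffer, not n boxed objects.
x = np.array([1, 2, 3], dtype=np.int64)
assert x.itemsize == 8 and x.nbytes == 24

# 2. Broadcasting and universal functions: the smaller operand is
#    virtually "stretched" over the larger shape without copies.
m = np.ones((3, 4))
row = np.arange(4)
assert (m + row).shape == (3, 4)   # row is broadcast across all 3 rows

# 3. Memory modeling and data views: reshape and slicing return views
#    that reinterpret the same buffer with new strides, in O(1) time.
a = np.arange(12)
b = a.reshape(3, 4)                # no element is copied or moved
assert b.base is a                 # b is a view onto a's buffer
b[0, 0] = 99
assert a[0] == 99                  # the change is visible through a
```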

Each of the three points will be explored in detail, both in theory and with accompanying examples from Numpy. Wherever possible, the time and space savings Numpy achieves over standard data structures will also be highlighted.
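
As one illustrative micro-benchmark of that speed difference (an example for this listing, not from the abstract): a ufunc call dispatches a single compiled loop over the whole buffer, while a list comprehension executes one interpreted iteration, with one boxed integer, per element.

```python
import timeit
import numpy as np

n = 100_000
lst = list(range(n))
arr = np.arange(n)

# Pure Python: one interpreted step and one boxed int per element.
t_list = timeit.timeit(lambda: [v * 2 for v in lst], number=10)

# Numpy ufunc: a single call runs a compiled C loop over the buffer.
t_arr = timeit.timeit(lambda: arr * 2, number=10)

print(f"list: {t_list:.4f}s  numpy: {t_arr:.4f}s")
```

Exact timings vary by machine, but at this size the vectorized version is typically one to two orders of magnitude faster.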

Key takeaways:

  1. Help navigate answers to the questions raised in the talk intro.
  2. Develop an intuitive understanding of how Numpy achieves some of its operations at scale.
  3. Nudge the audience to explore how effective design decisions contribute to the long-term success of a library/project.

P.S.: This is not an Intro-to-Numpy tutorial.

Refs:

  1. Libraries.IO
  2. Scipy Docs
  3. Python Data Science Handbook
