Conference Schedule

General Sessions

  Talks I Talks II Workshop/Tutorial I Workshop/Tutorial II Community Events Sprints
Thursday October 28 6:30 AM Reasoning with Natural Language Processing: advancement in the interpretation of Arabic speech Aseel Addawood
Thursday October 28 7:00 AM Anyone GAN do this: Solving the Minority Class Imbalance problem once and for all Dipam Paul, Alankrita Tewari
Thursday October 28 7:30 AM Redefining Insurance with Predictive and Preventive Artificial Intelligence. Ashraf Ibrahim Compressive Sensing Iman Mossavat
Thursday October 28 8:00 AM Best Practices in Machine Learning Observability Ankit Rathi, Yatin Bhatia Visualizations for Privacy Preservation: The Balancing Act between Utility and Uncertainty Gatha Data Analysis with Pandas and NumPy Nimrita Koul
Thursday October 28 8:30 AM 📚 Notebook To Production 👷🏼 Nir Barazida Wounds Over Time - Tracking Wound Healing via 3D Models Sivan Biham
Thursday October 28 9:00 AM Collaborative editing in Jupyter Notebook Kevin Jahns Is my data drifting? Early monitoring for machine learning models in production. Emeli Dral Knowledge graph data modelling with TerminusDB Cheuk Ting Ho
Thursday October 28 9:30 AM Start Asking Your Data “Why?” - A Gentle Introduction To Causal Inference Eyal Kazin Components, Workflows, and Cookbooks - Building Medical Grade AI pipelines with Argo Workflows Omri Fima Graphs for Data Science with NetworkX (pre-recorded) Bruno Gonçalves
Thursday October 28 10:00 AM An analysis of Societal Bias in SOTA NLP Transfer Learning Benjamin Ajayi-Obe, David Hopes Keeping sensitive data safe using recommendation systems Liron Faybish (Ben-Kimon)
Thursday October 28 10:30 AM Classifying Documents on a Graph using GNNs Avi Aminov Interpretable ML models at scale Aishwarya Agrawal
Thursday October 28 11:00 AM Get to know Apache Kafka with Jupyter Notebooks Francesco Tisiot

Break

🦉DVC Showcase – Who Moved My Data? 🗂 Dean Pleban Analyzing gender based violence data with Python Ivana Feldfeber, Lucy Jiménez, VELEZ RUEDA ANA JULIA

Supply Chain Bot Tournament

Thursday October 28 11:30 AM

Break

Introducing Blosc2, the next generation of the Blosc compressor Francesc Alted
Thursday October 28 12:00 PM Risk at Scale - Running a large investment risk system and how risk analysis techniques can help you Barry Fitzgerald Python and Flutter application for Colouring and Enhancing Old Photos Utkarsh Mishra

Social Hour

Thursday October 28 12:30 PM Why *Interactive* Data Visualization Matters for Data Science in Python Nicolas Kruchten Highly-Scalable NLP to Answer Questions on South Africa’s COVID-19 WhatsApp Hotline Adam Chang Document your scientific project with Markdown, Sphinx, and Read the Docs Juan Luis Cano Rodríguez

Break

Thursday October 28 1:00 PM Image classification in retail: Lessons from the real world Valentina Bono, Paul Klinger Deep Neural Deduplication Marcin Mosiolek
Thursday October 28 1:30 PM Dask: From POC to Production April Rathe Accelerating ML Inference at Scale with ONNX, Triton and Seldon Alejandro Saucedo
Thursday October 28 2:00 PM AIQC; deep learning experiment tracking with multi-dimensional pre/post-processing. Layne Sadler Computations as Assets - a New Approach to Reproducibility and Transparency Anders Berkeman, Carl Drougge, Sofia Hörberg Python Dashboarding Shootout and Showdown James A. Bednar, Adrien Treuille, Nicolas Kruchten, Philipp Rudiger, Sylvain Corlay
Thursday October 28 2:30 PM Polars, the fastest DataFrame library you never heard of. Ritchie Vink TileDB and the New Data Economics Stavros Papadopoulos Know Your Data First: An Introduction to Exploratory Data Analysis Sin-seok SEO
Thursday October 28 3:00 PM Darts for Time Series Forecasting Julien Herzen, Francesco Lässig Functional, Composable, Asynchronous, Type-Safe Python Sune Debel

Executives at PyData

Thursday October 28 3:30 PM Data Science in the Enterprise: A Holistic Approach Gaby Lio Enterprise Machine Learning Pipelines with Unstructured Image Data Jacqueline Nolis, Chase Ginther
Thursday October 28 4:00 PM

Keynote - David Beazley

Thursday October 28 5:00 PM Profiling and Tuning PyTorch Models Shagun Sodhani Think Like Git Eli Sander GPU development with Python 101 Jacob Tomlinson Map Visualizations with Dash Leaflet Haw-minn Lu
Thursday October 28 5:30 PM Time: The most misunderstood dimension in data modelling Sergii Mikhtoniuk 5 Reasons Parquet files are better than CSV for data analyses Matthew Powers
Thursday October 28 6:00 PM Predictive modeling in a video advertising marketplace Olga Bane Effective Testing for Machine Learning Projects Eduardo Blancas

Social Hour

Thursday October 28 6:30 PM Innovating in the Oil & Gas Industry with AI/ML Hoda Rezaei conda-forge in 2021 Eric Dill Assessing and Mitigating Unfairness in AI Systems Manojit Nandi
Thursday October 28 7:00 PM Spatial Analytics using Dask & Numba Brendan Collins Data infrastructure at the COVID Tracking Project Julia Kodysh

Pub Quiz

Thursday October 28 7:30 PM Bodo: Supercomputing-Like Performance and Scale for Python/Pandas Ehsan Totoni Scalable Sustainability with the Planetary Computer Tom Augspurger
Thursday October 28 8:00 PM

Lightning Talks Logan Kilpatrick, Braden Riggs, Timo Metzger, Jacob Zelko, John Cox, Nik Agarwal, Raghuram Thiagarajan, Neel Surya, Timothy Odom, Xu Chen, Rongpeng Li (Ron), Miki Tebeka

Thursday October 28 8:30 PM Building Responsible Data Science Workflows: Transparency, Reproducibility, and Ethics by Design Valentin Danchev, Ben Marwick, Dr. Brandeis Marshall (she/her), Kirstie Whitaker, Sara Stoudt, Thibault Lestang, Yacine Jernite
Thursday October 28 9:00 PM Deploying a Mobile App on Tensorflow: Lessons Learned Reshama Shaikh, Nidhin Pattaniyil Snowflake and Tecton: How to build production-ready machine learning pipelines Miles Adkins, Kevin Stumpf
Thursday October 28 9:30 PM Faceoff Fun with Python Frameworks: FastAPI vs Flask 2.0 Tonya Sims Unifying Large Scale Data Preprocessing and Machine Learning Pipelines with Ray Datasets Alex Wu, Clark Zinzow
Thursday October 28 10:00 PM Graph Thinking Paco Nathan Lux: Automatic Visualizations for Exploratory Data Science Doris Jung-Lin Lee

Matplotlib Sprint

Thursday October 28 10:30 PM Submodular optimization for minimizing redundancy in massive data sets Jacob Schreiber Makefiles: One great trick for making your conda environments more manageable. Kjell Wooding
Thursday October 28 11:00 PM
Friday October 29 2:00 AM

General Sessions

  Talks I Talks II Workshop/Tutorial I Workshop/Tutorial II Community Events Sprints
Friday October 29 6:00 AM

Lightning Talks Dana Averbuch, Jeremy John Selva, Pranav Kompally, hassaku, Meshva Patel, Ilana Tuil, Shivay Lamba, Evgeny Karev, Nicolo Musmeci

Friday October 29 7:30 AM FugueSQL - The Enhanced SQL Interface for Pandas, Spark, and Dask DataFrames Chengxuan Wang NLP and Hate speech: Why does it matter and what can we do? Smriti Singh Machine Learning Lifecycle Made Easy with MLflow Karishma Babbar, Kalyan Munjuluri
Friday October 29 8:00 AM Neural Prophet – A powerful AI framework for Time Series Models Kalyan Prasad Dask for Everyone Hugo Bowne-Anderson
Friday October 29 8:30 AM Designing Functional Data Pipelines for Reproducibility and Maintainability Chin Hwee Ong ML in Production – Serverless and Painless Oliver Gindele
Friday October 29 9:00 AM Machine learning in health: Predicting pregnancy complications Oliver Rieger Counter Factual Analysis for Explainable AI Shashank Shekhar

Break

Friday October 29 9:30 AM Introduction to Quantum Computing with Python and Qiskit Vicente Ruben Del Pino Ruiz Lessons learned from deploying Machine Learning in an old-fashioned heavy industry Robert Meyer Building linear programs with ORTools Ross Hart Getting Started with Text Classification: Predict if Tweets are about Real Disasters Nabanita Roy
Friday October 29 10:00 AM Football Analytics Using Hierarchical Bayesian Models in PyMC Meenal Jhajharia From Jupyter Notebooks To JetBrains DataSpell Andrey Cheptsov

Ignite Sprint

Friday October 29 10:30 AM

Break

Break

Friday October 29 11:00 AM Cutting edge hyperparameter tuning made simple with Ray Tune Antoni Baum JupyterLite: Jupyter ❤️ WebAssembly ❤️ Python Jeremy Tuloup, Madhur Tandon, Martin Renou, Thorsten Beier

Break

Behind the Black Box: How to Understand Any ML Model Using SHAP Jonathan Bechtel
Friday October 29 11:30 AM Turning Pandas DataFrames to Semantic Knowledge Graph Cheuk Ting Ho Building a Sign-to-Speech prototype with TensorFlow and DeepStack: How it happened & What I learned Steven Kolawole sktime - A Unified Toolbox for Machine Learning with Time Series Markus Löning
Friday October 29 12:00 PM DedupliPy: a new deduplication package Frits Hermans Build polished, data-driven applications directly from your Pandas or XArray pipelines Philipp Rudiger
Friday October 29 12:30 PM Exploring Tools for Interpretable Machine Learning Juan Orduz Sliding into Causal Inference, with Python! Alon Nir
Friday October 29 1:00 PM

Social Hour in Gather

Friday October 29 2:00 PM

Lightning Talks Simona Maggio, Marco Edward Gorelli, Francesco Tisiot, James Laidler, Violeta Misheva, Alon Nir

Data Processing at Scale Benjamin Zaitlen, James Bourbeau, Martin Durant, Matthew Powers, Richard Zamora Introduction to Distance Metric Learning Dor Kedem
Friday October 29 3:00 PM From Jupyter Notebook to Production Web App, with Anvil and (only) Python Meredydd Luff Some Attention for Attenuation Bias Ruben Mak
Friday October 29 3:30 PM Packaging PyData for Enterprise Software Supply Chain (pre-recorded) Zayd Ma, Hussain Sultan Let's Implement Bayesian Ordered Logistic Regression! Marco Edward Gorelli Computational Survival Analysis Allen Downey
Friday October 29 4:00 PM Simplification as a Service Vasu Sharma Law, Graphs & Python Adam Zadrożny Working with Data in a Connected World: the Power of Graph Data Science Clair J. Sullivan

Modin Sprint

Friday October 29 4:30 PM Deep learning-aided drug discovery Magdalena Wiercioch Wisdom of the Crowd: amplifying human intelligence with AI Cor Zuurmond
Friday October 29 5:00 PM Dev, Staging, and Production in Data Engineering with Terraform Sarah Krasnik Impact of Noisy Data on Support Vector Machine and Deep Learning Algorithms on Edge Computing Device Dr. Shrirang Ambaji Kulkarni Data and tools to model PV Systems Kevin Anderson, Mark Mikofski, Abhishek Parikh, Silvana Ovaitt
Friday October 29 5:30 PM Towards Cloud-Native Distributed Machine Learning Pipelines at Scale (pre-recorded) Yuan Tang Building a Data-Driven Product from Scratch, How Hard Can It Be? Adam Webber What's in your data: Data Profiler - An Open Source Solution to Explain Your Data Austin Walters, Jeremy Goodsitt

PyData Meetup Organisers Social

Friday October 29 6:00 PM

Break

Break

Friday October 29 6:30 PM Serving and Managing Reproducible Conda Environments via Conda-Store Chris Ostrouchov Serving BERT Models in Production with TorchServe Adway Dhillon, Nidhin Pattaniyil
Friday October 29 7:00 PM All you need is zarr.: Parallel access to remote HDF5, TIFF, grib2 and others. Martin Durant Why Datetimes Need Units: Avoiding a Y2262 Problem & Harnessing the Power of NumPy's datetime64 Christopher Ariza Bridging Data and Business: Power Plant Output Optimization Based on Electricity Market Price Sylvia Lee

Social Hour

Friday October 29 7:30 PM An Intro to Workflow Management with Prefect Kevin Kho Large Scale Data Validation with Fugue Han Wang
Friday October 29 8:00 PM Robust, End-to-end Online Machine Learning Applications with Flyte, Pandera and Streamlit Niels Bantilan Using a Pythonic Compass to Link the Physics Community to the Chemistry Community Suliman Sharif
Friday October 29 8:30 PM Sparcle: assigning transcripts to cells in multiplexed images Sandhya Prabhakaran Getting started with Dask using Saturn Cloud Mitali Sanwal
Friday October 29 9:00 PM Feature Stores: An operational bridge between machine learning models and data Jules S. Damji, Danny Chiao
Friday October 29 9:30 PM Data Engineering for successful Machine Learning Vini Jaiswal
Friday October 29 10:00 PM

General Sessions

  Talks I Talks II Workshop/Tutorial I Workshop/Tutorial II Community Events Sprints Sprints II
Saturday October 30 7:30 AM Towards Collaborative Reproducibility: Pinning Repository of Binary Distributions Nguyễn Gia Phong, Huy Ngo
Saturday October 30 8:00 AM Software inspired workflow for Data Analysis Imen Ayari
Saturday October 30 8:30 AM A first step from ad-hoc SQL to scalable ETL Martin Wanjiru
Saturday October 30 9:00 AM What could possibly go wrong when evaluating forecasts? Malte Tichy Large-Scale Production Reinforcement Learning with RLlib Sven Mika
Saturday October 30 9:30 AM Modeling aleatoric and epistemic uncertainty using Tensorflow and Tensorflow Probability Aleksander Molak Managing your data using FastAPI and Piccolo Admin Daniel Townsend
Saturday October 30 10:00 AM

Break

A Platform to Enable Data Science At Scale in Tesco Andrew Garrow, Benjamin Lehne

PyMC Sprint

Saturday October 30 10:30 AM How to Guarantee No One Understands What You Did in Your Machine Learning Project Jesper Dramsch Image(face) Classification with Computer Vision and Python MUSASIZI FRANCIS KAMANZI
Saturday October 30 11:00 AM

Lightning Talks Svea Marie Meyer, Guzal Bulatova, Sarah Schuhegger, Christian Juncker Brædstrup, Farooq Shaikh, Abhilash Babu, Brian Cechmanek, Diego Arenas, Mika Pflüger, Achilleas Koutsou, Michael Sonntag, Sebastian M. Ernst, Marek Suppa, John McCambridge

Break

So you wanna be a Pandas expert? | (Pre-recorded Tutorial) James Powell
Saturday October 30 11:30 AM Analyzing Company Filings for Stock Selection – a Practical Report Laura Jehl
Saturday October 30 12:00 PM The prototype hole and tools to help you out of it Irina Vidal Migallón

Break

Social Hour

Saturday October 30 12:30 PM

Break

How to detect silent model failures? Wojtek Kuberski
Saturday October 30 1:00 PM Computer vision and xAI: explaining a single prediction with visualisations and examples Sara Tähtinen

Break

So you wanna be a Pandas expert? | (Live Q&A) James Powell
Saturday October 30 1:30 PM Agile Data Science: How To Implement Agile Workflows For Analytics & Machine Learning John Sandall What do C-Suite Executives Pay Attention To? Dingqian (Sara) Liu
Saturday October 30 2:00 PM Extracting complements and substitutes from sales data - a network perspective Sebastian Lautz High Performance Python With Numba, Dask, and Rapids For the Absolute Beginner Gus Cavanaugh

Tech for Posterity. Challenges for the Future of AI Ethics and DEI Oyidiya Oji

NumPy + SciPy Sprint

Saturday October 30 2:30 PM

Break

Pub Quiz

Saturday October 30 3:00 PM Storytelling With Data – How To Turn a Basic Dataset Into a Compelling Story Meirav Ben Izhak Unlocking more from your Audio Data Braden Riggs
Saturday October 30 3:30 PM So Much Data, Such Poor Quality Temiloluwa Adeniyi
Saturday October 30 4:00 PM Snowflake & Dask: How to scale workloads using distributed fetch capabilities Miles Adkins, James Bourbeau, Mark Keller

Bokeh Sprint

Saturday October 30 4:30 PM

Break

Simplifying Testing of Spark Applications Megan Yow Uncertainty Quantification 360: A Hands-on Tutorial Prasanna Sattigeri, Jiri Navratil, Soumya Ghosh
Saturday October 30 5:00 PM An attempt at demystifying graph deep learning Eric Ma
Saturday October 30 5:30 PM

Break

Saturday October 30 6:00 PM

Keynote - Naomi Ceder

Saturday October 30 7:00 PM

Lightning Talks Banjo Obayomi, Tomas Capretto, Zachary Blackwood, Ryan Soklaski, Xiuwen Tu, Usman Kamran, Ardo Illaste, Jason Lee

Love your (data scientist) neighbour: Reproducible data science the Easydata way Amy Wooding
Saturday October 30 8:00 PM Extending Jupyter Data Visualizations Beyond the Notebook Seth Shelnutt Making the Perfect Cup of Joe: Active Preference Learning and Optimization Under Uncertainty Quan Nguyen

Social Hour

Saturday October 30 8:30 PM Introduction to Unsupervised and Semi-Supervised Learning in TensorFlow Andrew Shao What to do when you can't trust your labels? A practical approach Ramiro Caro
Saturday October 30 9:00 PM hydra-zen: Configurable, Reproducible, and Scalable Computing with Hydra Ryan Soklaski Fugue Tune: A Simple Interface for Distributed Hyperparameter Optimization Jun Liu
Saturday October 30 9:30 PM Fusing economic survey datasets with the synthimpute Python package Max Ghenis From Jupyter to Production: Deploying an Influenza Monitoring System at Scale with Wearable Sensors Filip Jankovic
Saturday October 30 10:00 PM Using Reproducible Experiments To Create Better Machine Learning Models Milecia McGregor Foundational Infrastructure to Create a Successful Data Science Team Ethan Swan, Brad Boehmke, Gus Powers
Saturday October 30 10:30 PM