Conference Schedule

General Sessions

  Talks I Talks II Workshop/Tutorial I Workshop/Tutorial II
Thursday October 28 6:30 AM Reasoning with Natural Language Processing: advancement in the interpretation of Arabic speech Aseel Addawood
Thursday October 28 7:00 AM Anyone GAN do this: Solving the Minority Class Imbalance problem once and for all Dipam Paul, Alankrita Tewari
Thursday October 28 7:30 AM Compressive Sensing Iman Mossavat
Thursday October 28 8:00 AM Best Practices in Machine Learning Observability Ankit Rathi, Yatin Bhatia Visualizations for Privacy Preservation: The Balancing Act between Utility and Uncertainty Gatha Data Analysis with Pandas and NumPy Nimrita Koul
Thursday October 28 8:30 AM Notebook To Production Nir Barazida Wounds Over Time - Tracking Wound Healing via 3D Models Sivan Biham
Thursday October 28 9:00 AM Collaborative editing in Jupyter Notebook Kevin Jahns Is my data drifting? Early monitoring for machine learning models in production. Emeli Dral
Thursday October 28 9:30 AM Start Asking Your Data “Why?” - A Gentle Introduction To Causal Inference Eyal Kazin Components, Workflows, and Cookbooks - Building Medical Grade AI pipelines with Argo Workflows Omri Fima

Break

Thursday October 28 10:00 AM An analysis of Societal Bias in SOTA NLP Transfer Learning Benjamin Ajayi-Obe, David Hopes Keeping sensitive data safe using recommendation systems Liron Faybish (Ben-Kimon)
Thursday October 28 10:30 AM Classifying Documents on a Graph using GNNs Avi Aminov Interpretable ML models at scale Aishwarya Agrawal
Thursday October 28 11:00 AM Get to know Apache Kafka with Jupyter Notebooks Francesco Tisiot

Break

DVC Showcase – Who Moved My Data? Dean Pleban Analyzing gender based violence data with Python Ivana Feldfeber, Lucy Jiménez, VELEZ RUEDA ANA JULIA
Thursday October 28 11:30 AM

Break

Introducing Blosc2, the next generation of the Blosc compressor Francesc Alted
Thursday October 28 12:00 PM Risk at Scale - Running a large investment risk system and how risk analysis techniques can help you Barry Fitzgerald Python and Flutter application for Colouring and Enhancing Old Photos Utkarsh Mishra
Thursday October 28 12:30 PM Why *Interactive* Data Visualization Matters for Data Science in Python Nicolas Kruchten Highly-Scalable NLP to Answer Questions on South Africa’s COVID-19 WhatsApp Hotline Adam Chang Document your scientific project with Markdown, Sphinx, and Read the Docs Juan Luis Cano Rodríguez

Birds of a Feather (BOF) / extracurricular

Thursday October 28 1:00 PM Image classification in retail: Lessons from the real world Valentina Bono, Paul Klinger Deep Neural Deduplication Marcin Mosiolek
Thursday October 28 1:30 PM Dask: From POC to Production April Rathe Accelerating ML Inference at Scale with ONNX, Triton and Tempo Alejandro Saucedo
Thursday October 28 2:00 PM AIQC; deep learning experiment tracking with multi-dimensional pre/post-processing. Layne Sadler Computations as Assets - a New Approach to Reproducibility and Transparency Anders Berkeman, Carl Drougge, Sofia Hörberg Python Dashboarding Shootout and Showdown James A. Bednar, Adrien Treuille, Nicolas Kruchten, Philipp Rudiger, Sylvain Corlay
Thursday October 28 2:30 PM Polars, the fastest DataFrame library you never heard of. Ritchie Vink

Reserved

Know Your Data First: An Introduction to Exploratory Data Analysis Sin-seok SEO
Thursday October 28 3:00 PM Darts for Time Series Forecasting Julien Herzen, Francesco Lässig Functional, Composable, Asynchronous, Type-Safe Python Sune Debel
Thursday October 28 3:30 PM Data Science in the Enterprise: A Holistic Approach Gaby Lio Enterprise Machine Learning Pipelines with Unstructured Image Data Jacqueline Nolis, Chase Ginther
Thursday October 28 4:00 PM

Keynote - David Beazley

Thursday October 28 5:00 PM Profiling and Tuning PyTorch Models Shagun Sodhani Think Like Git Eli Sander GPU development with Python 101 Jacob Tomlinson Map Visualizations with Dash Leaflet Haw-minn Lu
Thursday October 28 5:30 PM Time: The most misunderstood dimension in data modelling Sergii Mikhtoniuk 5 Reasons Parquet files are better than CSV for data analyses Matthew Powers
Thursday October 28 6:00 PM Predictive modeling in a video advertising marketplace Olga Bane Effective Testing for Machine Learning Projects Eduardo Blancas
Thursday October 28 6:30 PM Innovating in the Oil & Gas Industry with AI/ML Hoda Rezaei - Energy Solution Lead at Integra conda-forge in 2021 Eric Dill Assessing and Mitigating Unfairness in AI Systems Manojit Nandi
Thursday October 28 7:00 PM

Reserved

Data infrastructure at the COVID Tracking Project Julia Kodysh

Pub Quiz

Thursday October 28 7:30 PM Bodo: Supercomputing-Like Performance and Scale for Python/Pandas Ehsan Totoni Scalable Sustainability with the Planetary Computer Tom Augspurger
Thursday October 28 8:00 PM

Lightning Talks Logan Kilpatrick, Braden Riggs, Brendan Collins, Jacob Zelko, John Cox, Nik Agarwal, Raghuram Thiagarajan, Neel Surya, Timothy Odom, Xu Chen, Rongpeng Li (Ron)

Thursday October 28 8:30 PM Building Responsible Data Science Workflows: Transparency, Reproducibility, and Ethics by Design Valentin Danchev, Ben Marwick, Dr. Brandeis Marshall (she/her), Sara Stoudt, Yacine Jernite
Thursday October 28 9:00 PM Deploying a Mobile App on TensorFlow: Lessons Learned Reshama Shaikh, Nidhin Pattaniyil Snowflake and Tecton: How to build production-ready machine learning pipelines Miles Adkins, Kevin Stumpf
Thursday October 28 9:30 PM Faceoff Fun with Python Frameworks: FastAPI vs Flask 2.0 Tonya Sims Unifying Large Scale Data Preprocessing and Machine Learning Pipelines Alex Wu
Thursday October 28 10:00 PM Graph Thinking Paco Nathan Lux: Automatic Visualizations for Exploratory Data Science Doris Jung-Lin Lee
Thursday October 28 10:30 PM Submodular optimization for minimizing redundancy in massive data sets Jacob Schreiber Makefiles: One great trick for making your conda environments more manageable. Kjell Wooding
Thursday October 28 11:00 PM

General Sessions

  Talks I Talks II Workshop/Tutorial I Workshop/Tutorial II
Friday October 29 6:00 AM

Lightning Talks Miki Tebeka, Dana Averbuch, Jeremy John Selva, Pranav Kompally, hassaku, Meshva Patel, Ilana Tuil, Shivay Lamba, Evgeny Karev, Nicolo Musmeci

Friday October 29 7:30 AM FugueSQL - The Enhanced SQL Interface for Pandas, Spark, and Dask DataFrames Chengxuan Wang NLP and Hate speech: Why does it matter and what can we do? Smriti Singh Machine Learning Lifecycle Made Easy with MLflow Karishma Babbar, Kalyan Munjuluri
Friday October 29 8:00 AM Neural Prophet – A powerful AI framework for Time Series Models Kalyan Prasad Dask for Everyone Hugo Bowne-Anderson
Friday October 29 8:30 AM Designing Functional Data Pipelines for Reproducibility and Maintainability Chin Hwee Ong

Reserved

Friday October 29 9:00 AM Machine learning in health: Predicting pregnancy complications Oliver Rieger Counter Factual Analysis for Explainable AI Shashank Shekhar

Break

Friday October 29 9:30 AM Introduction to Quantum Computing with Python and Qiskit Vicente Ruben Del Pino Ruiz Lessons learned from deploying Machine Learning in an old-fashioned heavy industry Robert Meyer Building linear programs with ORTools Ross Hart Getting Started with Text Classification: Predict if Tweets are about Real Disasters Nabanita Roy
Friday October 29 10:00 AM Football Analytics Using Hierarchical Bayesian Models in PyMC3 Meenal Jhajharia From Jupyter Notebooks To JetBrains DataSpell Andrey Cheptsov
Friday October 29 10:30 AM

Break

Reserved

Friday October 29 11:00 AM Cutting edge hyperparameter tuning made simple with Ray Tune Antoni Baum JupyterLite: Jupyter ❤️ WebAssembly ❤️ Python Jeremy Tuloup, Madhur Tandon, Martin Renou

Break

Behind the Black Box: How to Understand Any ML Model Using SHAP Jonathan Bechtel
Friday October 29 11:30 AM Turning Pandas DataFrames to Semantic Knowledge Graph Cheuk Ting Ho Building a Sign-to-Speech prototype with TensorFlow and DeepStack: How it happened & What I learned Steven Kolawole sktime - A Unified Toolbox for Machine Learning with Time Series Markus Löning
Friday October 29 12:00 PM DedupliPy: a new deduplication package Frits Hermans Build polished, data-driven applications directly from your Pandas or XArray pipelines Philipp Rudiger
Friday October 29 12:30 PM Exploring Tools for Interpretable Machine Learning Juan Orduz Sliding into Causal Inference, with Python! Alon Nir
Friday October 29 1:00 PM

Keynote

Friday October 29 2:00 PM

Lightning Talks Simona Maggio, Marco Edward Gorelli, Francesco Tisiot, James Laidler, Violeta Misheva, Alon Nir

Data Processing at Scale Benjamin Zaitlen Introduction to Distance Metric Learning Dor Kedem
Friday October 29 3:00 PM From Jupyter Notebook to Production Web App, with Anvil and (only) Python Meredydd Luff Some Attention for Attenuation Bias Ruben Mak
Friday October 29 3:30 PM Packaging PyData for Enterprise Software Supply Chain Zayd Ma Let's Implement Bayesian Ordered Logistic Regression! Marco Edward Gorelli Computational Survival Analysis Allen Downey
Friday October 29 4:00 PM Simplification as a Service Vasu Sharma Law, Graphs & Python Adam Zadrożny Working with Data in a Connected World: the Power of Graph Data Science Clair J. Sullivan
Friday October 29 4:30 PM Deep learning-aided drug discovery Magdalena Wiercioch Wisdom of the Crowd: amplifying human intelligence with AI Cor Zuurmond
Friday October 29 5:00 PM Dev, Staging, and Production in Data Engineering with Terraform Sarah Krasnik Impact of Noisy Data on Support Vector Machine and Deep Learning Algorithms on Edge Computing Device Dr. Shrirang Ambaji Kulkarni Data and tools to model PV Systems Silvana Ovaitt
Friday October 29 5:30 PM Towards Cloud-Native Distributed Machine Learning Pipelines at Scale Yuan Tang

Reserved

What's in your data: Data Profiler - An Open Source Solution to Explain Your Data Anh Truong
Friday October 29 6:00 PM

Break

Break

Friday October 29 6:30 PM

Reserved

Reserved

Serving PyTorch Models in Production Adway Dhillon, Nidhin Pattaniyil
Friday October 29 7:00 PM All you need is zarr.: Parallel access to remote HDF5, TIFF, grib2 and others. Martin Durant Why Datetimes Need Units: Avoiding a Y2262 Problem & Harnessing the Power of NumPy's datetime64 Christopher Ariza Bridging Data and Business: Power Plant Output Optimization Based on Electricity Market Price Sylvia Lee
Friday October 29 7:30 PM An Intro to Workflow Management with Prefect Kevin Kho Large Scale Data Validation with Fugue Han Wang
Friday October 29 8:00 PM Robust, End-to-end Online Machine Learning Applications with Flytekit, Pandera and Streamlit Niels Bantilan

Reserved

Friday October 29 8:30 PM Sparcle: assigning transcripts to cells in multiplexed images Sandhya Prabhakaran Getting started with Dask using Saturn Cloud Jacqueline Nolis, Mitali Sanwal
Friday October 29 9:00 PM

Reserved

Feature Stores: An operational bridge between machine learning models and data Jules S. Damji, Danny Chiao
Friday October 29 9:30 PM

Reserved

Data Engineering for successful Machine Learning Vini Jaiswal
Friday October 29 10:00 PM

General Sessions

  Talks I Talks II Workshop/Tutorial I Workshop/Tutorial II
Saturday October 30 7:30 AM Towards Collaborative Reproducibility: Pinning Repository of Binary Distributions Nguyễn Gia Phong, Huy Ngo
Saturday October 30 8:00 AM Software inspired workflow for Data Analysis Imen Ayari
Saturday October 30 8:30 AM A first step from ad-hoc SQL to scalable ETL Martin Wanjiru
Saturday October 30 9:00 AM What could possibly go wrong when evaluating forecasts? Malte Tichy Large-Scale Production Reinforcement Learning with RLlib Sven Mika
Saturday October 30 9:30 AM Modeling aleatoric and epistemic uncertainty using TensorFlow and TensorFlow Probability Aleksander Molak Managing your data using FastAPI and Piccolo Admin Daniel Townsend
Saturday October 30 10:00 AM

Reserved

A Platform to Enable Data Science At Scale in Tesco Andrew Garrow, Benjamin Lehne
Saturday October 30 10:30 AM

Break

How to Guarantee No One Understands What You Did in Your Machine Learning Project Jesper Dramsch Image(face) Classification with Computer Vision and Python MUSASIZI FRANCIS KAMANZI
Saturday October 30 11:00 AM

Lightning Talks Christopher Lozinski, Svea Marie Meyer, Guzal Bulatova, Sarah Schuhegger, Christian Juncker Brædstrup, Farooq Shaikh, Abhilash Babu, Brian Cechmanek, Diego Arenas, Mika Pflüger, Achilleas Koutsou, Michael Sonntag, Sebastian M. Ernst, Marek Suppa, John McCambridge

Break

Saturday October 30 11:30 AM Analyzing Company Filings for Stock Selection – a Practical Report Laura Jehl Knowledge graph data modelling with TerminusDB Cheuk Ting Ho
Saturday October 30 12:00 PM The prototype hole and tools to help you out of it Irina Vidal Migallón

Break

Saturday October 30 12:30 PM

Break

Why do you need to start monitoring ML right now? Wojtek Kuberski
Saturday October 30 1:00 PM Computer vision and xAI: explaining a single prediction with visualisations and examples Sara Tähtinen TBA James Powell
Saturday October 30 1:30 PM Agile Data Science: How To Implement Agile Workflows For Analytics & Machine Learning John Sandall What do C-Suite Executives Pay Attention To? Dingqian (Sara) Liu

Break

Saturday October 30 2:00 PM Extracting complements and substitutes from sales data - a network perspective Sebastian Lautz High Performance Python With Numba, Dask, and Rapids For the Absolute Beginner Gus Cavanaugh

Tech for Posterity. Challenges for the Future of AI Ethics and DEI Oyidiya Oji

Saturday October 30 2:30 PM

Pub Quiz

Break

Saturday October 30 3:00 PM Storytelling With Data – How to Turn a Basic Dataset Into a Compelling Story Meirav Ben Izhak
Saturday October 30 3:30 PM So Much Data, Such Poor Quality Temiloluwa Adeniyi
Saturday October 30 4:00 PM Snowflake & Dask: How to scale workloads using distributed fetch capabilities Miles Adkins, James Bourbeau, Mark Keller
Saturday October 30 4:30 PM

Reserved

Simplifying Testing of Spark Applications Megan Yow, Kevin Kho Uncertainty Quantification 360: A Hands-on Tutorial Prasanna Sattigeri, Jiri Navratil, Soumya Ghosh
Saturday October 30 5:00 PM An attempt at demystifying graph deep learning Eric Ma
Saturday October 30 5:30 PM

Reserved

Reserved

Saturday October 30 6:00 PM

Keynote - Naomi Ceder

Saturday October 30 7:00 PM

Lightning Talks Banjo Obayomi, Tomas Capretto, Zachary Blackwood, Ryan Soklaski, Xiuwen Tu, Usman Kamran, Ardo Illaste, Jason Lee

Love your (data scientist) neighbour: Reproducible data science the Easydata way Amy Wooding
Saturday October 30 8:00 PM Extending Jupyter Data Visualizations Beyond the Notebook Seth Shelnutt Making the Perfect Cup of Joe: Active Preference Learning and Optimization Under Uncertainty Quan Nguyen
Saturday October 30 8:30 PM Introduction to Unsupervised and Semi-Supervised Learning in TensorFlow Andrew Shao What to do when you can't trust your labels? A practical approach Ramiro Caro
Saturday October 30 9:00 PM hydra-zen: Configurable, Reproducible, and Scalable Computing with Hydra Ryan Soklaski Fugue Tune: A Simple Interface for Distributed Hyperparameter Optimization Jun Liu
Saturday October 30 9:30 PM Fusing economic survey datasets with the synthimpute Python package Max Ghenis From Jupyter to Production: Deploying an Influenza Monitoring System at Scale with Wearable Sensors Filip Jankovic
Saturday October 30 10:00 PM Using Reproducible Experiments To Create Better Machine Learning Models Milecia McGregor Foundational Infrastructure to Create a Successful Data Science Team Ethan Swan, Bradley Boehmke, Gus Powers
Saturday October 30 10:30 PM