Thursday Oct. 8, 2020, 1:30 p.m.–Oct. 8, 2020, 2 p.m. in Online

The industrial challenge of missing data

Reza Sahraeian

Audience level:
Intermediate

Description

Missing data is a common problem in many applications. This talk gives a brief overview of the origins of data missingness and covers effective approaches to handling it, with practical examples showing how to use and implement missing data techniques using open source Python libraries. The talk suits data and machine learning scientists in both industry and academia.

Abstract

Artificial intelligence owes its revolutionary rise in many applications not only to smart algorithms and abundant computational resources but also to the availability of more and more data. However, gathering data is not always trivial and, depending on the task, data can be incomplete for several reasons, such as privacy, machine or human error, and the cost of measurement. This missingness limits the use of many AI algorithms and may hamper performance if not properly addressed. This talk aims at: 1) providing a brief overview of the origins of data missingness; 2) categorizing the approaches to handling missing data; 3) providing practical examples of how to use and implement missing data techniques, from simple to complex, with open source Python libraries. I will try to keep the talk intuitive and practical, with minimal math, while making sure the concepts are well explained. The talk suits data and machine learning scientists in both industry and academia. While those with knowledge of Python and machine learning will benefit the most, a novice should also be able to follow a large part of the talk. By the end, the audience will be familiar with the missing data problem, a family of techniques, and some open source toolboxes for addressing it.
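
As a rough illustration of the "simple" end of that range, the sketch below contrasts two basic strategies: dropping incomplete rows and filling gaps with column means. The abstract does not name the techniques or libraries covered, so this is an assumption rather than material from the talk; the helper names and the toy data are made up, and only the Python standard library is used.

```python
import math
from statistics import mean


def drop_incomplete(rows):
    """Listwise deletion: keep only rows that contain no missing (NaN) values."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def impute_column_means(rows):
    """Mean imputation: replace each NaN with the mean of the observed
    values in the same column. Assumes every column has at least one
    observed value; otherwise statistics.mean raises StatisticsError."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if not math.isnan(v)) for col in columns]
    return [
        [col_means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: two numeric features with gaps marked as NaN.
readings = [
    [20.0, 1.0],
    [float("nan"), 2.0],
    [22.0, float("nan")],
    [24.0, 3.0],
]

complete_rows = drop_incomplete(readings)     # discards half of the rows
imputed_rows = impute_column_means(readings)  # keeps every row
```

Deletion is the easiest option but throws away partially observed rows, while mean imputation keeps them at the cost of shrinking the variance of the filled column. Libraries such as pandas (`DataFrame.dropna`, `DataFrame.fillna`) and scikit-learn (`SimpleImputer`) provide equivalents of both.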
