Wednesday Oct. 7, 2020, 4:30 p.m. to 5 p.m., online

Fighting COVID with Python

Daan de Bruin

Audience level:
Novice

Description

Dutch ICUs share their data in a unique, large-scale research project on COVID treatment. Together with Amsterdam UMC, Pacmed processes this data to facilitate the research. This talk discusses the engineering required to handle the stream of ICU data, how a large team of doctors and data scientists homogenised the highly heterogeneous data, and how machine learning (ML) can help improve COVID treatment.

Abstract

The Intensive Care Unit (ICU) is a very data-rich environment: patients' vital signs, lab tests and medications are recorded at high frequency and with high quality. This makes the ICU a good starting point for machine learning in health care. However, when COVID-19 arrived, each individual hospital had too few COVID patients for its own data to be of much value, while doctors were in great need of insights into effective treatment. A unique collaboration was therefore initiated: Dutch ICUs joined forces and shared their data with each other to enable large-scale research on effective COVID treatment, answering questions such as "when can mechanical ventilation safely be stopped?" and "when should a patient be turned from their back onto their belly?"

Together with Amsterdam UMC, Pacmed transforms this data into a research data warehouse to facilitate international research. With millions of rows per patient and weekly updates arriving from more than thirty participating hospitals, distributing, scheduling and automating the data processing (using tools such as Airflow and Luigi) was essential. The talk will present the data engineering architecture that was built and discuss the biggest challenges.
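Schedulers such as Airflow and Luigi model a pipeline as a directed acyclic graph (DAG) of tasks. The sketch below illustrates that idea without using either tool's API. It relies only on the standard library's `graphlib` (Python 3.9+). The step names (`extract`, `validate`, `map_items`, `convert_units`, `load_warehouse`, `build_research_tables`) are hypothetical and not taken from the project. Each hospital's weekly update runs as its own chain of steps, and a final step waits for all hospitals.

```python
from graphlib import TopologicalSorter

# Hypothetical per-hospital pipeline steps (illustrative names only).
# Each key lists the steps it depends on.
PIPELINE = {
    "extract": set(),
    "validate": {"extract"},
    "map_items": {"validate"},
    "convert_units": {"map_items"},
    "load_warehouse": {"convert_units"},
}


def build_dag(hospitals):
    """Expand the per-hospital pipeline into one DAG covering all hospitals.

    The hospitals' chains are independent of each other; a final
    'build_research_tables' step waits for every hospital's load.
    """
    dag = {}
    for h in hospitals:
        for step, deps in PIPELINE.items():
            dag[f"{h}:{step}"] = {f"{h}:{d}" for d in deps}
    dag["build_research_tables"] = {f"{h}:load_warehouse" for h in hospitals}
    return dag


def run_in_waves(dag):
    """Group tasks into waves whose members have no unmet dependencies.

    Tasks within one wave could be dispatched to workers in parallel.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(run_in_waves(build_dag(["hospital_a", "hospital_b"]))):
        print(i, wave)
```

With thirty hospitals the waves simply become wider. A real scheduler adds retries, persisted task state and cron-style triggers on top of this ordering.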

However, by far the biggest challenge was homogenising the data. Thousands of items are registered at high frequency in the ICU, but different ICUs use different source systems and different parameter-naming and unit conventions, while data quality and documentation are limited. Homogenising this data was therefore a huge effort: a large team of doctors was needed to map every data item (partly manually) to the correct standardised name, and to check all results coming out of the data scientists' pipeline for every hospital.
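The core of this homogenisation step can be pictured as a lookup from each hospital's local item name to a standardised concept, followed by a unit conversion. The minimal sketch below works under that assumption. All item names, hospitals and the mapping table are hypothetical, not the project's actual vocabulary. The unit conversions (Fahrenheit to Celsius, creatinine mg/dL to µmol/L at roughly 88.42) are standard. Items without a mapping are collected for a clinician to review rather than guessed.

```python
from dataclasses import dataclass

# Hypothetical mapping table: (hospital, local item name) -> (standard concept, local unit).
# In the project described above, doctors built such mappings, partly by hand.
ITEM_MAP = {
    ("hospital_a", "HR"): ("heart_rate", "bpm"),
    ("hospital_b", "Hartfrequentie"): ("heart_rate", "bpm"),
    ("hospital_a", "Temp"): ("temperature", "degC"),
    ("hospital_b", "Temperatuur (F)"): ("temperature", "degF"),
    ("hospital_a", "Creat"): ("creatinine", "umol/L"),
    ("hospital_b", "Kreatinine"): ("creatinine", "mg/dL"),
}

# Conversion from a local unit to each concept's standard unit.
TO_STANDARD = {
    ("heart_rate", "bpm"): lambda v: v,
    ("temperature", "degC"): lambda v: v,
    ("temperature", "degF"): lambda v: (v - 32) * 5 / 9,
    ("creatinine", "umol/L"): lambda v: v,
    ("creatinine", "mg/dL"): lambda v: v * 88.42,
}


@dataclass
class Measurement:
    hospital: str
    item: str
    value: float


def homogenise(rows):
    """Return (mapped rows in standard units, set of unmapped items to review)."""
    mapped, unmapped = [], set()
    for r in rows:
        key = (r.hospital, r.item)
        if key not in ITEM_MAP:
            unmapped.add(key)  # queue for manual mapping by a doctor
            continue
        concept, unit = ITEM_MAP[key]
        value = TO_STANDARD[(concept, unit)](r.value)
        mapped.append((r.hospital, concept, round(value, 2)))
    return mapped, unmapped
```

Keeping the mapping as data rather than code lets doctors extend it without touching the pipeline, and the `unmapped` set gives them a concrete review list after each weekly update.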

This talk discusses why intense collaboration between doctors and data scientists is needed in medical data science, how this collaboration was set up in the COVID project, and how this collaboration, together with better use of information standards, can lead to large-scale medical machine learning in the (post-)COVID era.
