PyData Eindhoven 2020 - Presentation: Fighting COVID with Python

The Intensive Care Unit (ICU) is a very data-rich environment: vital signs, lab tests and medications are recorded at high frequency and in high quality. This makes the ICU a good starting point for machine learning in health care. However, when COVID-19 entered the stage, each individual hospital had too few patients for its own data to be of much value, while doctors urgently needed insights into effective treatment. A unique collaboration was therefore initiated: Dutch ICUs joined forces and shared data with each other to enable large-scale research on effective COVID treatment, answering questions such as "when can mechanical ventilation safely be stopped?" and "when should a patient be turned from their back onto their belly?"

Together with Amsterdam UMC, Pacmed transforms this data into a research data warehouse to facilitate international research. With millions of rows per patient and weekly updates coming in from more than thirty participating hospitals, it was essential to distribute, schedule and automate the data processing, using tools such as Airflow and Luigi. The talk will show the data engineering architecture that was built and discuss the biggest challenges.
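
Schedulers such as Airflow and Luigi run each task once the tasks it depends on have finished, and run independent tasks (for example, different hospitals) in parallel. The sketch below is not Airflow or Luigi code and not the project's actual pipeline. It is a minimal, stdlib-only illustration of that scheduling idea, using `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool. The step names, the DAG shape and `run_task` are hypothetical placeholders.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def build_dag(hospitals):
    """Hypothetical DAG: three steps per hospital, then one shared warehouse build."""
    dag = {}
    for h in hospitals:
        dag[(h, "extract")] = set()
        dag[(h, "map_items")] = {(h, "extract")}
        dag[(h, "validate")] = {(h, "map_items")}
    # The warehouse build waits until every hospital's data has been validated.
    dag[("all", "build_warehouse")] = {(h, "validate") for h in hospitals}
    return dag


def run_task(task):
    hospital, step = task
    # Placeholder for real work, e.g. querying a source system or writing tables.
    return f"{step}:{hospital}"


def run_dag(dag, max_workers=4):
    """Submit every task whose dependencies are done; return the completion order."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> task
        while sorter.is_active():
            # Start all tasks whose dependencies have completed.
            for task in sorter.get_ready():
                running[pool.submit(run_task, task)] = task
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                task = running.pop(fut)
                fut.result()  # re-raise any exception raised inside the task
                finished.append(task)
                sorter.done(task)  # may unlock downstream tasks
    return finished
```

For example, `run_dag(build_dag(["hospital_a", "hospital_b"]))` processes both hospitals concurrently and runs `build_warehouse` only after both have been validated. A real scheduler adds what this sketch leaves out: retries, persisted task state, and time-based triggers for the weekly updates.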

By far the biggest challenge, however, was homogenising the data. Thousands of items are registered at the ICU at high frequency, but different ICUs use different source systems and different conventions for parameter names and units, while data quality and documentation are limited. Homogenising this data was therefore a huge effort: a large team of doctors was needed to map every data item (partly manually) to the correct standardised name, and to check all results coming out of the data scientists' pipeline for every hospital.
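
The mapping step described above can be sketched as a lookup from each hospital's local item name to a standardised name and unit, followed by a unit conversion. This is a minimal illustration under stated assumptions, not the project's actual mapping: the hospital names, local item names (`HF`, `Hartfrequentie`, `Kreat`, `creat_serum`) and the mapping table are hypothetical. The creatinine factor (1 mg/dL is about 88.4 µmol/L) is the standard clinical conversion. Items without a mapping, or with an unknown unit, are set aside for a doctor to review rather than guessed.

```python
from dataclasses import dataclass

# Hypothetical mapping table, as a team of doctors might curate it:
# (hospital, local item name) -> (standard name, local unit)
ITEM_MAP = {
    ("hospital_a", "HF"): ("heart_rate", "bpm"),
    ("hospital_b", "Hartfrequentie"): ("heart_rate", "bpm"),
    ("hospital_a", "Kreat"): ("creatinine", "umol/L"),
    ("hospital_b", "creat_serum"): ("creatinine", "mg/dL"),
}

# Target unit per standard name.
STANDARD_UNIT = {"heart_rate": "bpm", "creatinine": "umol/L"}

# Conversion factors into the target unit, keyed by (standard name, local unit),
# because the same unit pair can have a different factor for another substance.
CONVERSION = {("creatinine", "mg/dL"): 88.4}  # 1 mg/dL creatinine ~ 88.4 umol/L


@dataclass
class Measurement:
    hospital: str
    item: str
    value: float


def homogenise(rows):
    """Return (standardised rows, rows that need manual review)."""
    mapped, needs_review = [], []
    for row in rows:
        key = (row.hospital, row.item)
        if key not in ITEM_MAP:
            needs_review.append(row)  # unknown item: a doctor decides the mapping
            continue
        name, unit = ITEM_MAP[key]
        target = STANDARD_UNIT[name]
        if unit == target:
            factor = 1.0
        elif (name, unit) in CONVERSION:
            factor = CONVERSION[(name, unit)]
        else:
            needs_review.append(row)  # no known conversion: flag instead of guessing
            continue
        mapped.append(
            {"hospital": row.hospital, "name": name, "value": row.value * factor, "unit": target}
        )
    return mapped, needs_review
```

Keeping the mapping as data rather than code lets doctors review and extend it without touching the pipeline, and the `needs_review` list gives them a concrete per-hospital worklist.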

This talk discusses why intense collaboration between doctors and data scientists is needed in medical data science, how this collaboration was set up in the COVID project, and how this collaboration, together with better use of information standards, can lead to large-scale medical machine learning in the (post-)COVID era.

Wednesday Oct. 7, 2020, 4:30 p.m. to 5 p.m., online

Fighting COVID with Python

Daan de Bruin