The stakes are high when using machine learning to assist doctors in the intensive care unit. Interpretability of the model is therefore pivotal to understanding predictions and placing them in the right medical context. This talk discusses an advanced interpretability layer based on SHAP that provides exactly this, along with learnings from user studies and concrete examples through a software demo.
In the intensive care unit (ICU), critically ill patients are stabilised, receive life support and are monitored continuously. The decision whether a patient is ready for discharge from the ICU is a challenging one: up to 8% of discharges lead to unexpected mortality or readmission to the ICU, often in a worse condition than before the first discharge. By optimising the timing of discharge from the ICU, unexpected readmissions may be prevented and post-ICU mortality may be reduced.
The large amounts of data on vital signs, lab values and clinical observations continuously collected in the ICU allow for the development and implementation of advanced decision-support tools. Together with Amsterdam UMC, Pacmed has developed a gradient boosting model that predicts readmission risk from thousands of features with an AUC of 0.82. Based on this model, Pacmed has built software that supports doctors in determining which patients can safely be transferred from the ICU to recover further elsewhere in the hospital.
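For illustration only (this is not Pacmed's actual pipeline), a minimal sketch of how such a readmission model could be trained and evaluated with scikit-learn; the synthetic feature matrix is a hypothetical stand-in for the real ICU data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for ICU features (vital signs, lab values, observations);
# the real model uses thousands of engineered features.
X, y = make_classification(n_samples=2000, n_features=50,
                           weights=[0.92, 0.08], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Gradient boosting classifier for readmission risk.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Readmission risk is the predicted probability of the positive class;
# evaluate discrimination with the AUC on the held-out set.
risk = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, risk))
```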
Since the stakes in medical decision making are high and a prediction model is inherently imperfect, models should be used as decision-supporting rather than decision-making tools. Interpretability is therefore pivotal: doctors need to understand the model in order to decide to what extent they will allow it to influence their decision making. But what is interpretability? And what is the most effective way to give doctors sufficient understanding to use a machine learning application validly in practice?
We believe interpretability means allowing doctors to understand predictions, place them in the appropriate context and adequately assess the value and reliability of individual predictions compared to their clinical intuition. How to best achieve this? Interpretability comes in many forms: intrinsic and post-hoc, model-specific and model-agnostic, global and local.
This talk discusses how we are trying to achieve this for the ICU use case, where many complex features contribute to the model's prediction. After carefully selecting only clinically relevant patient characteristics and engineering interpretable features from time series of signals, we use SHAP [1] to open the black box of the gradient boosting classifier for the ICU doctor. We go beyond SHAP by building an interpretability layer in the front-end of the software that explains individual predictions to the ICU doctor.
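To make this concrete, a minimal sketch of how SHAP can be used to explain individual predictions of such a classifier (reusing the hypothetical `model` and `X_test` from the sketch above; the actual interpretability layer in the software goes well beyond this):

```python
import shap

# TreeExplainer computes exact SHAP values efficiently for tree ensembles
# such as the gradient boosting classifier fitted above.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Local explanation: per-feature contributions to a single patient's
# prediction, relative to the model's expected output (the base value).
patient = 0
contributions = shap_values[patient]
top = sorted(enumerate(contributions), key=lambda kv: abs(kv[1]), reverse=True)[:5]
print("Base value:", explainer.expected_value)
print("Top feature contributions:", top)

# Global view: summary plot of feature impact across the held-out set.
shap.summary_plot(shap_values, X_test)
```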
User studies with over twenty ICU doctors from three hospitals taught us that interpretability is not easily achieved. Even the most data-savvy doctor struggles with the output of machine learning, misinterpreting features or mistaking correlation for causation. Model interpretations that seemed clear and useful to data scientists often introduced more noise than signal for the doctor.
We will summarise this process and discuss the most important learnings and pitfalls, using a software demo to provide concrete and realistic examples. The software will be implemented at Amsterdam UMC, and the model and its interpretability will be validated at over five hospitals this year. We hope to have an open discussion with the audience about ideas for further advancing interpretability for machine learning in health care.
[1] S. Lundberg, S.-I. Lee. A unified approach to interpreting model predictions. arXiv, 2017; https://arxiv.org/abs/1705.07874