Machine learning plays an important role in cancer research. In this talk, we’ll tackle the challenge of predicting which patients are likely to respond to a given anti-cancer treatment. In doing so, we’ll show how tools such as Snakemake and Bioconda can be used to create reproducible workflows, and illustrate the challenges of interpreting predictive models in large, highly correlated feature spaces.
An important topic in cancer research is identifying which patients will respond to which anti-cancer treatments. Much of this research is guided by computational analyses: from the use of sequence alignment tools for identifying mutations, to the use of machine learning for finding associations between mutations and drug treatments. In this talk, we will give an introduction to these concepts and demonstrate a workflow for predicting drug responses.
In the first part of the talk, we will describe the basics of (tumor) cell sequencing and the computational steps involved in analyzing different types of sequencing data. We will also demonstrate how Python tools such as Snakemake and Bioconda can be used to create reproducible, code-based workflows, which can easily be shared with others and used as building blocks in other, more complex workflows.
In the second part of the talk, we will discuss the role of machine learning approaches in analyzing these sequencing datasets, focusing on the specific case of predicting drug responses and interpreting the resulting models to gain biological insights. In particular, we will look at the challenges of identifying predictive features when integrating multiple highly correlated datasets. To this end, we will close by discussing TANDEM, a two-stage elastic net regression model that we specifically developed to address these challenges.
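To give a feel for what a two-stage elastic net can look like, here is a minimal pure-Python sketch. It is not the TANDEM implementation: it assumes one plausible reading of "two-stage", in which a first elastic net is fit on one block of features and a second elastic net is fit on the residuals using another block. The function names, the toy data and the labels "first" and "second" block are all hypothetical, and the features are assumed to be centered so that no intercept is needed.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1 / 2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
    + 0.5 * alpha * (1 - l1_ratio) * ||w||^2.
    Assumes centered data (no intercept is fit)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero feature carries no signal
            # Correlation of feature j with the partial residual.
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_w
    return w


def predict(X, w):
    return [sum(x * c for x, c in zip(row, w)) for row in X]


def fit_two_stage(X_first, X_second, y, **kwargs):
    """Stage 1: fit y on the first block. Stage 2: fit what is left
    (the residuals) on the second block."""
    w_first = fit_elastic_net(X_first, y, **kwargs)
    fitted = predict(X_first, w_first)
    residual = [yi - fi for yi, fi in zip(y, fitted)]
    w_second = fit_elastic_net(X_second, residual, **kwargs)
    return w_first, w_second


# Toy data (hypothetical): one feature per block, and the second-block
# feature is strongly correlated with the first-block feature.
u = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
X_first = [[ui] for ui in u]
X_second = [[ui + ei] for ui, ei in zip(u, noise)]
y = [2.0 * ui for ui in u]  # response driven by the first-block feature

w_first, w_second = fit_two_stage(X_first, X_second, y)
print(w_first, w_second)
```

In this toy setup, the signal shared by the two correlated features is credited to the first block, and the second block only receives weight for whatever the first stage left unexplained. A single model fit on both blocks at once would instead be free to split that weight between them, which is the kind of interpretation problem the abstract refers to.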