DNA sequencing allows researchers to compare gene expression across conditions in any species, and thus to identify groups of genes related to economically or ecologically important traits. In this talk I will present a new Python pipeline that addresses some typical problems of working with sequences from non-model species, in my case the southern beech Nothofagus pumilio.
Nothofagus pumilio (common name: Lenga) is the most abundant tree species of the temperate Patagonian forest, the southernmost woody ecosystem on Earth. Despite its ecological and economic importance, physiological and molecular studies of N. pumilio are scarce, and genomic resources for the whole genus are very limited. Given our interest in the overall effect of high temperatures on N. pumilio’s physiology in the context of climate change, we sequenced the transcriptome (i.e., the collection of all the genes expressed in a given condition) of plants exposed to contrasting temperatures (20 °C and 34 °C) in a controlled growth chamber environment. Two biological replicates from independent experiments were sequenced for each temperature. Because N. pumilio is a non-model species, the transcriptome had to be assembled de novo and all downstream analyses performed without a reference genome. I will discuss some problems of great interest to the non-model plant species community and how we overcame them with a hands-on programming approach. I will show an original bioinformatic pipeline that integrates de novo transcriptome assembly and mapping, combining standard tools with new ones developed in our lab to solve basic problems of working without a reference genome, such as mis-assembled contigs, sequence redundancy and strand specificity.