Optimization of an RNA-Seq Hybrid Pipeline
RNA-Seq is an important technique that relies on next-generation sequencing to study tumor cells, in particular to identify gene and transcript expression. Studies of this type are carried out with pipelines: cDNA fragments obtained from the sequencing machine are run through a series of applications or genomics tools until results such as alignment, visualization, expression, differential expression, isoform identification, and alternative expression are obtained. Pipeline optimization is important in terms of both accuracy and time to solution. It has been shown that a single splice aligner is often not sufficient to study certain regions of the transcriptome.
This project involves the development of an ensemble approach, combining multiple splice aligners, to better capture splicing. The pipeline is also a “hybrid” pipeline because it complements a traditional pipeline based on splice aligners with a de novo assembler. De novo assemblers do not require a reference genome, which helps identify novel transcripts. The researchers use Trinity, an RNA-Seq de novo assembler developed at the Broad Institute of MIT and Harvard. In addition to optimizing accuracy, they also address time to solution. To this end, the researchers have started parallelizing Trinity to efficiently use multiple nodes on any standard Linux cluster.
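The project description does not say how the outputs of the individual splice aligners are combined. As one illustration only, the sketch below assumes a simple voting scheme: each aligner contributes a set of splice junctions, and a junction is kept if at least a chosen number of aligners report it. The function name `ensemble_junctions`, the `min_support` parameter, the junction tuple layout, and the aligner labels are all hypothetical and are not taken from the researchers' pipeline.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical junction representation (an assumption, not the project's format):
# (chromosome, intron start, intron end, strand)
Junction = Tuple[str, int, int, str]


def ensemble_junctions(
    calls: Dict[str, Set[Junction]], min_support: int = 2
) -> Dict[Junction, int]:
    """Return junctions reported by at least `min_support` aligners.

    `calls` maps an aligner label to the set of junctions it reported.
    The result maps each retained junction to the number of aligners
    that support it.
    """
    support: Counter = Counter()
    for junctions in calls.values():
        # Each aligner's junctions form a set, so every aligner
        # contributes at most one vote per junction.
        support.update(junctions)
    return {j: n for j, n in support.items() if n >= min_support}


# Toy input with made-up coordinates and placeholder aligner labels.
calls = {
    "aligner_a": {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")},
    "aligner_b": {("chr1", 100, 200, "+"), ("chr2", 50, 90, "-")},
    "aligner_c": {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")},
}

consensus = ensemble_junctions(calls, min_support=2)  # majority-style vote
union = ensemble_junctions(calls, min_support=1)      # keep every junction
```

Setting `min_support=1` keeps the union of all calls, which favors sensitivity in regions a single aligner misses; raising it favors precision. Which trade-off suits the actual pipeline is not stated in the description.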