Developing Improved miRNA Target Prediction Tools; Improved Methods to Analyze RNA-seq Data
These researchers are involved in two projects using MSI resources. The first is an investigation into microRNAs (miRNAs), a recently discovered class of genes that post-transcriptionally regulate the expression of other genes (their “targets”). miRNAs do this by binding to non-coding regions of the target transcripts, which leads to transcript cleavage, translational repression, or both. Despite considerable effort by several research groups, the mechanism of miRNA targeting remains elusive. The group has started a collaborative project with two groups of researchers at Stanford, who have provided two sets of AGO2 immunoprecipitation microarray data and two sets of gene expression microarray data. These high-quality datasets offer a unique opportunity to elucidate the miRNA targeting mechanism. The researchers are applying a strategy in which configurations of a dynamic programming-based scoring scheme are randomly generated and then summarized with supervised and unsupervised machine learning methods. The aim is to derive accurate miRNA targeting criteria with much better coverage than existing methods.
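The dynamic programming-based scoring idea can be illustrated with a small sketch. This is not the group's scheme. It assumes a Smith-Waterman-style local alignment of the miRNA against the antiparallel target site, with Watson-Crick and G:U wobble pairs scored separately. It also assumes that each "configuration" is simply a randomly sampled set of pair and gap scores. The names `ScoringConfig`, `duplex_score`, and `random_config`, as well as the sampling ranges, are hypothetical. The machine learning step that summarizes scores across many configurations is not shown.

```python
import random
from dataclasses import dataclass

# Watson-Crick pairs plus the G:U wobble pair (RNA alphabet).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


@dataclass(frozen=True)
class ScoringConfig:
    """Hypothetical scoring parameters; not the authors' actual scheme."""
    match: float
    wobble: float
    mismatch: float
    gap: float


def pair_score(a: str, b: str, cfg: ScoringConfig) -> float:
    """Score one miRNA base against one target base."""
    if (a, b) in WATSON_CRICK:
        return cfg.match
    if (a, b) in WOBBLE:
        return cfg.wobble
    return cfg.mismatch


def duplex_score(mirna: str, site: str, cfg: ScoringConfig) -> float:
    """Best local (Smith-Waterman-style) pairing score between a miRNA and a
    target site, both given 5'->3'. The site is reversed so the two strands
    are compared antiparallel, as in a real duplex."""
    site_rev = site[::-1]
    n, m = len(mirna), len(site_rev)
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(
                0.0,  # local alignment: restart instead of going negative
                h[i - 1][j - 1] + pair_score(mirna[i - 1], site_rev[j - 1], cfg),
                h[i - 1][j] + cfg.gap,  # gap (bulge) in the target
                h[i][j - 1] + cfg.gap,  # gap (bulge) in the miRNA
            )
            best = max(best, h[i][j])
    return best


def random_config(rng: random.Random) -> ScoringConfig:
    """Sample one scoring configuration; the ranges are illustrative assumptions."""
    match = rng.uniform(1.0, 3.0)
    return ScoringConfig(
        match=match,
        wobble=rng.uniform(0.0, match),  # wobble never outscores a full match
        mismatch=rng.uniform(-3.0, -0.5),
        gap=rng.uniform(-4.0, -1.0),
    )
```

For example, scoring let-7a (`UGAGGUAGUAGGUUGUAUAGUU`) against a site under several sampled configurations yields one feature per configuration, e.g. `[duplex_score(mirna, site, random_config(rng)) for _ in range(100)]`. A downstream learner could then relate these features to the AGO2 immunoprecipitation or expression data.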
The second project consists of two parts. In the first, the researchers are developing improved methods for mapping and quantifying RNA-seq data. RNA-seq is a new technology, and existing analysis methods are far from perfect. The group's methods aim to increase the accuracy of read mapping, including reads that span exon-exon junctions. They also aim to make mapping more complete, i.e., to account for a higher percentage of reads than existing published methods. The second part of the project uses the methods developed in the first part to map all (or most) publicly available human RNA-seq datasets (around 30 of them) and create a close-to-complete transcriptome resource. Using this resource, the researchers will investigate the completeness of existing gene model databases and estimate the number of unannotated genes. They will also predict the coding potential of novel genes, predict the functions of coding genes, and assess the evolutionary conservation of non-coding genes. Finally, they will investigate the complexity of alternative splicing and of alternative transcription start and stop site usage.
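One common way to let reads map across exon-exon junctions is to build spliced "junction sequences" from annotated exon coordinates and align reads against them. The sketch below illustrates that idea; it is not the group's method. It uses exact string matching only, whereas real aligners tolerate mismatches and can discover unannotated junctions. The names `Junction`, `build_junctions`, and `map_spliced_read`, along with the `flank` and `min_overhang` parameters, are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Exon = Tuple[int, int]  # 0-based, half-open [start, end) coordinates on the genome string


class Junction(NamedTuple):
    seq: str         # left exon flank + right exon flank, intron removed
    boundary: int    # offset in seq where the right exon begins
    left_start: int  # genome coordinate of seq[0]
    left: Exon
    right: Exon


def build_junctions(genome: str, exons: List[Exon], flank: int) -> List[Junction]:
    """Join the last `flank` bases of each exon to the first `flank` bases of
    the next exon, giving a spliced sequence that junction reads can match."""
    junctions = []
    for left, right in zip(exons, exons[1:]):
        left_start = max(left[0], left[1] - flank)
        left_part = genome[left_start:left[1]]
        right_part = genome[right[0]:min(right[1], right[0] + flank)]
        junctions.append(
            Junction(left_part + right_part, len(left_part), left_start, left, right)
        )
    return junctions


def map_spliced_read(
    read: str, junctions: List[Junction], min_overhang: int
) -> List[Tuple[Exon, Exon, int]]:
    """Return (left exon, right exon, genomic start) for every exact match of
    `read` that crosses a junction with >= `min_overhang` bases on each side."""
    hits = []
    for j in junctions:
        pos = j.seq.find(read)
        while pos != -1:
            left_overhang = j.boundary - pos
            right_overhang = pos + len(read) - j.boundary
            # Both overhangs must be long enough; a read lying entirely within
            # one exon has a non-positive overhang and is rejected here.
            if min(left_overhang, right_overhang) >= min_overhang:
                hits.append((j.left, j.right, j.left_start + pos))
            pos = j.seq.find(read, pos + 1)
    return hits
```

The minimum-overhang requirement guards against spurious junction hits, where only one or two bases on one side happen to match the neighbouring exon. Raising it trades sensitivity for specificity.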