Research Abstracts Online
January 2010 - March 2011
University of Minnesota Twin Cities
PI: LiLian Yuan
Support Vector Regression Approach to Capturing Peptide Sequence Characteristics; Developing Improved miRNA Target Prediction Tools; Improved Methods to Analyze RNA-seq Data
These researchers are involved in three projects using MSI resources. The first concerns determining the binding or interacting sites of a protein, which are the peptide elements that play key roles in binding to or interacting with another protein. A practical difficulty in large-scale peptide screening efforts is the "exponential explosion" problem: the number of possible peptide sequences grows exponentially with the length of the sequence. These researchers are taking an in silico approach to this problem, in which a Support Vector Regression (SVR) method is adopted to quantitatively model the relationship between the peptide sequences and the binding intensities.
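As a rough illustration of the scale of the problem, the sketch below counts the possible peptides of a given length and shows one common way a peptide could be turned into a numeric feature vector for a regression model. The one-hot encoding and the names `sequence_space_size` and `one_hot_encode` are assumptions made for this example; the abstract does not state which sequence representation the researchers use.

```python
# Illustrative sketch only; the encoding is an assumption, not the
# researchers' documented feature representation.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sequence_space_size(length):
    """Number of distinct peptides of the given length (20 ** length)."""
    return len(AMINO_ACIDS) ** length


def one_hot_encode(peptide):
    """Encode a peptide as a flat 0/1 vector with 20 entries per residue."""
    vector = []
    for residue in peptide:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1  # mark which amino acid is present
        vector.extend(row)
    return vector


# The search space grows by a factor of 20 with each added residue.
for length in (2, 5, 10):
    print(length, sequence_space_size(length))
```

Vectors of this kind, paired with measured binding intensities, could then be passed to any SVR implementation (for example, the `SVR` class in scikit-learn) to fit a model that predicts intensities for sequences that were never screened.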
The second project is an investigation into microRNAs (miRNAs), a class of newly discovered genes capable of post-transcriptionally regulating the expression of other genes (their "targets") by binding to the non-coding regions of those genes, leading to cleavage of transcripts and/or repression of translation. Despite considerable effort by several research groups, the mechanism of miRNA targeting remains elusive. The group has started a collaborative project with two groups of researchers at Stanford, who have provided two sets of AGO2 immunoprecipitation microarray datasets and two sets of gene expression microarray datasets. These high-quality datasets offer a unique opportunity for elucidating the miRNA targeting mechanism. The researchers are applying a strategy in which configurations for a dynamic programming-based scoring scheme are randomly created and then summarized with supervised and unsupervised machine learning-based analysis methods, with the aim of arriving at accurate miRNA targeting criteria that offer much better coverage than existing methods.
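The idea of randomly created configurations for a dynamic programming-based scoring scheme can be sketched as follows. This is a minimal toy, not the group's actual scheme: the Smith-Waterman-style local alignment, the four parameters (`match`, `wobble`, `mismatch`, `gap`), their sampling ranges, and the function names are all hypothetical choices made for illustration.

```python
import random

# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def random_config(rng):
    """Draw one hypothetical scoring configuration (ranges are assumptions)."""
    return {
        "match": rng.uniform(1.0, 5.0),      # reward for a Watson-Crick pair
        "wobble": rng.uniform(0.0, 2.0),     # reward for a G:U wobble pair
        "mismatch": -rng.uniform(1.0, 5.0),  # penalty for a non-pairing position
        "gap": -rng.uniform(1.0, 8.0),       # penalty for a bulge on either strand
    }


def pair_score(a, b, cfg):
    """Score one miRNA base against one target base under a configuration."""
    if COMPLEMENT[a] == b:
        return cfg["match"]
    if {a, b} == {"G", "U"}:
        return cfg["wobble"]
    return cfg["mismatch"]


def duplex_score(mirna, site, cfg):
    """Best local pairing score between a miRNA and a target site.

    Both sequences are given 5'->3'; the site is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    site_rev = site[::-1]
    prev = [0.0] * (len(site_rev) + 1)
    best = 0.0
    for a in mirna:
        cur = [0.0]
        for j, b in enumerate(site_rev, start=1):
            score = max(
                0.0,                                # start a new local pairing
                prev[j - 1] + pair_score(a, b, cfg),  # pair the two bases
                prev[j] + cfg["gap"],               # bulge in the miRNA
                cur[j - 1] + cfg["gap"],            # bulge in the target
            )
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


# Score one candidate site under several random configurations.
rng = random.Random(0)
configs = [random_config(rng) for _ in range(3)]
scores = [duplex_score("UGAGGUAG", "CUACCUCA", cfg) for cfg in configs]
```

In a setting like the one described, the scores produced under many such configurations could serve as features, with the immunoprecipitation and expression datasets supplying the labels against which configurations are evaluated.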
The third project consists of two parts. In the first, the researchers are developing improved methods of mapping and quantifying RNA-seq data. RNA-seq is a new technology and existing methods are far from perfect. The group is developing improved methods that aim to increase the accuracy of read mapping, including mapping across exon-exon junctions, and to increase the coverage/completeness of mapping, i.e., to account for a higher percentage of the reads than existing, published methods. The second part of the project uses the methods developed in the first part to map all (or most) publicly available human RNA-seq datasets (around 30 of them) to create a close-to-complete transcriptome resource, based on which they will: investigate the completeness of existing gene model databases and estimate the number of unannotated genes; predict the coding potential of novel genes, predict the functions of coding genes, and assess the evolutionary conservation of non-coding genes; and investigate the complexity of alternative splicing and alternative transcription start site/stop site usage.
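To make the two mapping goals concrete, the toy sketch below shows why reads spanning an exon-exon junction fail to map to the genome directly, and how the percentage of reads accounted for can be measured. It uses exact string matching on made-up sequences; the function names, the `flank` parameter, and the example data are hypothetical and do not represent the group's method, which the abstract does not describe in detail.

```python
def build_junctions(exons, flank):
    """Pair the last `flank` bases of each exon with the first `flank` bases
    of the next exon (flank must be a positive integer)."""
    return [(left[-flank:], right[:flank]) for left, right in zip(exons, exons[1:])]


def map_read(read, genome, junctions):
    """Classify a read as 'genome', 'junction', or None (unmapped).

    Exact matching only; real mappers tolerate mismatches and indels.
    """
    if read in genome:
        return "genome"
    for left, right in junctions:
        joined = left + right
        start = joined.find(read)
        while start != -1:
            # Accept only hits that actually cross the exon-exon boundary.
            if start < len(left) < start + len(read):
                return "junction"
            start = joined.find(read, start + 1)
    return None


def mapped_fraction(reads, genome, junctions):
    """Fraction of reads that map to the genome or to a junction."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if map_read(read, genome, junctions) is not None)
    return hits / len(reads)


# Made-up example: two exons separated by an intron.
exons = ["ACGTACGGA", "TTTGCACCA"]
genome = exons[0] + "GTAAGTCCCCAG" + exons[1]
junctions = build_junctions(exons, flank=6)
reads = ["GTACGG", "CGGATTTG", "GGGGGGGG"]
print(mapped_fraction(reads, genome, []))         # without junction mapping
print(mapped_fraction(reads, genome, junctions))  # with junction mapping
```

The second read comes from a spliced transcript, so it is found only when junction sequences are considered; counting such reads is one way the fraction of mapped reads can rise.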
Tongbin Li, Faculty Collaborator
Marc A. Parent, Graduate Student
Rendong Yang, Collaborator
Liangsheng Zhang, Collaborator
Dihan Zhou, Collaborator