Mining Earth Science Data and Biomedical Data



The primary objective of this research is to develop novel, high-performance data-mining algorithms and tools for large-scale datasets that arise in a variety of applications. Examples include gigabyte datasets collected by earth-observing satellites, which must be processed to better understand global-scale changes in biosphere processes and patterns; data generated by scientific simulations, which can be used to gain insight into the underlying physical processes; network traffic data monitored to detect illegal network activities; and large collections of text and hypertext analyzed to extract relevant information. The key technical challenges in mining these datasets include high volume, dimensionality, and heterogeneity; the spatio-temporal nature of the data; possibly skewed class distributions; the distributed nature of the data; and the complexity of converting raw collected data into high-level features. High-performance data mining is essential for analyzing this growing volume of data and for providing analysts with automated tools that facilitate some of the steps needed for hypothesis generation and evaluation.

Data mining has also become a key tool for analyzing biomedical data. In collaboration with the Mayo Clinic of Rochester, Minnesota, these researchers are developing advanced data-mining techniques for several medical problems, such as early prediction of liver fibrosis to significantly reduce the need for invasive laboratory tests and liver biopsy. Since data mining has established itself as an effective methodology for the analysis of large amounts of biological data, the researchers are also assessing its impact on the automatic prediction of protein function from proteomics data, on genetic and genomic marker discovery from SNP and gene-expression data, and on the analysis of next-generation sequencing data. Computational challenges imposed by the large size of the datasets will be addressed by building upon past research in highly parallel formulations of key data-mining kernels that can take advantage of high-performance computers. These kernels cover anomaly/outlier detection, finding association patterns, clustering, and building rare-class predictive models.

This group has also been performing biomedical research in collaboration with Professors Angus MacDonald (Psychology) and Kelvin Lim (Psychiatry) on projects analyzing MRI and fMRI data for subjects with schizophrenia, bipolar disorder, and other mental disorders.

A bibliography of this group’s publications acknowledging MSI is attached.
