Wen Wang is a graduate student who is a member of the MSI research group of Professor Vipin Kumar (MSI Fellow and Head, Department of Computer Science and Engineering). Assistant Professor Chad Myers (Computer Science and Engineering) also advises her work. Professor Kumar specializes in the field of data mining, and Professor Myers is a computational biologist. Ms. Wang entered the graduate program in computer science at the University of Minnesota in 2009 and began using MSI for her research at about the same time. She was a finalist in the poster competition at the 2013 MSI Research Exhibition with her poster, “Leveraging Network Structure to Discover Genetic Interactions in Genome-Wide Association Studies.” Ms. Wang sat down with MSI recently to discuss her research and this poster.
MSI: What resources do you use at MSI?
Wen Wang: Some of my work is done on Professor Kumar’s proprietary machine, but most of the computational work for this project was done on Elmo. The software I use is MATLAB.
MSI: Let’s get into what your poster describes. This is related to genome-wide association studies, where you can look at a couple of different genes and find differences in them?
WW: The purpose of our research project is to study the genetic causes of complex human diseases. The traditional methods used to analyze genome-wide association study (GWAS) data test only one genetic variation at a time between patients and healthy subjects. GWAS data contains hundreds of thousands or even millions of genetic variables, called single nucleotide polymorphisms (SNPs), and so this univariate analysis approach involves testing hundreds of thousands of hypotheses. As a result, the statistical score (p-value) obtained needs to be corrected based on the number of hypotheses tested. Thus, discovering a single genetic variation that reaches statistical significance is a challenging task. In the past 10 years there have been about 1350 published GWAS, and altogether these studies have successfully discovered more than 2000 loci that are significantly associated with one or more complex traits. However, these discovered genetic factors can explain only a very small amount of the heritability.
So maybe it’s not a single genetic variant that causes most of a disease. Instead, it could be the interaction of two genetic variants that brings more risk for a disease. However, studying pairwise genetic interactions is difficult, since the test space is tremendous and at least half a million samples are needed to achieve statistical significance. This is not practical, and so it seems that this is a hopeless cause.
However, in the yeast research community, genetic interactions have been well studied. It has been proven that genetic interactions are more likely to happen between two pathways with redundant or complementary functions. So we were motivated to test the genetic interaction in the context of pathway-pathway interaction since many well-defined human pathways exist.
We developed a method that explicitly searches for such larger structures, guided by established sets of genes belonging to characterized pathways or gene modules. We applied this approach to a Parkinson's disease GWAS data set and discovered tens of pathway-pathway interactions that are statistically significant. We also found biological evidence for many of these interactions. A significant fraction of them can also be validated in two independent cohorts.
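The multiple-testing burden described in this answer can be made concrete with a little arithmetic. The following is a minimal sketch, not the group's own code: it assumes a Bonferroni correction (one common way to adjust a p-value threshold for the number of hypotheses, which the interview does not name), an illustrative significance level of 0.05, and an illustrative count of 500,000 SNPs.

```python
from math import comb

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall error rate at alpha."""
    return alpha / n_tests

# Assumed, illustrative numbers (not taken from the study itself).
alpha = 0.05
n_snps = 500_000

# Univariate analysis: one hypothesis per SNP.
single_cutoff = bonferroni_threshold(alpha, n_snps)

# Pairwise analysis: one hypothesis per unordered pair of SNPs.
n_pairs = comb(n_snps, 2)
pair_cutoff = bonferroni_threshold(alpha, n_pairs)

print(f"single-SNP tests: {n_snps:,}  cutoff: {single_cutoff:.1e}")
print(f"SNP-pair tests:   {n_pairs:,}  cutoff: {pair_cutoff:.1e}")
```

Under these assumptions the number of pairwise tests is roughly 250,000 times the number of single-SNP tests, which is why the required sample size grows so sharply.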
MSI: How many subjects will you run the calculations for?
WW: The more the better. The data sets we tested range from around 500 to 4,000 subjects.
MSI: So, something will get your attention if you see a lot more interactions than you expect?
WW: Yes. And we also did permutation tests to make sure the result is significant.
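For readers unfamiliar with permutation tests, here is a generic sketch of the idea. It is an illustration only and not the procedure used in the study: the function name `permutation_p_value`, the difference-in-means statistic, and the toy data are all assumptions. The case/control labels are shuffled many times, and the p-value is the share of shuffles whose statistic is at least as extreme as the one actually observed.

```python
import random

def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Empirical p-value for the absolute difference in means
    between subjects labeled 1 (cases) and 0 (controls)."""
    def mean_diff(lbls):
        cases = [v for v, l in zip(values, lbls) if l == 1]
        controls = [v for v, l in zip(values, lbls) if l == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = mean_diff(labels)
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)      # break any real link between value and label
        if mean_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Toy data: cases score clearly higher than controls.
toy_values = [5.0] * 10 + [0.0] * 10
toy_labels = [1] * 10 + [0] * 10
print(permutation_p_value(toy_values, toy_labels))
```

A small p-value here means that random relabelings rarely produce a difference as large as the observed one, so the observed difference is unlikely to be due to chance alone.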
MSI: You wrote your code in MATLAB and ran it on Elmo?
WW: Yes. We have different scenarios and different parameters to test. We like to run our experiments in parallel. Also, as you can tell, we’re dealing with big data and our approach needs lots of memory. Elmo provided all that we need and allowed us to conduct the experiments much more efficiently than we could on regular computers.
MSI: Yes, we sometimes have users who say they have programs that would take days on a desktop computer.
WW: Sometimes it’s even worse than that. It could be weeks or months. I try to make good use of MSI resources to get results as soon as possible.
MSI: Is this research basic science, or is there an immediate application?
WW: It’s kind of both. We study genetic interactions to help us understand how our biological system works and, more specifically, to understand the underlying causes of disease. However, the ultimate goal of this research is to develop a disease model that can be used for disease risk screening, and also to support the development of individualized medicine.
MSI: This research seems to be very collaborative among different disciplines, with data mining and computational biology.
WW: Absolutely! We have [Assistant Professor] Nathan Pankratz in Lab Medicine and Pathology and [Professor] Brian Van Ness in Genetics, Cell Biology, and Development involved in this project. We’re from computer science, and we like to have experts from the biology side to help us understand and interpret our discoveries.
Posted on December 11, 2013.