Professor Yuhong Yang

School of Statistics (CLA)
College of Liberal Arts
Twin Cities
Project Title: Big Data, Model Combining, and Predictive Modeling of High Dimensional Data

Big data and the predictive modeling of high-dimensional datasets are of great interest to practitioners in many fields, such as finance, biology, and economics. These researchers are adapting model combination, a methodology that is widely and effectively used for low-dimensional datasets, to high-dimensional settings. So far, however, the literature on combining models for high-dimensional datasets is sparse. This project will develop a general risk bound for the proposed methodology for high-dimensional predictive modeling, especially for classification problems. Further, an efficient computing algorithm for the combination schemes will be developed and wrapped into a publicly available R package.

Many real big-data sets will be analyzed with multiple high-dimensional classification methods using cross-validation. This process will take about 10 million non-linear numerical optimizations. Besides working with real data, the researchers will perform various numerical experiments in order to better understand their methods. For different scenarios, they will compare their methods with between five and ten other popular methods and run a large number of replicates to reduce the bias from sampling. These experiments will take about 10 million calculations.
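
The general idea of combining several classifiers with weights learned from repeated data splits can be illustrated with a small sketch. This is not the project's algorithm or its R package: the toy data, the three candidate classifiers, and every function name below are hypothetical, and the weighting rule (exponentiated validation log-likelihood, averaged over random half-splits) is just one common choice assumed here for illustration.

```python
import math
import random


def fit_prior(train):
    """Hypothetical candidate: ignore features, predict the smoothed class frequency."""
    p = (sum(y for _, y in train) + 1) / (len(train) + 2)
    return lambda x: p


def fit_stump(train, j):
    """Hypothetical candidate: split on the sign of feature j, predict per-side frequency."""
    def side_rate(rows):
        # Laplace smoothing keeps probabilities strictly inside (0, 1).
        return (sum(y for _, y in rows) + 1) / (len(rows) + 2)

    left = side_rate([(x, y) for x, y in train if x[j] <= 0])
    right = side_rate([(x, y) for x, y in train if x[j] > 0])
    return lambda x: right if x[j] > 0 else left


def log_lik(model, rows):
    """Log-likelihood of binary labels under the model's predicted P(y = 1 | x)."""
    total = 0.0
    for x, y in rows:
        p = model(x)
        total += math.log(p if y == 1 else 1.0 - p)
    return total


def combining_weights(data, fitters, n_splits=20, seed=0):
    """Average, over random half-splits, of weights proportional to exp(validation log-likelihood)."""
    rng = random.Random(seed)
    avg = [0.0] * len(fitters)
    for _ in range(n_splits):
        rows = data[:]
        rng.shuffle(rows)
        half = len(rows) // 2
        train, valid = rows[:half], rows[half:]
        scores = [log_lik(fit(train), valid) for fit in fitters]
        top = max(scores)  # subtract the maximum before exponentiating, for stability
        raw = [math.exp(s - top) for s in scores]
        z = sum(raw)
        for k, r in enumerate(raw):
            avg[k] += r / z / n_splits
    return avg


def make_data(n, seed=1):
    """Toy data (an assumption): only the sign of feature 0 carries signal."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        p = 0.9 if x[0] > 0 else 0.1
        data.append((x, 1 if rng.random() < p else 0))
    return data


data = make_data(200)
fitters = [fit_prior, lambda t: fit_stump(t, 0), lambda t: fit_stump(t, 1)]
weights = combining_weights(data, fitters)

# Refit each candidate on all the data and mix the predicted probabilities.
models = [fit(data) for fit in fitters]


def combined(x):
    return sum(w * m(x) for w, m in zip(weights, models))
```

In this toy setting, the candidate that splits on the informative feature should receive nearly all of the weight, so `combined` behaves much like that single classifier. With real high-dimensional data, the candidates would be full classification methods, and each split would require the kind of numerical optimization counted above.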

Project Investigators

Yuchen Chen
Henry Wyneken
Professor Yuhong Yang
Yanjia Yu
Lin Zhang