Project abstract for group yangyh

Big Data, Model Combining, and Predictive Modeling of High Dimensional Data

Big Data and the predictive modeling of high-dimensional datasets are of great interest to practitioners in many fields, such as finance, biology, and economics. The researchers in this group are adapting model combination, a methodology that is widely and efficiently used for low-dimensional datasets, to high-dimensional settings. The project will develop a general risk bound for the group's methodology for high-dimensional predictive modeling, especially for classification problems. In addition, an efficient computing algorithm for the combination schemes will be developed and released as a publicly available R package.
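
To illustrate what a model-combination scheme can look like, the following Python sketch combines binary classifiers by exponential weighting. Each candidate's weight is proportional to exp(-L_j / T), where L_j is its log loss on held-out data and T is a temperature. This is a generic scheme chosen for illustration and is not necessarily the combination method this project develops. The function names, the `temperature` parameter, and the toy held-out predictions are all hypothetical.

```python
import math


def log_loss(probs, labels, eps=1e-12):
    """Total negative log-likelihood of 0/1 labels under predicted P(y = 1)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total


def combination_weights(losses, temperature=1.0):
    """Exponential weights: w_j proportional to exp(-losses[j] / temperature)."""
    best = min(losses)  # shift by the minimum loss for numerical stability
    raw = [math.exp(-(loss - best) / temperature) for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]


def combined_probability(probs, weights):
    """Weighted average of each candidate's predicted P(y = 1 | x)."""
    return sum(w * p for w, p in zip(weights, probs))


# Hypothetical held-out predictions from three candidate classifiers.
labels = [1, 0, 1, 1, 0]
held_out = {
    "candidate_a": [0.9, 0.2, 0.8, 0.7, 0.1],
    "candidate_b": [0.6, 0.5, 0.5, 0.6, 0.4],
    "candidate_c": [0.2, 0.8, 0.3, 0.4, 0.9],
}
losses = [log_loss(p, labels) for p in held_out.values()]
weights = combination_weights(losses)

# Each candidate's P(y = 1) for a new observation, combined with the weights.
new_point = [0.85, 0.55, 0.25]
combined = combined_probability(new_point, weights)
print(weights, combined)
```

Lowering the temperature concentrates weight on the best-scoring candidate; raising it pulls the combination toward an equal-weight average of all candidates.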

Many real Big-Data sets will be analyzed with multiple high-dimensional classification methods using cross-validation. Processing these data sets will require about 10 million nonlinear numerical optimizations. Besides working with real data, the researchers will perform various numerical experiments to better understand their methods. For different scenarios, they will compare their methods with five to ten other popular methods and run a large number of replicates to reduce the variability introduced by random sampling. These experiments will require about 10 million calculations.
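
The comparison described above, with several classifiers evaluated by cross-validation and repeated over many random splits, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two toy classifiers (majority class and nearest centroid) stand in for real high-dimensional methods, the synthetic Gaussian data replaces real data, and the helper names, k = 5 folds, and 20 replicates are hypothetical choices rather than the project's settings.

```python
import random
import statistics


def majority_classifier(train_x, train_y):
    """Always predict the most frequent training label."""
    label = max(set(train_y), key=train_y.count)
    return lambda x: label


def nearest_centroid_classifier(train_x, train_y):
    """Predict the label whose training-set mean vector is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def cv_error(fit, xs, ys, k, rng):
    """Misclassification rate from one random k-fold split."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += sum(model(xs[i]) != ys[i] for i in fold)
    return errors / len(xs)


def repeated_cv(methods, xs, ys, k=5, replicates=20, seed=0):
    """Mean and standard deviation of k-fold CV error over independent random splits."""
    rng = random.Random(seed)
    results = {name: [] for name in methods}
    for _ in range(replicates):
        split_seed = rng.randrange(2**32)  # same split for every method in a replicate
        for name, fit in methods.items():
            results[name].append(cv_error(fit, xs, ys, k, random.Random(split_seed)))
    return {name: (statistics.mean(e), statistics.stdev(e)) for name, e in results.items()}


# Synthetic two-class data in 5 dimensions (hypothetical, for illustration only).
data_rng = random.Random(1)
xs, ys = [], []
for i in range(100):
    y = i % 2
    xs.append([data_rng.gauss(1.5 * y, 1.0) for _ in range(5)])
    ys.append(y)

summary = repeated_cv(
    {"majority": majority_classifier, "nearest_centroid": nearest_centroid_classifier},
    xs, ys)
for name, (mean_err, sd_err) in summary.items():
    print(f"{name}: mean CV error {mean_err:.3f} (sd {sd_err:.3f})")
```

Using the same random split for every method within a replicate makes the comparison paired, and averaging over replicates smooths out the split-to-split variability of a single cross-validation run.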
