UMSI 2000 Annual Report: Vipin Kumar, Fellow

Vipin Kumar, Fellow


Discovery of Patterns in Very Large Dimension Data Sets

Research Group

Samar Choudhary, Graduate Student Researcher
Ananth Y. Grama, Purdue University, West Lafayette, Indiana
Anshul Gupta, IBM T.J. Watson Research Center, Yorktown Heights, New York
Ewi-Hong (Sam) Han, Graduate Student Researcher
Mahesh V. Joshi, Graduate Student Researcher
Yun-Jae Jung, Graduate Student Researcher
Sushrut Karanjkar, Graduate Student Researcher
George Karypis, Faculty Collaborator
Sreenivas Mahesh Kumar, Graduate Student Researcher
Bill Leinberger, Graduate Student Researcher
Tom Nurkkala, Graduate Student Researcher
Uygar Oztekin, Graduate Student Researcher
Sanjay Ranka, Department of CIS, University of Florida, Gainesville, Florida
Kirk A. Schloegel, Graduate Student Researcher
Elizabeth Shoop, Research Associate
Michael S. Steinbach, Graduate Student Researcher
Kapil L. Surlaker, Graduate Student Researcher
Pang Tan, Graduate Student Researcher
Alex Zhang, Graduate Student Researcher


1999 UMSI Publications
99/31
"A High Performance Two Dimensional Scalable Parallel Algorithm for Solving Sparse Triangular Systems," M.V. Joshi, A.M. Gupta, G. Karypis, and V. Kumar, in Proceedings of Fourth International Conference on High Performance Computing (HiPC'97), 1997.
99/32
"Repartitioning of Adaptive Meshes: Experiments with Multilevel Diffusion," K. Schloegel, G. Karypis, and V. Kumar, in Proceedings of Third International Euro-Par Conference, 1997.
99/33
"Parallel Formulations of Decision-Tree Classification Algorithms," A.M. Srivastava, E. Han, V. Kumar, and V. Singh, Applications and Systems, 3, p. 237 (1999).
99/34
"Multilevel Diffusion Schemes for Repartitioning of Adaptive Meshes," K. Schloegel, G. Karypis, and V. Kumar, Journal of Parallel and Distributed Computing, 47, p. 109 (1997).
A complete bibliography can be found on the Internet at:
www.msi.umn.edu/cgi-bin/reports/searchv2.html

Discovering patterns in very high-dimensional data sets is of great interest in many data mining applications. The researchers are developing models and algorithms for discovering clustering and classification patterns in such data sets. The relationships present in the original data in high-dimensional space are mapped into a hypergraph: each hyperedge represents a relationship (affinity) among a subset of the data items, and the weight of the hyperedge reflects the strength of this affinity.
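As an illustration only, a weighted hypergraph of this kind could be held in memory as sketched below. The class name, method names, item names, and weights are hypothetical, and the sketch does not show how the researchers derive hyperedges or weights from the original data, which this report does not specify.

```python
from collections import defaultdict


class Hypergraph:
    """Weighted hypergraph: each hyperedge is a set of data items plus a weight.

    Hypothetical illustration; not the researchers' implementation.
    """

    def __init__(self):
        self.edges = []                    # list of (frozenset of items, weight)
        self.incident = defaultdict(list)  # item -> indices of hyperedges containing it

    def add_edge(self, items, weight):
        """Add a hyperedge relating `items` with affinity strength `weight`."""
        members = frozenset(items)
        if len(members) < 2:
            raise ValueError("a hyperedge needs at least two items")
        index = len(self.edges)
        self.edges.append((members, float(weight)))
        for item in members:
            self.incident[item].append(index)
        return index

    def vertices(self):
        """Return every data item that appears in at least one hyperedge."""
        return set(self.incident)

    def weighted_degree(self, item):
        """Total weight of the hyperedges that contain `item`."""
        return sum(self.edges[i][1] for i in self.incident.get(item, []))


# Made-up example: six documents related by three affinities.
hg = Hypergraph()
hg.add_edge({"doc1", "doc2", "doc3"}, 0.9)
hg.add_edge({"doc3", "doc4"}, 0.4)
hg.add_edge({"doc4", "doc5", "doc6"}, 0.8)
```

Unlike an ordinary graph edge, a single hyperedge here can relate any number of items at once, which is what allows an affinity among a whole subset of the data to be recorded directly.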

A hypergraph partitioning algorithm, HMETIS, is used to partition the vertices so that each part corresponds to a cluster of data items. Clustering experiments on S&P500 stock data, protein coding data, and Internet document data show that this approach performs much better than traditional schemes for high-dimensional data sets in terms of both the quality of the clusters obtained and runtime.
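To make the idea of a good partitioning concrete, the sketch below scores a candidate partition by its weighted hyperedge cut: the total weight of hyperedges whose items end up in more than one part. Using the cut as the quality measure is an assumption made for illustration; this report does not state the objective used, and the code does not call HMETIS. The function name, item names, and weights are hypothetical.

```python
def hyperedge_cut(edges, part_of):
    """Sum the weights of hyperedges whose items fall in more than one part.

    edges   -- iterable of (set of items, weight) pairs
    part_of -- dict mapping each item to the part (cluster) it was assigned to
    """
    total = 0.0
    for members, weight in edges:
        parts = {part_of[item] for item in members}
        if len(parts) > 1:  # the hyperedge is split across clusters
            total += weight
    return total


# Made-up example: six items, three weighted affinities.
edges = [
    ({"s1", "s2", "s3"}, 0.9),
    ({"s3", "s4"}, 0.4),
    ({"s4", "s5", "s6"}, 0.8),
]

# Two candidate two-way partitions of the same items.
grouped = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1, "s6": 1}
scattered = {"s1": 0, "s2": 1, "s3": 0, "s4": 1, "s5": 0, "s6": 1}

grouped_cut = hyperedge_cut(edges, grouped)      # only the weak {s3, s4} edge is cut
scattered_cut = hyperedge_cut(edges, scattered)  # every hyperedge is cut
```

Under this assumed measure, the first partition is preferable because it keeps the strongly related items together and splits only the weakest affinity.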

The researchers are also building a classification method based on the hypergraph model that assigns a new data item to the class of the vertices most closely connected to it in the hypergraph. Experiments on a variety of documents obtained from the Internet and on the Reuters data set show that the classifier based on hypergraph models outperforms decision-tree-based classifiers and Bayesian classifiers in terms of classification accuracy.
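One simple way to read "most closely connected" is sketched below: each hyperedge that contains the new item adds its weight to the class of every labelled item it also contains, and the class with the highest total wins. This scoring rule is an assumption chosen for illustration, not the researchers' published method; the function name, item names, labels, and weights are hypothetical.

```python
from collections import defaultdict


def classify(edges, labels, new_item):
    """Return the class whose labelled items share the most hyperedge weight with new_item.

    edges  -- iterable of (set of items, weight) pairs
    labels -- dict mapping already-classified items to their class
    Returns None when new_item shares no hyperedge with any labelled item.
    """
    scores = defaultdict(float)
    for members, weight in edges:
        if new_item not in members:
            continue
        for item in members:
            if item != new_item and item in labels:
                scores[labels[item]] += weight
    if not scores:
        return None
    # Iterate over class names in sorted order so ties break deterministically.
    return max(sorted(scores), key=scores.get)


# Made-up example: "new" is strongly tied to two sports documents
# and weakly tied to one finance document.
edges = [
    ({"new", "a", "b"}, 0.9),
    ({"new", "c"}, 0.4),
    ({"c", "d"}, 0.8),
]
labels = {"a": "sports", "b": "sports", "c": "finance", "d": "finance"}

predicted = classify(edges, labels, "new")
```

In this example the hyperedge linking "c" and "d" is ignored because it does not contain the new item, so only affinities that involve the new item influence its class.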

