Professor George Karypis

CSENG Computer Science & Eng
College of Science & Engineering
Twin Cities
Project Title: High-Performance and Big Data Research

This group's research spans the fields of high-performance computing, graph learning, natural language processing, recommender systems, and learning analytics.

  • Research in the area of high-performance computing focuses on the design and implementation of scalable algorithms that tackle the bottleneck of training machine learning models on the large-scale datasets that arise in real-world applications. The challenge is addressed in two ways: partitioning and distributing large datasets across multiple compute clusters for distributed training, and reducing the memory cost of the model while maintaining performance. The group develops tools for distributed graph partitioning and memory-efficient graph neural networks. These tools are used extensively for training graph neural networks on real-world graphs with billions of nodes and edges.

  • Research in graph learning falls into four categories: unsupervised graph representation learning; knowledge graph-based question answering; graph neural networks (GNNs) on heterogeneous graphs; and applications of graph neural networks in computational chemistry and materials simulation. Unsupervised graph representation learning aims to effectively encode the topological structure of graphs, along with node and edge features, into node and graph embeddings for downstream graph-related applications. Knowledge graph-based question answering aims to extract information from knowledge graphs to answer questions posed in natural language. GNNs on heterogeneous graphs aim to capture information that is multiple hops away in a heterogeneous graph while avoiding the over-smoothing problem. In computational chemistry, GNNs are used to learn the complex structures of molecules and predict their quantum properties; the group's method achieves state-of-the-art performance and can help accelerate molecule screening and drug discovery.

  • Natural language processing research focuses on distantly supervised methods (DSMs) and pre-trained language models (PLMs), with applications in text segmentation, information retrieval, and citation analysis. The researchers target the low-data regime of natural language processing, where annotated data are expensive and difficult to collect. DSMs train machine learning models with distant supervision (e.g., bags of labels associated with groups of data points) instead of explicitly labeled data. PLMs are pre-trained via self-supervision tasks (e.g., masked language modeling) on a large corpus of unlabeled text and are then adapted to target tasks by fine-tuning on the corresponding labeled datasets. Both approaches are designed to reduce reliance on annotated data and have demonstrated their effectiveness in the low-data regime.

  • Research in the area of recommender systems focuses on the design and development of methods that improve the quality of the recommendations served to users. By thoroughly exploring large-scale datasets, the researchers have identified fundamental characteristics that affect the performance of existing recommendation schemes. This has led to new Top-N recommendation methods that outperform the state of the art while remaining efficient and readily applicable in large-scale settings. In the era of deep learning, they push Top-N recommendation performance even further by leveraging cutting-edge deep learning methods for better user modeling and item representation learning.

  • Research in the area of learning analytics focuses on predictive models for estimating student performance (i.e., grades) in future courses, ranking models for Top-N course recommendation, identifying enrollment patterns associated with course success or failure, and studying curriculum planning, that is, how the timing and ordering of courses relate to a student's academic performance and time to graduation. These models aim to help students make informed decisions about which courses to register for and how to sequence them, which can improve student retention and lead to successful and timely graduation. The researchers also incorporate fairness into these algorithms to ensure that students in protected groups receive fair predictions.
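The partition-and-distribute strategy described in the high-performance computing bullet above can be sketched minimally. This toy example is illustrative only (the hash-by-id assignment and all names are assumptions, not the group's actual partitioning tools, which minimize edge cuts): it splits a small graph across two workers and records the "halo" copies of remote neighbors each partition needs for message passing.

```python
# Minimal sketch of hash-based graph partitioning for distributed
# training. Real partitioners (e.g., METIS-style tools) minimize the
# number of cut edges instead of hashing node ids.

def partition_graph(edges, num_nodes, k):
    """Assign each node to one of k parts, then collect each part's
    local nodes plus the halo nodes required by cross-partition edges."""
    owner = {v: v % k for v in range(num_nodes)}  # toy hash: node id mod k
    parts = {p: {"local": set(), "halo": set()} for p in range(k)}
    for v in range(num_nodes):
        parts[owner[v]]["local"].add(v)
    for u, v in edges:
        if owner[u] != owner[v]:
            # each side keeps a read-only copy of the remote endpoint
            parts[owner[u]]["halo"].add(v)
            parts[owner[v]]["halo"].add(u)
    return parts

# 4-node cycle with one chord, split across 2 workers
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
parts = partition_graph(edges, num_nodes=4, k=2)
```

The fraction of halo nodes is what edge-cut-minimizing partitioners keep small, since every halo copy implies communication during each training step.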
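The node-embedding idea in the graph learning bullet above can be illustrated with a single message-passing layer. This is a generic sketch, not the group's specific models: it assumes mean aggregation over neighbors followed by a learned linear map and a ReLU.

```python
import numpy as np

# Minimal sketch of one GNN layer with mean aggregation: each node's
# new embedding averages its neighbors' features, then applies a
# learned weight matrix and a ReLU nonlinearity.

def gnn_layer(features, adj, weight):
    """features: (n, d) node features; adj: (n, n) 0/1 adjacency with
    self-loops; weight: (d, d_out) learned parameters."""
    deg = adj.sum(axis=1, keepdims=True)          # neighbor counts
    aggregated = (adj @ features) / deg           # mean over neighbors
    return np.maximum(aggregated @ weight, 0.0)   # ReLU

# 3-node path graph 0-1-2, self-loops included; identity features and
# identity weights so the aggregation is easy to read off.
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)
features = np.eye(3)
weight = np.eye(3)
out = gnn_layer(features, adj, weight)
```

Stacking such layers widens each node's receptive field by one hop per layer, which is also why deep stacks can over-smooth: embeddings of distant nodes converge toward each other.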
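The masked-language-modeling self-supervision task mentioned in the natural language processing bullet can be sketched as follows. The 15% masking rate follows common practice (e.g., BERT); the tokenization and all names here are illustrative assumptions.

```python
import random

# Minimal sketch of masked language modeling: randomly replace a
# fraction of tokens with a [MASK] symbol and keep the originals as
# prediction targets for the model to recover.

def mask_tokens(tokens, mask_rate=0.15, seed=1):
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # the model must recover this token
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

tokens = "distant supervision reduces reliance on labeled data".split()
masked, targets = mask_tokens(tokens)
```

Because the targets come from the text itself, no human annotation is needed, which is exactly what makes pre-training on large unlabeled corpora possible.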
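As a concrete baseline for the Top-N recommendation task in the recommender systems bullet, an item-based neighborhood method can be sketched in a few lines. This is a generic textbook baseline under cosine similarity, not the group's published methods.

```python
import numpy as np

# Minimal sketch of item-based Top-N recommendation: score unseen items
# for a user via cosine similarity between item columns of the
# user-item matrix, then return the N highest-scoring items.

def topn_recommend(R, user, n=2):
    """R: (users, items) implicit-feedback matrix of 0/1 entries."""
    norms = np.linalg.norm(R, axis=0) + 1e-12
    sim = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarity
    np.fill_diagonal(sim, 0.0)                 # ignore self-similarity
    scores = R[user] @ sim                     # aggregate over user's history
    scores[R[user] > 0] = -np.inf              # exclude already-seen items
    return list(np.argsort(scores)[::-1][:n])

# 3 users x 4 items; user 0 has interacted with items 0 and 1
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1]], dtype=float)
recs = topn_recommend(R, user=0, n=2)
```

The item-item similarity matrix can be precomputed and sparsified offline, which is what makes neighborhood methods of this shape practical at large scale.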
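For the learning analytics bullet, a simple bias model conveys the flavor of grade prediction. The global-mean-plus-offsets form is a standard baseline for this kind of task; the data and all names below are made up for illustration and are not the group's models.

```python
# Minimal sketch of a baseline grade predictor: a student's grade in a
# course is estimated as a global mean plus a per-student offset and a
# per-course offset, each fit from past observations.

def fit_bias_model(grades):
    """grades: list of (student, course, grade) observations."""
    mu = sum(g for _, _, g in grades) / len(grades)
    def offsets(key_index):
        acc = {}
        for rec in grades:
            acc.setdefault(rec[key_index], []).append(rec[2] - mu)
        return {k: sum(v) / len(v) for k, v in acc.items()}
    b_student, b_course = offsets(0), offsets(1)
    def predict(student, course):
        # unseen students or courses fall back to a zero offset
        return mu + b_student.get(student, 0.0) + b_course.get(course, 0.0)
    return predict

grades = [("s1", "csci1", 4.0), ("s1", "csci2", 3.0),
          ("s2", "csci1", 3.0), ("s2", "csci2", 2.0)]
predict = fit_bias_model(grades)
```

Fairness auditing of such a model amounts to checking that prediction errors are comparable across protected groups rather than systematically biased against one of them.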

Project Investigators

Maria Kalantzi
Professor George Karypis
Petros Karypis
Konstantinos Mavromatis
Agoritsa Polyzou
Zeren Shui
Ancy Tom