Project abstract for group srivbane

Large Scale Machine Learning and Its Applications

This group works on large-scale machine learning and data analysis, applied to climate prediction, anomaly detection and recommendation systems. Each of these problems involves a large number of computations, because the methods must search through large volumes of explicit and implicit information, both observed and unobserved. The researchers are working on three projects during 2016, all of which require HPC resources.

  • Anomaly detection: This project analyzes a large flight dataset to discover anomalous aviation situations. The dataset contains about 180,000 flights, each consisting of 186 time series of varying lengths.
  • Recommendation system: The objective of this project is to build a model for news article recommendation. The accuracy of the ranked list of recommended items is to be computed on large-scale datasets containing at least millions of observations (users who rated or clicked on an article). Although online models will be tested, the researchers also need to compare them with offline models, which require the full training set.
  • Deep learning methods for climate science: This project will train deep networks for prediction tasks on Global Climate Model (GCM) datasets. The combined output of all GCM models consists of around 50,000 observations, each with 10,000 to 60,000 dimensions covering climatic parameters such as temperature and precipitation. The dataset will be used for two prediction tasks: Indian summer monsoon rainfall, and air temperature at nine land locations in different parts of the world. Each task will explore deep learning models such as convolutional nets, recurrent nets, restricted Boltzmann machines and auto-encoders, each of which has many parameters to train. The researchers believe this is one of the first proposed applications of deep networks to GCM datasets, and as such it will require running many iterations of these models for tuning and testing.
