College of Science & Engineering
Twin Cities
These researchers are using MSI for six projects:
-
Identification of aberration patterns (i.e., behavior that deviates significantly from the norm) in a large number of multi-attribute trajectories. Identifying such patterns can help improve maritime security and prevent illicit activities (e.g., illegal fishing, illegal oil transfers that violate United Nations sanctions) in which the objects involved may hide their movements by deliberately not reporting their locations. The project uses multi-attribute trajectory data (MTD), which includes attributes such as draught, rate of turn, and emissions. The challenges arise from the complexity of modeling trajectory gaps and from the large volume of data. The project includes two computationally intensive tasks that require MSI resources:
-
Detecting potential rendezvous patterns via spatiotemporal (ST) joins over two or more modeled gaps, which leads to exponentially high gap-enumeration costs for millions of trajectory gaps.
-
Slicing the data and refining, at a finer granularity, the shape produced by the intersection of ST joins. This yields a tighter ST bound and thereby reduces the manual post-processing effort required of human analysts.
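As a rough illustration of the rendezvous-detection task, the sketch below treats each trajectory gap as an axis-aligned space-time box and reports pairs of gaps from different objects whose boxes intersect. The box model, the `Gap` class, and its field names are assumptions made for illustration; the project's actual gap model and join algorithm are not described here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gap:
    """Hypothetical gap model: a space-time box bounding where an object could be."""
    obj: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    t_min: float
    t_max: float


def overlaps(a: Gap, b: Gap) -> bool:
    """True if the two boxes intersect in x, y, and time."""
    return (a.x_min <= b.x_max and b.x_min <= a.x_max
            and a.y_min <= b.y_max and b.y_min <= a.y_max
            and a.t_min <= b.t_max and b.t_min <= a.t_max)


def candidate_rendezvous(gaps):
    """Naive pairwise ST join over gaps that belong to different objects."""
    return [(a, b) for a, b in combinations(gaps, 2)
            if a.obj != b.obj and overlaps(a, b)]


gaps = [
    Gap("A", 0, 10, 0, 10, 0, 5),
    Gap("B", 8, 20, 5, 15, 3, 9),  # overlaps A in space and time
    Gap("C", 8, 20, 5, 15, 6, 9),  # overlaps A in space, but starts after A ends
]
pairs = [(a.obj, b.obj) for a, b in candidate_rendezvous(gaps)]
print(pairs)
```

The pairwise loop is quadratic in the number of gaps, and joins over three or more gaps grow faster still, which is one way to see why millions of gaps call for large-scale computing resources.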
-
Identification of spatial patterns (e.g., colocation) of cell interactions (e.g., immune and tumor) to help distinguish between responder and non-responder tissue samples (e.g., clinical outcome of immunotherapy). Most related work on identifying such spatial patterns is limited to hand-constructed features based on traditional spatial association measures (e.g., Ripley's cross-K, G-cross). However, these measures may not capture all relevant spatial interactions among tumor and immune cells, such as directional relationships (e.g., one cell type being surrounded by another). To overcome these limitations, these researchers have been exploring the effectiveness of AI-constructed measures using a novel GeoAI deep neural network technique, the spatial-interaction aware multi-category deep neural network (SAMCNet). Cellular maps derived from multiplex immunofluorescence (MxIF) imagery contain over 60 different cell types, resulting in roughly a million trillion (2^60) potential spatial colocation patterns to explore. Thus, MSI resources are used to help construct the input to SAMCNet, which itself contains millions of parameters that must be trained. MSI resources are also needed to refine SAMCNet by accounting for spatial variability and training the model across different regions (e.g., tumor core and tumor interface), which results in learning millions to billions of parameters.
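To make the hand-constructed baseline concrete, the following minimal sketch computes a naive Ripley's cross-K estimate (no edge correction) for two cell types and checks the size of the pattern space. The function name, the toy coordinates, and the study-area value are assumptions for illustration and are not taken from the project.

```python
import math


def cross_k(points_a, points_b, r, area):
    """Naive cross-K: area-scaled fraction of (a, b) pairs within distance r."""
    close_pairs = sum(
        1
        for (xa, ya) in points_a
        for (xb, yb) in points_b
        if math.hypot(xa - xb, ya - yb) <= r
    )
    return area * close_pairs / (len(points_a) * len(points_b))


# Toy coordinates (hypothetical), in arbitrary length units.
tumor = [(0.0, 0.0), (1.0, 0.0)]
immune = [(0.0, 0.5), (5.0, 5.0)]
k = cross_k(tumor, immune, r=1.0, area=100.0)
print(k)

# Every subset of 60 cell types is a candidate colocation pattern.
n_patterns = 2 ** 60
print(f"{n_patterns:.2e}")
```

Because the estimate depends only on pairwise distances, it cannot tell a tumor cell ringed by immune cells from one with the same number of immune cells clustered on one side, which is the kind of directional relationship the text says such measures miss.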
-
Image classification of weeds and non-weeds and identification of weed locations in turfgrass using deep-learning models. Precision weed detection will help maintain sustainable and functional landscapes while significantly reducing environmental impact. The challenges arise from the high-dimensional parameter space of image-based deep-learning models. In addition, conventional deep-learning techniques mainly use a single GPU to speed up training and inference, and may not scale to real-time data processing. Thus, MSI resources are being used to compare different state-of-the-art deep-learning models (e.g., RCNN, Faster-RCNN, YOLO, ResNet) and to take advantage of a multi-GPU platform and a parallel computing system to improve real-time data processing.
-
Mining engine data and trajectories to find environmentally friendly paths. Given a road network, an origin, a destination, and onboard diagnostics (OBD) data, the energy-efficient path selection problem aims to find the path with the least expected energy consumption. Taking energy-efficient routes instead of the fastest route can help avoid over one million metric tons of carbon emissions every year. To find the energy-efficient path, the researchers will first use MSI resources to preprocess large volumes of truck OBD data provided by Volvo (e.g., map matching, data cleaning) and to train a novel physics-informed neural network that estimates energy use on road segments within subpaths, so that contextual information is taken into account. A city road network (e.g., 70 miles around Minneapolis) could contain hundreds of thousands of road segments and tens of millions of subpaths, so both the time cost and the storage cost of calculating energy-efficient paths in a city road network are high, necessitating the use of MSI resources. This work will be extended to find environmentally friendly paths for electric vehicles (EVs) to reduce their carbon footprint. The carbon emission data set will be retrieved from the WattTime API, which collects and aggregates historic nationwide emission information. This problem is both computationally intensive and storage intensive because of the spatiotemporal variability in the carbon emission rate of EV charging stations and the large number of candidate paths in a road network. For example, in the worst-case scenario where the network graph is complete, each permutation of the intersections other than the origin and the destination is a potential path (e.g., given a complete network graph with only 20 intersections, there are about 6 × 10^15 candidate paths between an origin-destination pair).
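The worst-case figure for 20 intersections can be checked with a few lines of arithmetic: with the origin and destination fixed, 18 intermediate intersections remain, and they can be ordered in 18! ways. The sketch below is only a sanity check of that count; the function names are hypothetical, and the second function is an added assumption showing that the count grows further if paths are allowed to skip intersections.

```python
import math


def permutation_paths(n_intersections: int) -> int:
    """Paths that visit every intermediate intersection exactly once."""
    return math.factorial(n_intersections - 2)


def all_simple_paths(n_intersections: int) -> int:
    """Assumption: paths may also skip intermediate intersections."""
    m = n_intersections - 2
    return sum(math.perm(m, k) for k in range(m + 1))


count = permutation_paths(20)
print(f"{count:.2e}")  # 6.40e+15
print(all_simple_paths(20) > count)
```

Either way, enumerating candidate paths explicitly is infeasible even for very small complete graphs, which is why the path search has to rely on pruning and on substantial compute and storage.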
-
Detecting statistically significant regional colocation patterns. This problem has applications in ecology, economics, sociology, and other fields. In any relevant spatial dataset, there can be an exponential number of candidate patterns and an exponential number of candidate regions, which makes algorithms such as colocation pattern mining very expensive. For example, the SafeGraph POI dataset has locations of 1,473 different retail brands, which can result in 2^1,473 potential colocation patterns. Adding statistical significance testing to the colocation detection process helps ensure that the identified patterns did not occur by chance. Significance testing adds computational cost because hundreds of null-hypothesis datasets must be generated for each participating feature to model its distribution under complete spatial randomness, making it necessary to use resources that can handle such intensive computations.
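The cost of significance testing can be seen in a minimal Monte Carlo sketch. It assumes a simple test statistic (the number of A-B point pairs within a distance `d`) and a unit-square study area, and it re-simulates one feature under complete spatial randomness (CSR). The statistic, the function names, and the synthetic data are illustrative assumptions, not the project's method.

```python
import math
import random


def pairs_within(points_a, points_b, d):
    """Count (a, b) pairs whose Euclidean distance is at most d."""
    return sum(
        1
        for (xa, ya) in points_a
        for (xb, yb) in points_b
        if math.hypot(xa - xb, ya - yb) <= d
    )


def csr_p_value(points_a, points_b, d, n_sim=99, seed=0):
    """Monte Carlo p-value: feature B is re-drawn uniformly in the unit square."""
    rng = random.Random(seed)
    observed = pairs_within(points_a, points_b, d)
    at_least = 0
    for _ in range(n_sim):
        null_b = [(rng.random(), rng.random()) for _ in points_b]
        if pairs_within(points_a, null_b, d) >= observed:
            at_least += 1
    return (1 + at_least) / (1 + n_sim)


# Synthetic data (hypothetical): every B point sits right next to an A point.
data_rng = random.Random(1)
feature_a = [(data_rng.random(), data_rng.random()) for _ in range(20)]
feature_b = [(x + 0.001, y + 0.001) for (x, y) in feature_a]

observed = pairs_within(feature_a, feature_b, d=0.02)
p_value = csr_p_value(feature_a, feature_b, d=0.02)
print(observed, p_value)
```

Each candidate pattern and region needs its own batch of simulated datasets, so the simulation cost multiplies with the already exponential number of candidates.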
-
Finding spatial patterns (e.g., spatiotemporal hotspots) in ice sheet melt in polar regions. The melting of polar ice sheets can raise sea levels and affect livelihoods in different parts of the world. For this project, the researchers plan to use different spatiotemporal and physics-informed models to predict how quickly the ice sheets will shrink. The project involves analyzing massive datasets with models that learn millions to potentially billions of parameters. Thus, the project requires both large storage and substantial computational resources.