Project abstract for group shekhars

Scalable Ring-Shaped Hotspot Detection

Given a collection of geo-located activities (e.g., crime reports), ring-shaped hotspot detection (RHD) finds rings where the concentration of activities inside the ring is much higher than outside. RHD is important for the applications such as crime analysis, where it may focus the search for crime source’s location, e.g. the home of a serial criminal. RHD is challenging because of the large number of candidate rings and the high computational cost of the statistical significance test.

This group's previous work considereda Dual Grid Based Pruning (DGP) algorithm to detect ring-shaped hotspots. Based on theoretical proofs and experimental evaluation, the DGP algorithm improves the computational cost of the naïve approach substantially. However, the DGP algorithm is still computationally intensive due to the extensive ring enumeration using a four dimensional parameter space (i.e. four parameters of a ring) and the requirement for a significance test using Monte Carlo simulations which increase its computational cost. Moreover, sequential procedures that are used for ring enumeration in parameter space do not scale up to large number of grid cells due to the four dimensional nature of the parameter grid. Therefore the researchers plan to improve the computational efficiency of the DGP algorithm using an enhanced refine phase (by applying the Best Enclosing Ring algorithm) and an enhanced pruning phase (using multi-cell-size pruning and Local Maxima Enumeration).

The researchers are using MSI for two tasks:

  • To perform an extensive experimental evaluation for the DGP algorithm, which can help in executing the algorithm for a large number of grid cells and a large number of Monte Carlo simulations in a reasonable amount of time. The researchers will extensively evaluate the performance of the DGP algorithm as compared to the naive approach. Experiments will evaluate the effect of varying multiple parameters (i.e. cell size, number of activities, likelihood ratio threshold, number of Monte Carlo simulations) on the algorithm performance.
  • To develop a parallel formulation of the DGP algorithm using a parallel computing platform (e.g. Hadoop, Spark, GPU, etc). The researchers will parallelize the DGP algorithm over a set of machines using a parallel computing platform (e.g. Hadoop, Spark, GPU, etc) to meet the high performance requirements imposed. They will both use task partitioning and data partitioning among the machines and the goal of the parallelization will be to improve the scalability as well as reduce the execution time of DGP. In data partitioning schemes, the four dimensional parametric grid will be partitioned among different machines and each parametric grid cell will be independently processed. In the task partitioning schemes, the significance test with Monte Carlo simulations will be given to different machines so that each machine completes a number of simulations.

Return to this PI's main page.