Alex McKim

Various statistical models for spatial data rely on some form of a nearest neighbor calculation among observed spatial locations. A brute force solution to a nearest neighbor calculation is easy to implement, but is computationally impractical for large data sets. Various data tree structures, e.g., k-d trees, have been proposed to improve the computation efficiency of nearest neighbor searches. Our focus is on efficient implementation of a statistical model called the Nearest Neighbor Gaussian Process (NNGP) that involves nearest neighbor searches for massive spatial data sets. We developed a specialized k-d tree structure and search algorithm designed to work with the NNGPs model assumptions. We compare the search time of our proposed k-d tree to that of a brute force nearest neighbor search under different parallel computing settings and data sizes.