Jay Jain

In recent years, the volume and availability of biodiversity data have grown substantially due to major digitization efforts and the emergence of citizen science. Despite this abundance, most biodiversity and species trait datasets contain many missing values because organisms are sampled incompletely across species and spatial locations. Filling in these missing values is essential for ecologists analyzing biodiversity in the age of climate change. We therefore propose a model that uses phylogenetic relatedness and spatial location to predict missing values in species trait datasets. Existing statistical methods impute missing values using the phylogenetic relationships between species with unknown trait values and species whose trait values are known. Other methods use the spatial location of neighboring records to impute unknown values. However, no method to date combines both approaches to minimize uncertainty using covariance with (1) known trait values from the same species, (2) trait values of phylogenetically related species, and (3) trait values of organisms that are close together in space. We developed a Bayesian imputation method that incorporates both phylogenetic and spatial sources of uncertainty. To assess its performance, we used it to predict known values in a dataset of functional traits of North American forest tree species and assessed the accuracy of the imputed values using confidence intervals. We expect the method that uses both phylogenetic and spatial information to perform better than phylogenetic or spatial methods applied in isolation.