GraSPy: an Open Source Python Package for Statistical Connectomics
Recent advances in imaging and computational infrastructure are enabling neuroscientists to collect connectomics datasets at unprecedented scale and detail. As these datasets become available, robust and easy-to-use statistical analysis techniques are becoming increasingly important for accelerating discovery. Connectome data inherently represent a set of objects (neurons, brain regions) and the relationships between them (synapses, projections, correlations). Graphs are a convenient mathematical description of such data, in which the objects are nodes and their relationships are edges. Answering scientific questions about neuroanatomical connections therefore requires statistical tools designed specifically for graph data. However, classical statistical assumptions about independence are violated for graph-valued data, and many standard data analysis techniques neglect the topological organization of the network when applied to graphs. Researchers have developed statistical techniques, with theoretical guarantees, for the analysis of graph data that overcome these challenges. Until now, however, these algorithms have not been available in a unified, accessible implementation that could be used by the broader scientific and connectomics community. We developed GraSPy, an open-source Python toolkit for statistical inference on graphs. GraSPy builds on Python’s existing graph and machine learning ecosystem by accepting input from NetworkX and complying with the scikit-learn API. The package provides functionality for low-dimensional embeddings of graphs, statistical testing on individual graphs or sets of graphs, simulations of several random graph models, and various plotting and utility functions for graph manipulation. We demonstrate GraSPy’s utility to the connectomics community by applying it to a statistical comparison of the bilateral homology of a newly updated C. elegans connectome.
GraSPy will enable neuroscientists to take advantage of the rich structure in connectomics data while making statistical claims about neural systems.
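To make the ideas of simulating a random graph model and computing a low-dimensional embedding concrete, the following is a minimal sketch that uses only NumPy. It deliberately does not use GraSPy's own API: the function names `sample_sbm` and `spectral_embed`, the block sizes, and the connection probabilities are all hypothetical choices made for illustration. It samples a two-block stochastic block model and embeds its nodes using the leading eigenpairs of the adjacency matrix, which is one common way to obtain such an embedding.

```python
import numpy as np


def sample_sbm(block_sizes, block_probs, rng):
    """Sample an undirected, loopless stochastic block model (illustrative helper)."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # Edge probability for every node pair, looked up from the block matrix.
    probs = np.asarray(block_probs)[labels][:, labels]
    n = labels.size
    # Draw the strict upper triangle only, then mirror it so the graph is symmetric.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adjacency = (upper | upper.T).astype(float)
    return adjacency, labels


def spectral_embed(adjacency, n_components):
    """Embed nodes with the largest-magnitude eigenpairs of a symmetric adjacency matrix."""
    eigvals, eigvecs = np.linalg.eigh(adjacency)
    # Indices of the n_components eigenvalues with the largest absolute value.
    top = np.argsort(np.abs(eigvals))[::-1][:n_components]
    # Scale each eigenvector by the square root of its eigenvalue's magnitude.
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))


# Assumed example parameters: two blocks of 50 nodes, denser within than between.
rng = np.random.default_rng(0)
adjacency, labels = sample_sbm([50, 50], [[0.5, 0.1], [0.1, 0.5]], rng)
latent = spectral_embed(adjacency, n_components=2)
```

Each row of `latent` is a two-dimensional position for one node, so standard multivariate tools such as clustering can then be applied to the rows.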