Fast Nonseparable Gaussian Stochastic Process With Application to Methylation Level Interpolation
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
Gaussian stochastic process (GaSP) has been widely used as a prior over functions due to its flexibility and tractability in modeling. However, the computational cost in evaluating the likelihood is , where n is the number of observed points in the process, as it requires to invert the covariance matrix. This bottleneck prevents GaSP being widely used in large-scale data. We propose a general class of nonseparable GaSP models for multiple functional observations with a fast and exact algorithm, in which the computation is linear (O(n)) and exact, requiring no approximation to compute the likelihood. We show that the commonly used linear regression and separable models are special cases of the proposed nonseparable GaSP model. Through the study of an epigenetic application, the proposed nonseparable GaSP model can accurately predict the genome-wide DNA methylation levels and compares favorably to alternative methods, such as linear regression, random forest, and localized Kriging method. The supplementary materials of this article are online and the algorithm for fast computation is implemented in the FastGaSP R package on CRAN. Supplemental materials for this article are available online.