drcode.zip (14.21 kB)

Divide and Recombine Approaches for Fitting Smoothing Spline Models with Large Datasets

dataset
posted on 2017-11-27, 15:28 authored by Danqing Xu, Yuedong Wang

Spline smoothing is a widely used nonparametric method that allows data to speak for themselves. Because of its complexity and flexibility, fitting smoothing spline models is usually computationally intensive, and the cost may become prohibitive with large datasets. To overcome memory and CPU limitations, we propose four divide and recombine (D&R) approaches for fitting cubic splines with large datasets. We consider two approaches to divide the data: random and sequential. For each approach of division, we consider two approaches to recombine. These D&R approaches are implemented in parallel without communication. Extensive simulations show that these D&R approaches are scalable and perform comparably to the method that uses the whole data. The sequential D&R approaches are spatially adaptive, which leads to better performance than the whole-data method when the underlying function is spatially inhomogeneous.
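
The two ways of dividing the data can be illustrated with a minimal sketch. It is not taken from drcode.zip: every function name is hypothetical, and the abstract does not say how the subset fits are recombined, so the pointwise average and the block-wise selection below are assumptions made for illustration. The smoother is passed in as a callable, and the demo uses a plain subset mean as a stand-in rather than a cubic smoothing spline.

```python
import random
from statistics import fmean


def divide_random(n, k, seed=0):
    """Split indices 0..n-1 into k random subsets of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[j::k]) for j in range(k)]


def divide_sequential(n, k):
    """Split indices 0..n-1 (data assumed sorted by x) into k contiguous blocks."""
    bounds = [j * n // k for j in range(k + 1)]
    return [list(range(bounds[j], bounds[j + 1])) for j in range(k)]


def fit_subsets(x, y, subsets, fit):
    """Fit each subset on its own; no fit sees another subset's data."""
    return [fit([x[i] for i in s], [y[i] for i in s]) for s in subsets]


def recombine_average(models, x_new):
    """ASSUMED rule for random division: pointwise mean of the subset fits."""
    return [fmean(m(t) for m in models) for t in x_new]


def recombine_blockwise(models, subsets, x, x_new):
    """ASSUMED rule for sequential division: use the fit whose block covers t."""
    uppers = [x[s[-1]] for s in subsets]  # largest x in each block
    out = []
    for t in x_new:
        j = next((j for j, u in enumerate(uppers) if t <= u), len(models) - 1)
        out.append(models[j](t))
    return out


def fit_mean(xs, ys):
    """Stand-in smoother (NOT a spline): predicts the subset mean everywhere."""
    m = fmean(ys)
    return lambda t: m


# Toy data: a step function on [0, 1], sorted by x.
n = 100
x = [i / (n - 1) for i in range(n)]
y = [0.0] * 50 + [1.0] * 50

random_parts = divide_random(n, 4)
random_fit = recombine_average(fit_subsets(x, y, random_parts, fit_mean), [0.1, 0.9])

seq_parts = divide_sequential(n, 2)
seq_fit = recombine_blockwise(
    fit_subsets(x, y, seq_parts, fit_mean), seq_parts, x, [0.1, 0.9]
)
```

Because each subset is fitted without reference to the others, the list comprehension in `fit_subsets` could be handed to a process pool or a cluster scheduler unchanged. To fit actual splines, replace `fit_mean` with any function that takes the subset's `x` and `y` values and returns a callable predictor.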

Funding

This research was supported by a grant from the National Science Foundation (DMS-1507620). The authors acknowledge support from the Center for Scientific Computing from the CNSI, MRL: an NSF MRSEC (DMR-1121053).

History