Crop Yield Prediction Using Bayesian Spatially Varying Coefficient Models with Functional Predictors

Abstract Reliable prediction for crop yield is crucial for economic planning, food security monitoring, and agricultural risk management. This study aims to develop a crop yield forecasting model at large spatial scales using meteorological variables closely related to crop growth. The influence of climate patterns on agricultural productivity can be spatially inhomogeneous due to local soil and environmental conditions. We propose a Bayesian spatially varying functional model (BSVFM) to predict county-level corn yield for five Midwestern states, based on annual precipitation and daily maximum and minimum temperature trajectories modeled as multivariate functional predictors. The proposed model accommodates spatial correlation and measurement errors of functional predictors, and respects the spatially heterogeneous relationship between the response and associated predictors by allowing the functional coefficients to vary over space. The model also incorporates a Bayesian variable selection device to further expand its capacity to accommodate spatial heterogeneity. The proposed method is demonstrated to outperform other highly competitive methods in corn yield prediction, owing to the flexibility of allowing spatial heterogeneity with spatially varying coefficients in our model. Our study provides further insights into understanding the impact of climate change on crop yield. Supplementary materials for this article are available online.


Introduction
Crop yield prediction is a fundamental problem in Agriculture and Economics. In the United States, agriculture is a major industry by contributing 136.1 billion dollars to the national gross domestic product (GDP) and providing 11% of total employment in 2019 (USDA 2020b). The United States is also a vital contributor to global food security as a leading international producer of agricultural products. The American Midwest is especially one of the world's largest crop production areas with over 127 million acres of agricultural land and produces 85% of U.S. corn and soybean (USDA 2020a). Specifically, corn is a primary grain crop globally, widely used for human food, livestock feed, and biofuel (ethanol production). Timely and accurate crop yield prediction in the Midwest is crucial for policy-makers and planners to create appropriate strategies for the storage, distribution, and trade of products for agricultural risk management.
We aim to develop a crop yield forecasting model at large spatial scales to predict the corn yield for five Midwestern states. In this region, where the well-known "Corn Belt" lies, soils are deep, fertile, and rich in organic material and nitrogen, and the land is relatively level. The warm nights, hot days, and welldistributed rainfall of the region during the growing season are ideal conditions for raising corn. It is well known that climate variables, particularly temperature and precipitation, have a significant impact on all stages of plant growth and hence affects agricultural productivity. Ray et al. (2015) estimated that 60% of corn yield variability and 36% of the soybean yield variability in the American Midwest can be explained by climate variability. Various predictive models have been developed by crop scientists for production monitoring, including process-based simulation models under comprehensive mechanism processes (Guan et al. 2017;Jones et al. 2017;Peng et al. 2018), empirical statistical crop models based on historical climate observations and yield measurements (Prasad et al. 2006;Gornott and Wechsung 2016;Kern et al. 2018), and machine learning methods in recent studies (Kang et al. 2020;van Klompenburg, Kassahun, and Catal 2020). These approaches typically use a summary of climate information, such as annual or monthly aggregated climate data, rather than the complete trajectories of meteorological variables in their prediction models.
Since measurements of meteorological variables, such as maximum and minimum temperatures, are available on a daily basis and their effects on crop yield vary at a different growing stage of the crop, it is natural to treat the trajectories of these variables as functional data in the predictive model for crop yield. Indeed, recent studies have shown that more accurate crop yield prediction can be obtained by using the complete temperature trajectories as functional predictors (Wong, Li, and Zhu 2019). On the other hand, estimating the potential impacts of weather patterns on crop productivity is of widespread interest to agriculture and the economy as climate change becomes an imminent challenge for humanity (Lobell and Burke 2010;Holzkämper, Calanca, and Fuhrer 2012). Both aspects suggest investigating the relationship between crop yield and meteorology trajectories, for which functional regression models are a suitable tool.
The most widely used regression models for scalar responses against functional covariates are the functional linear models (Müller and Stadtmüller 2005;James, Wang, and Zhu 2009;Li, Wang, and Carroll 2010;Goldsmith et al. 2011;Zhao, Chen, and Ogden 2015). There has been some recent work on nonlinear regression models for independent functional data, including the functional additive models (Müller and Yao 2008;Zhu, Yao, and Zhang 2014;Fan, James, and Radchenko 2015), the functional generalized additive models (McLean et al. 2014), and the partially linear functional additive model (Wong, Li, and Zhu 2019;Cui, Lin, and Lian 2020). In addition, mixed models have been developed to accommodate the dependency of functional covariates collected in groups or repeatedly over time (Crainiceanu, Staicu, and Di 2009;Goldsmith et al. 2012). See Reiss et al. (2017) for a thorough review for scalar-onfunction regression methods.
Although different approaches have been proposed to analyze functional data observed over a spatial domain and model the spatial correlation between curves (Zhou et al. 2010;Staicu, Crainiceanu, and Carroll 2010;Zhang et al. 2016;Kuenzer, Hörmann, and Kokoszka 2021), to our best knowledge, spatial correlation of functional covariates is rarely considered in functional regression models. This limits the applicability of functional regression models to prediction problems where the functional covariates are collected over a spatial domain. As detailed in Section 2, our functional covariates for corn yield prediction consist of county-level temperature trajectories which are known to be spatially correlated (North, Wang, and Genton 2011;Di Cecco and Gouhier 2018). Furthermore, it is no longer realistic to assume a homogeneous relationship between response variables and predictors when data are observed over a large spatial domain. In our study of corn yield prediction using meteorological variables for five Midwest states, the effect of temperatures on crop yield may vary across different regions depending on the local environmental conditions, such as soil moisture, soil PH, solar radiation, and wind velocity. Since continuously monitoring all confounding factors in a large spatial region is not feasible, it is essential for the crop yield model to allow for spatial heterogeneity.
We propose a spatially varying functional regression model to incorporate spatial correlation between functional covariates and also allow the functional coefficients to vary over space. Our model can be considered an extension of the spatially varying coefficient model of Gelfand et al. (2003) to functional regressions. There are two main statistical challenges for this extension: low-rank representation of spatially dependent functional predictors and regularization to avoid over-fitting while respecting the heterogeneous relationship between crop yield and meteorological variables. To address the first challenge, we develop a fully Bayesian functional principal component analysis (FPCA) framework for spatially dependent functional data. The within-curve dependence is captured by a low-rank FPC representation based on nonparametric basis functions.
Some existing FPCA methods such as James, Hastie, and Sugar (2000), Zhou, Huang, and Carroll (2008), Zhou et al. (2010), andHuang, Li, andGuan (2014) used reduced rank basis function representations fitted via Expectation-Maximization algorithms under frequentist paradigms. In contrast, we propose to embed FPCA estimation into a fully Bayesian hierarchical model to ensure uncertainties associated with FPCA estimation propagate into crop yield prediction. We accommodate the betweencurve spatial dependence by allowing FPC scores to be spatially correlated. Over-parameterization of our model is likely to occur if insignificant FPC scores are used in the regression. To remedy this potential issue while respecting the possible spatial heterogeneity, we introduce a build-in Bayesian spatially varying variable selection device to ensure model parsimony. We also govern all spatially varying parameters by correlated processes to further reduce the risk of over-fitting.
The rest of the article is organized as follows. We introduce the corn yield and meteorology data in Section 2 which also includes a preliminary analysis to explore the heterogeneous relationship between the corn yield and the functional meteorology predictors. We describe the proposed spatially varying coefficient functional regression model with a fully Bayesian hierarchical structure in Section 3 and conduct simulations to examine its prediction performance in Section 4. We present a comprehensive study of the corn yield prediction in Section 5 and conclude the article with discussions in Section 6. Detailed MCMC algorithm, convergence diagnostics, additional simulation and data analysis results are relegated to the supplementary materials.

Crop and Weather Data
We obtain the county-level annual crop yield data (measured in bushels per acre) together with the size of harvested land (acre) of Illinois, Indiana, Iowa, Kansas, and Missouri between 1999 and 2020 from the National Agricultural Statistics Agency (https://quickstats.nass.usda.gov/). We will focus on the corn yield, which is the most important crop in this region. The corn yield data availability at each county varies over time. A total of 403 counties, including 79 in Illinois, 66 in Indiana, 93 in Iowa, 92 in Kansas, and 73 in Missouri, have data available for at least five years between 1999 and 2020, among the 102, 92, 99, 105, and 114 counties from their respective states. The median number of years of available corn yield record for those 403 counties is 17 out of 22, and our analysis will be performed on those counties. The harvest land size also varies over the years and is collected from the annual Agricultural Survey. In our data, the harvest land size ranges from 200 to 394,000 acres with a median size of 70,300 acres.
The meteorology measurements for each county, including annual precipitation, daily maximum and minimum temperature are obtained from the National Climatic Data Center (NCDC, https://www.ncdc.noaa.gov/data-access). Figure 1 shows temperature trajectories collected from counties in Iowa and Kansas as examples. Different temperature characteristics are observed in different regions. We model the annual precipitation as a scalar covariate and the daily maximum and minimum temperature trajectories as spatially dependent bivari-  ate functional predictors. While the county-level agricultural data tend to have missing values on their historical records, the daily county-level temperature is available from the early 1900s for most areas of the United States from the NCDC. The long climate records allow us to reliably assess the long-term temperature pattern in each county.
A preliminary data exploration demonstrates heterogeneous relationships between corn yield and temperature trajectories. We treat the daily maximum and minimum temperature curves as multivariate functional data, conduct a multivariate FPCA (Wong, Li, and Zhu 2019), and fit a linear regression model on corn yield with the first and second FPC scores for each county independently, using different years as replicates. Figure 2 shows the estimated coefficients for all counties and coefficients exhibit a spatially clustered pattern. We then examine the spatial correlation of the coefficients by performing a Moran's I test to test the null hypothesis of no spatial correlation versus a onesided alternative of positive spatial correlation. The resulting p-values < 0.001 for the regression coefficients on both the first and second FPC scores provide strong evidence of spatial correlation.

Bayesian Spatially Varying Functional Model
For ease of exposition, we present the model based on singleyear data. Our analysis treats data from multiple years as conditionally independent replicates, with rationals detailed in Section 5.1. Let Y(s) denote the scalar response at location s ∈ D for a spatial region D ⊂ R 2 , X(s; t) = {X 1 (s; t) . . . , X q (s; t)} T defined for t ∈ T denote q functional predictors, and Z(s) = {Z 1 (s), . . . , Z h (s)} T denote d scalar predictors associated with Y(s). In our data, Y(s) is the average corn yield per acre for the county located at s, D is the spatial region of the five Midwestern states, and the time domain T is a year. We also have q = 2 and h = 1 with X 1 (s; t) and X 2 (s; t) being daily maximum and minimum temperature trajectories and Z(s) the annual precipitation collected from county s. Without loss of generality, we rescale the time domain to a unit interval and thus T = [0, 1]. We consider spatially varying relationship between the scalar responses and associated predictors, and propose a general framework for spatially varying functional model: where η 0 (s) is a location-specific intercept, α(s) = {α 1 (s), . . . , α h (s)} T is a vector of coefficients for the scalar predictors, and η(s; t) = {η 1 (s; t), . . . , η q (s; t)} T denotes a vector of functional coefficients over t ∈ T modeling the effect of functional trajectories on scalar response at s. As a special case, when scalar and functional coefficients are spatially invariant, that is, α(s) = α and η(s; t) = η(t), we obtain a partial functional linear model (Shin 2009). Because the spatially varying coefficients are flexible and expected to capture the spatial variability in Y(s), imposing additional spatial correlation structure on the error e(s) will cause identifiability issues to the model. Following Gelfand et al. (2003), we assume that e(s) are zero-mean independent Gaussian errors. Since Y(s) is the average crop yield per acre obtained from an agricultural survey, we assume var{e(s)} = σ 2 e /ω(s) following standard practice for survey data (USDA 2012), where σ 2 e is an unknown variance parameter and ω(s) is the size of the harvest land.
The direct estimation of model (1) suffers from the curse of dimensionality, we thus need a low-dimensional representation of spatially dependent multivariate functional predictors. This dimension reduction in the functional model should fulfill the purposes of preserving the within-curve temporal correlation as well as the spatial correlation across s. To accomplish this, we first define the mean function of the multivariate functional predictors as μ( scores that are spatially correlated. Here, p determines the rank of FPC representation and will be determined by data-driven methods. Using the same PC functions, the spatially varying functional coefficient in (1) can be expanded as η(

Spatial Functional Predictors with Measurement Errors
In practice, X(s; t) is often not directly observable, and instead, we only observe its surrogate containing measurement errors. It is well known that climate data products, especially the high-resolution data, are error prone for various reasons (e.g., Matthews, Mannshardt, and Gremaud 2013;Merchant et al. 2017). Since our functional covariates are daily maximum and minimum temperatures, they are likely contaminated with errors. The error prone functional predictors may lead to biased inference when directly used in the prediction model, we therefore should take into account measurement errors in the derivation of the low-dimensional representation of multivariate functional predictors. We then define W(s; t) = {W 1 (s; t), . . . , W q (s; t)} T to represent the functional covariates containing additive measure errors, where u(s; t) = {u 1 (s; t), . . . , u q (s; t)} T is a vector of white noise with variance parameter σ 2 u l for the lth element. Our meteorology records have a much longer history than the crop yield measurement. This allows us to estimate the local long-term temperature pattern μ(s; t) using historical records dated half a century back. The estimation errors in mean functions are negligible compared with the year-to-year variations. We therefore center each temperature curve by its long-term mean and expand the centered W(s; t) as We assume that u(s; t) and e(s) in (1) are independent.
Let s 1 , . . . , s n be the observed locations and ξ r = {ξ r (s 1 ), . . . .ξ r (s n )} T be the n-dimensional vector of the rth order FPC scores. To model spatial dependency among functional predictors, we assume ξ r ∼ N{0 n , σ 2 ξ r (θ ξ r )}, r = 1, . . . , p, where (θ ξ r ) denotes an n×n correlation matrix parameterized by θ ξ r . This regularization on ξ r helps preserve the spatial continuity of functional predictors, and more details on its correlation structure are elaborated in Section 3.3. We also assume that ξ r are independent across r.
In the classic functional linear model literature, the PC functions, f r (t), are often estimated by spectral decomposition of the empirical covariance function and held as if known in the regression analysis. Such an approach ignores the estimation error of the empirical principal components, and can lead to underestimated model variation and spurious statistical inference. We instead propose to combine FPC estimation and functional regression in a unified Bayesian framework. Specifically, we represent f r by a penalized spline expansion similar to Zhou, Huang, and Carroll (2008), Zhou et al. (2010), and Huang, Li, and Guan (2014), and impose a Bayesian FPCA model similar to Kowal, Matteson, and Ruppert (2017). We first express each f r as rl being the vector of L unknown coefficients for f rl (t). We choose cubic B-splines with equally spaced interior knots for ψ B (t) in our application but other basis functions can also be employed. To penalize the roughness of f r (t), a penalty function P as the L 2 -norm of the second derivative of f r is constructed, that is, Under the Bayesian spline approach, roughness penalty is imposed by constructing a prior on d B rl based on P. Following Wand and Ormerod (2008), we use a reparameterized d B rl , denoted by d rl , to make the penalty matrix diagonal for convenient computation. Specifically, let ψ = U D U T be the singular value decomposition of ψ . We reparameterize the basis into t)d rl and the roughness penalty for d rl can be written as a diagonal matrix with smoothing parameter λ r > 0. We constrain λ 1 > λ 2 > · · · > λ p > 0 following Kowal, Matteson, and Ruppert (2017) to order f 1 (t), . . . , f p (t) by decreasing smoothness. Then we construct a prior d rl ∼ N{0 L , diag(c, c, λ −1 r , . . . , λ −1 r )} with a sufficiently large constant c > 0, and also introduce orthogonal constraints, d T rl J ψ d r l = 0, for r = r , where J ψ = ψ(t)ψ T (t)dt. The expansion of W(s; t) and the Bayesian spatial FPCA to estimate f r (t) will be integrated into the full hierarchical models in Section 3.4.

Spatial Variable Selection
The model selection in the functional regression model is typically performed via truncation, which first determines a p based on how much total variation in X or W can be explained by its truncated expansion and then uses the same first p FPC scores in the regression model for all observations (Morris 2015). This selection procedure may not be ideal for data observed over a large spatial domain with a heterogeneous relationship between the response and functional covariates. For example, a specific rth FPC score may have nonzero coefficients at some locations while having zero coefficients elsewhere, depending on their local geological or ecological attributes. A more flexible model selection scheme that allows heterogeneous sets of FPC scores to be selected at different locations while preserving spatial continuity is desirable for our data. We, therefore, propose a Bayesian spatial variable selection method that can make location-specific decisions.
To facilitate the spatial variable selection, we introduce a binary indicator variable γ r (s) such that β r (s) = 0 if γ r (s) = 0, and β r (s) = 0 if γ r (s) = 1. Following Mendoza (2018), we then introduce a spatially correlated latent process v r (s) to ensure spatial continuity of γ r (s) by having The choice of p associated with our Bayesian spatial variable selection will be discussed in Section 3.4. A similar variable selection scheme can be easily made for Z(s) when h is high, but it is not considered in our analysis as h = 1 in our data.

Regularization of Spatially Varying Coefficients
Our proposed model employs spatially varying parameters to accommodate the spatially heterogeneous data structure. The varying coefficients β r (s) and α j (s) model the location specific relationship between the crop yield and predictors, the FPC scores ξ r (s) represent the individual functional predictors for each location, and the binary coefficients γ r (s) further allows idiosyncratic response-predictor relationships for different locations. Allowing the coefficients to vary with no constraint could potentially lead to over-fitting or algorithm failure. Following Gelfand et al. (2003), we regularize all the spatially varying parameters by Gaussian random processes. The imposed spatial correlation in the Gaussian process priors can force neighboring coefficients to behave similarly, and hence mitigate over-fitting. Specifically, we model the spatial coefficients β r (s) and α j (s) as spatially correlated Gaussian processes, where the correlation decays as the locations become further apart. For example, we assume cov{β r (s is a correlation function defined on the distance between locations, s i −s i , and a parameter vector θ β r . Different correlation functions can be employed for different spatial coefficients. Similarly, we assume the latent variable ξ r (s) and the latent process v r (s) for γ r (s) all governed by Gaussian processes with their own covariance structures.
Although our data are aggregated at the county level so the conditional autoregressive (CAR) model traditionally constructed for lattice data may apply, we argue a covariance func-tion for continuous geostatistical data is a reasonable approximation for modeling the dependency structure for the crop yield and climate data. The CAR model usually treats regional observations sharing a border as neighbors and then assumes a certain correlation structure for neighbors. This aligns well with the inherent characteristics of certain datasets, such as the epidemiology data for which the spread of disease is often due to the interaction of humans or animals of adjacent regions sharing a border. However, the dependency of the crop yield data mainly comes from similar soil properties, weather, and environmental conditions when the two observations are close in geographic distance, rather than two counties sharing boundaries. Therefore, we choose a covariance function that depends on the distance to model the correlation. Our model aligns with arguments in Oyebamiji et al. (2015) and Park, Brorsen, and Harri (2018) who chose geostatistical covariance models to quantify the spatial dependency of areal crop yield data. The specific choice of covariance function for our data will be discussed in Section 5.1.

Bayesian Hierarchical Model
Due to the hierarchical nature of our model configuration, we resort to the Bayesian hierarchical model (BHM) for model fitting and statistical inference. The BHM typically consists of three stages. The data stage contains the likelihood of data given the unknown parameters and processes, while the process stage models the latent processes. The third level closes the hierarchy by specifying the priors of the unknown parameters. Below we summarize the first two stages of our model and defer the priors to the supplementary materials. (i) Data stage: where e(s i ) and u(s i ; t) are independent, and γ r (s i ) is defined in (3). The positive weight ω(s i ) is known at s i . The ψ(t) and d r are reparameterized basis functions and their corresponding coefficients for the expansion of f r , as described in Section 3.1.
(ii) Process stage: The L × L roughness penalty matrix D r in (6), placed on d r , controls the smoothness of basis functions with a sufficiently large constant c and smoothing parameters λ 1 > · · · > λ p . Ordered smoothing parameters are estimated through MCMC implementation and details can be found in the supplementary materials. The n × n correlation matrices (θ α j ), (θ β r ), (θ ξ r ), and (θ v r ) govern the dependency of each process, and are calculated for n spatial locations based on their corresponding correlation functions, as discussed in Section 3.3.
Most location and variance parameters are sampled through Gibbs due to their explicit posterior distributions, while the rest parameters, including parameters in correlation functions and latent indicator processes v r (s) are sampled through Metropolis-Hastings (M-H). The full conditional distributions of unknown parameters and the entire sampling algorithm are also deferred to the supplementary materials. Whenever the M-H algorithm is used, the acceptance rate is tuned to be between 25% and 50% to secure adequate mixing of posterior samples. Both simulations and data analysis results show that the results are robust against the choice of initial values.
The algorithm requires a specification of the truncation parameter p in the FPC expansion. We determine the optimal p via the Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002), which is a generalization of the Akaike's Information Criterion (AIC) for hierarchical models. This requires to first measure the goodness of fit of the model, D( ), based on the deviance statistic given parameters . For our model, we define denotes the likelihood function, and 1 and 2 denote parameters associated with the spatial functional regression and FPCA expansion, respectively. Next, the model complexity is measured to trade off model fit against the number of free parameters. Since the number of parameters is not clearly defined in complex hierarchical models, the penalty term measuring the model complexity is defined through P D , known as the "effective number of parameters" in Spiegelhalter et al. (2002). It can be approximated as the difference between the posterior mean of the deviance and the deviance at the posterior estimates of the parameters. Based on this approximation, DIC is calculated as DIC = 2D − D(¯ ), whereD denotes the posterior mean of D( ) and¯ the posterior estimates of . We then choose the optimal p as the value that minimizes the DIC.

Prediction
Given the hierarchical models, we simultaneously estimate model parameters, perform the spatial variable selection, and make predictions for new observations. To carry out such analysis, we incorporating new functional and scalar predictors, W new , Z new , and unknown response variables, Y new , into the BHM. The model treats Y new as unknown variables and draws samples from a full conditional distribution at each iteration of the MCMC algorithm. By doing so, the new data participate in the estimation of PC functions and spatial variable selection. Finally, the prediction, Y new , and their corresponding posterior 100(1 − α)% credible intervals are obtained from the posterior means, and the 100(α/2)th and 100(1 − α/2)th percentiles of the posterior samples, respectively.

Data Generation and Implementation
We conduct simulations to evaluate the numerical performance of our method in terms of model fitting and prediction and make comparisons with competing methods. We adopt the spatial locations of the 403 counties across five states in our real data as the spatial domain. The distances between those counties range from 16 to 1530 km, with an average of 516 km. We set the number of years as five, which represents a more challenging scenario in terms of estimation accuracy than the real data analysis due to the smaller sample size. We assume data at different years are conditionally independent given the predictors (see Section 5.1) and generate five replicates of spatial responses based on the following model: where k and i are indices for years and counties, respectively, (0, 3), and all the other vectors of spatially varying predictors and coefficients are generated from a Gaussian process with their own mean and covariance matrix parameterized by φ.
is the distance between two observations, κ is the smoothness parameter, φ the range parameter, and K κ (·) is the modified Bessel function of the second kind (Stein 1999). The parameter φ controls the rate of decay, with larger values of φ corresponding to a slower decay and thus stronger correlation. To investigate the effect of spatial correlation on prediction, we consider three values of φ: 50, 200, and 350, which correspond to low, moderate and high correlation of our simulated data. We set the smoothness parameter κ = 1. These settings are similar to Staicu, Crainiceanu, and Carroll (2010).
We further generate multivariate functional predictors W k (s i ; t) as follows, The simulation is repeated 100 times. Each time we randomly selected 20% of observations as the testing data and the rest as training data. Although our data are generated from a Matérn covariance structure, we choose to use an exponential covariance function in our model fitting for simplicity, which corresponds to a Matérn with κ = 0.5. This can also evaluate the robustness of prediction using a misspecified exponential covariance function, which has advantages in computation and estimation. More discussions on the choice of covariance function will be provided in Section 5.1. Each simulation samples 5000 times with its first 2500 as burn-in. In addition to making a prediction using our own Bayesian Spatially Varying Functional Model (BSVFM), we also try the following three models for comparison. All three comparison models include the parametric effect of a scalar covariate but employ different approaches to incorporate the effect of bivariate functional predictors. To fit PLFAM and FLM, mFPCA scores are calculated based on the estimated cross-covariance function of bivariate functional trajectories by applying the algorithm in Wong, Li, and Zhu (2019). We fix p = 2, the actual value of p, for all FPCAbased methods including ours, FLM and PLFAM in the simulation. On average, the top two empirical FPCs explain 99.3% of the total variation in the functional predictors. We have also investigated the PLFAM and FLM using the FPCs that capture 99.9% of the total variation, which leads to p = 11 on average. We found no advantage of using a more extensive set of FPCs in those models, and thus only defer those results to the supplementary materials. For FGAM, the model is fitted using the R package "refund, " where the tuning parameter is determined by a built-in generalized cross-validation procedure. To fit our proposed model, we employ 20 cubic B-splines bases to represent the PC functions.

Evaluation of Prediction
We use Mean Squared Prediction Errors (MSPE) calculated from the testing set to measure the prediction performance of different methods. Figure 3(a)-(c) display the prediction results associated with the three values of the range parameter φ. Note that we observe much larger MSPE compared to the noise level of σ 2 e = 4 in our data generation model (7). This is because the spatially varying coefficients add a substantial amount of variability to the responses so the total variation in Y is much larger than 4. In our simulation, the average marginal variances of the responses are 55.2, 51.3, and 46.8, for φ = 50, 200, and 350, respectively. A stronger correlation associated with a larger φ leads to less variability in responses. The challenge of estimating spatially varying coefficients increases the prediction error. The measurement errors in the functional covariates additionally contribute to the uncertainty of the estimation and prediction. For all three levels of spatial correlation, our proposed BSVFM consistently outperforms the competing models with a lower MSPE. When φ is small, the spatial correlation of the varying parameters is weak. This leaves the parameters more degrees of freedom to vary and thus more volatile to estimate. We therefore observe relatively higher MSPEs of all methods for a smaller φ. In all scenarios, our BSVFM demonstrates a significant advantage over other methods by allowing for spatially heterogeneous parameters. Furthermore, our BSVFM naturally provides prediction intervals which is challenging for the other three models. We then examine the accuracy of the prediction intervals derived from the BSVFM model. Based on the 100 simulation runs, the empirical coverage probabilities for the 95 % credible intervals under φ = 50, 200, and 350 are 98.10%, 98.34%, and 98.38%, respectively, which are close to, albeit slightly larger than, the nominal level.
Ignoring the spatially varying feature of the parameters can also lead to clustered residuals, which is an indicator for regionally biased model fitting and prediction. To illustrate this, we performed Moran's I test on residuals from different models, and Table 1 summarizes the percentages of rejecting the null hypothesis of no spatial correlation. A lower rejection rate implies a more random pattern of residuals and thus a more appropriate model with spatial variability properly incorporated. We find a much lower reject rate for our model residuals when data has moderate to high spatial correlation, that is, φ = 200 and 350. The data with φ = 50 is very weakly correlated, so all model residuals show relatively small rejection rates. A uniformly decreasing rejection rate for φ = 350 compared to φ = 200 is due to the very strong correlation making the spatial data less varying. Figure 4 illustrates an example of residuals from four models over Iowa State under φ = 200. Residuals from our BSVFM show milder clusters compared to other competing models.

Model Specifics
We apply the proposed BSVFM to predict the county-level averaged corn yield in the Midwest region using the annual precipitation and daily temperature trajectories as introduced in Section 2. Let Y k (s i ) be the average corn yield per acre and Z k (s i ) be the annual precipitation at county s i and year k, for i = 1, . . . , 403, and k = 1999, . . . , 2020 and W k2 (s i ; t) are the centered daily maximum and minimum temperature trajectories in year k, respectively. The sustained improvements in seed genetics and biotechnologies have drastically affected the yearto-year independent variation of crop yield in the past several decades. The periodical land fallow or land rotation additionally reduces the temporal correlation on corn production. The land used to grow corn this year is usually used to grow other crops or left without sowing to rest the next year. We thus remove the year effect from the corn yield by constructing Y * k (s i ) = Y k (s i ) −Ȳ k , whereȲ k is the annual average corn yields at year k over all regions. We then treat Y * k (s i ) over different years as conditionally independent replicates given meteorological variables. To verify this assumption, we evaluated the autocorrelation function of the residuals from the fitted prediction model. The results support the conditional independence assumption and are deferred to the Supplement.
We fit the proposed Bayesian hierarchical model described in (4)-(6) to the centered response Y * k (s i ) and the centered meteorology trajectories W k (s i ; t). We choose an exponential correlation function in (6) to model the spatial correlation in α and β r , and the latent random fields ξ r and v r . To reduce the model complexity, we assume a common range parameter for each type of parameters, that is, a common φ β across β r , a common φ ξ for ξ r , and φ v for v r , for r = 1, . . . , p. Although this restriction can be relaxed, it largely improves the model parsimony and helps to stabilize the estimation. Additionally, it is reasonable to assume a common decaying rate because correlations in all model parameters trace back to the correlation structure of the original data. All range parameters will be estimated along with other parameters in our model fitting.
Assuming a more general Matérn correlation function does not seem to have clear advantages for our data analysis. This is because the smoothness parameter in the Matérn covariance function is difficult to estimate unless there is a reasonable number of observations in proximity (Stein 1999). Our countylevel crop data, however, does not align with such data structure. Moreover, it is well known that the three parameters in the Matérn covariance function cannot be all consistently estimated (Zhang 2004). As seen in simulation studies in Section 4, the exponential correlation assumption achieves satisfactory prediction accuracy even with the functional data generated from Matérn spatial correlation functions. We employ cubic B-splines with 40 equal spaced interior knots to construct ψ(t) = {ψ 1 (t), . . . , ψ L (t)} T for the estimation of f r , which captures the local variations in daily temperature trajectories. We tried different numbers of knots, and the experiments showed that the prediction performance is insensitive to the choice of L, given it is sufficiently large to capture the features in meteorology trajectories. Using the DIC measure described in Section 3.4, we choose p = 5 for the FPC expansion of the temperature trajectories. Based on a total of 15,000 MCMC samples, a posterior sample of size 2500 was obtained by using the first 5000 iterations as burn-in and thinning the remaining 10,000 by a factor of 4. Similar burnin and thinning practices were adopted by Peruzzi, Banerjee, and Finley (2020). Each MCMC iteration takes, approximately, 25 sec using a PC with a quad-core Intel(R) Core(TM) i-7 processor. Trace plots of the posterior samples and convergence diagnostics using the potential scale reduction factor (PSFR; Gelman and Rubin 1992) are provided in the supplementary materials to show the convergence of our MCMC algorithm.

Prediction Assessment
We examine the prediction performance of our model using a 10-fold cross-validation and compare it with competing statistical and machine learning methods. The statistical methods include FLM, PLFAM, and FGAM, described in Section 4.1, and a nonfunctional spatially varying coefficient Model (SVCM) which simply replaces the functional temperature covariates in our model by the annual averages of maximum and minimum temperatures. We also make comparisons with two machine learning methods that have gained popularity in crop science studies (van Klompenburg, Kassahun, and Catal 2020), the Neural Network (NN) and the extreme gradient boosting (XGB). To fit PLFAM and FLM, the number of FPC is chosen to recover at least 99.9% of the total variation in the functional predictors. Both NN and XGB are trained based on 731 variables including the annual precipitation and the daily minimum and maximum temperatures for the whole year, using R packages "nnet" and "XGB, " respectively. The tuning parameters are determined by cross-validation in each training set. Further exploration of machine learning methods based on various summary statistics of temperatures, such as weekly or monthly averaged temperatures, or FPCs of temperatures, are presented in the supplementary materials. There seems to be no advantage of using the reduced temperature information in the machine learning algorithms. This aligns with the conclusion of Chu and Yu (2020), which demonstrated that machine learning methods for crop yield prediction based on daily temperatures could avoid overfitting through the optimal choice of the number of hidden layers.
Two types of MSPE are calculated for model comparison: (i) Regular MSPE and (ii) Weighted MSPE using the size of the harvest land as the weight. The latter one is calculated by averaging Table 2 summarizes the prediction results. Our BSVFM achieves both the lowest MSPE and the lowest weighted MSPE compared to all six competitors. It is worth mentioning that the SVCM turns out to be a very competitive approach by allowing spatially varying regression coefficients while employing scalar rather than functional temperature covariates. This further demonstrates the essence of allowing spatially heterogeneous relationships in the corn yield prediction model. On the other hand, the better performance of BSVFM than SVCM shows treating daily temperature trajectories as functional covariates enables gleaning more information from temperatures. More details of the comparison between SVCM and BSVFM are illustrated in the supplementary materials. Both NN and XGB are highly sensitive to the choice of tuning parameters and the results in Table 2 represent their best performance we have obtained based on our attempts. In addition, results of machine learning methods are typically challenging to interpret. In contrast, our proposed BSVFM not only yields the best predictions, but also delivers insightful and interpretable messages, as detailed in Section 5.3, and provides prediction uncertainty through credible intervals. Figure 5 illustrates predicted yields together with their corresponding 95% credible intervals using the BSVFM, based on a randomly selected subset of the data. Credible interval widths vary over counties due to different harvest land size. The displayed credible intervals in Figure 5 are from counties with the harvest land sizes ranging from 47,000 to 297,600 acres. The narrow interval widths are obtained from counties with large corn harvest lands. This numerical experiment demonstrates how to use our model for crop yield prediction when the meteorology trajectories are observed. When the meteorology data are not available for future yield prediction, we will use weather forecasting techniques to construct the unobserved temperature trajectories and precipitation to make the prediction.

Further Nuances
Following the convention of PCA, we sort the initially estimated f r in the descending order of the posterior means of σ 2 ξ r (the variance of the corresponding FPC scores) so that the leading FPC functions f r represent the dominant modes of variation in the temperature trajectories. The spatially varying effect of precipitation and temperatures on corn yield is illustrated in Figure 6 which shows the posterior mean of the coefficients, α(s), for annual precipitation, and β r (s), for the top three FPC scores. Counties marked by yellow dots in (b), (c), and (d) indicate that their corresponding FPC scores were chosen by our Bayesian variable selection device. FPC scores are selected into the final model if their posterior inclusion probability is greater than 0.5. The estimated coefficients are evidently county-specific while spatially clustered. Figure 6(a) shows a cluster of large positive α(s) in central Missouri and negative α(s) concentrated in northern Illinois and Indiana. In Iowa, a gradual change of annual precipitation effect from the west to east is observed with the sign of coefficients changing from positive to negative. The spatially varying pattern also appears in β 1 (s), β 2 (s) and β 3 (s) in Figure 6(b)-(d). Indeed, the most striking pattern of the three estimated β processes is their strong spatial heterogeneity. Large clusters of negative and positive coefficients in β 1 (s) and β 2 (s) are observed, while β 3 (s) tends to have multiple smaller clusters of coefficients. The nonnull sets of coefficients are captured via the spatial variable selection, and we observe spatial clusters of counties with nonzero coefficients. Estimates of other location-specific coefficients also show spatially varying and correlated patterns, and are presented in the supplementary materials.  The estimated PC functions f r = ( f r1 , f r2 ) T , r = 1, . . . , 5, are presented in Figure 7. The two components of the FPC, f r1 (t) and f r 2 (t), generally show similar patterns over time, which is not surprising since the trajectories of maximum and minimum temperatures are strongly correlated. Overall, f 1 represents a temperature pattern featuring a warmer than average Winter, a cooler than average Summer and a hot Fall. Counties with a high loading on f 1 may also experience a smaller than usual diurnal range of temperature in January and August. Then f 2 appears to represent a colder than average Winter followed by monthly temperature oscillations that are agreed by both the maximum and minimum temperatures, such as higher than average temperature in July and September but lower in August and October. The third component f 3 depicts a cooler July and October but hotter August and September. The two warmer fall months also observe a larger than usual diurnal range of temperature through the large gap between f 31 (t) and f 32 (t). Then f 4 features a warmer-than-average temperature pattern in February, June and November; and f 5 appears to represent a cooler early summer followed by a hot September. The spatially varying coefficients β r (s) in Figure 6 reveal how the effects of their corresponding climate patterns vary over different geographical regions. Similar weather patterns, represented by similar PC scores, can have different, sometimes even opposite effects on corn yield in different regions.
To illustrate the benefit of allowing spatially varying effects of functional temperature covariates on corn yield, Figure 8 compares the fitted yield using the BSVFM and the FLM in Illinois and Iowa in 2018. Both the BSVFM and the FLM treat temperatures as functional covariates yet the latter is a spatially Figure 7. Estimated five PC functions, f r , r = 1, . . . , 5, from the multivariate Bayesian spatial FPCA. Each PC function is a vector f r = ( f r1 , f r2 ) T . The solid and dashed curves in each panel are the estimated f r1 and f r2 , representing the PC functions for maximum and minimum temperatures, respectively. Alphabet symbols "J", "A", "J", and "O" along with trajectories in each panel indicate the first day of January, April, July, and October, respectively, in the rescaled unit interval T = [0, 1].  invariant model. Apparently, the BSVFM method using both temperature and precipitation captures significantly more variability of corn yield than the FLM. Figure 8 also shows the variations in corn yield explained merely by the annual precipitation and the added value of incorporating the functional temperature predictors. For both states, the temperature trajectories seem to be the main driver for corn yield variability compared to annual precipitation. All these results indicate the necessity of allowing spatially varying coefficients and employing functional temperature covariates for corn yield prediction. Similar patterns are observed in other states.
Taking Iowa as an example, we examine the variable selection results using the posterior inclusion probabilities for the first and second FPC scores in Figure 9. Iowa is chosen for a better visualization due to it having the smallest number of missing counties among the five states. The probabilities are calculated as the posterior means of γ 1 (s) and γ 2 (s). Higher (lower) posterior probabilities imply a higher (lower) chance of the corresponding FPC scores being selected in the final model. Again, the inclusion probabilities display a heterogeneous feature, strongly supporting our conjecture of spatially varying influence of different PCs on crop yield. As a summary of the variable selection result for the five states, the average number of selected FPC scores in the final model is 3. The second FPC score shows the highest chance of being included with a 64.5% rate over all counties. The lowest selection percentage is observed on the fourth FPC score with a 54.1% rate. The selection rates of the first, third, and fifth FPC are 59.5%, 57.1%, and 62.3%, respectively. Overall, 72 counties select all five FPCs, 85 select four FPCs, and the rest counties mostly have 2-3 FPCs in the final model. We perform Moran's I test on spatial residuals obtained from the BSVFM fit over the five states at each of the 22 years. Fifteen out of 22 years show a p-value larger than 0.05. This demonstrates that the spatially varying coefficients of the BSVFM properly address the spatial variability in our data, and the assumption of independent errors in the functional regression model appears reasonable.

Conclusions
We developed a fully Bayesian functional regression model for corn yield prediction over a large-scale spatial domain. Our model treats county-level maximum and minimum daily temperature trajectories as spatially dependent functional predictors and annual precipitation as a scale predictor. Our model allows for idiosyncratic response-predictor relationships by incorporating spatially varying coefficients as well as spatially heterogeneous variable selection. We estimate the low-rank representation of the multivariate functional covariates via a penalized spline expansion approach while considering additive measurement errors and spatial dependency in the functional data. The introduction of spatially varying coefficients enables us to achieve more accurate predictions than other methods, demonstrated by both simulation and real data studies. The Bayesian approach also has the advantage of providing uncertainty quantification for predictions. Although our model is motivated by the corn yield prediction, it can be broadly applied to scalar-on-function regression models with spatially indexed functional predictors.
Crop yield is often boosted by the advancement of seed genetics and biotechnologies. South et al. (2019) reported that crops engineered with a photorespiratory shortcut for photosynthetic glitch are 40% more productive in real-world agronomic conditions. Those types of discontinuous year-to-year variation are beyond the scope of our current model, though our model can be naturally expanded to gain such power given appropriate predictors are available. To focus on modeling the variation arising mainly from meteorological variables, our 10fold cross-validation randomly chose 10% from all observations rather than data of an individual year as the validation set. That allows model fitting to learn the year-specific effect. In practice, an estimated shift should be added to the prediction if a breakthrough technology is expected to significantly improve the crop yield in the prediction horizon. Other factors such as soil conditions and soil moisture, if available, should also be included in the prediction model as their fluctuation can directly impact crop yield.
As mentioned in Section 2, our analysis focuses on counties with corn yield data available for at least five years between 1999 and 2020. Ignoring counties with no observations for at least 15 years seems reasonable in our study since very sparse corn data is likely to indicate insignificant corn production in that county, possibly due to its geographic features or attributes. For instance, certain counties may include large urban areas that do not or only intermittently grow corn. Additionally, we could not identify a systematic pattern on ignored counties regarding their geographic and time locations, and the missing mechanism is unknown. Thus, we assume missing at random in our analysis and only use the observed values with at least five repetitions for the model fitting. However, if we consider the extension of the model to other data or applications, the missing mechanism should be carefully investigated. For example, the public health data often suppress the small number of cases in a county for confidentiality issues (Shand et al. 2018). In such a case, the missing values are not at random and could place prediction at the risk of carrying bias if left unattended.
In addition to improving prediction accuracy, our BSVFM provides further insights in understanding the relationship between crop yield and temperature trends and variation as discussed in Section 5.3. Our model enables capturing spatially heterogeneous effects of temperature characteristics on crop yields, such as different effects of hot summer or warm fall on corn yield over different regions. With the accelerating climate change and more frequent extreme weather events, our study will shed new light on evaluating the impact of climate change on crop yield and food security.

Supplemental Materials
The online Supplement contains the full hierarchical model and MCMC sampling algorithm. It also contains MCMC diagnostics, additional Monte Carlo experiments, and additional corn yield data analysis results. R codes and data are also provided.