Template Independent Component Analysis with Spatial Priors for Accurate Subject-Level Brain Network Estimation and Inference

Abstract Independent component analysis is commonly applied to functional magnetic resonance imaging (fMRI) data to extract independent components (ICs) representing functional brain networks. While ICA produces reliable group-level estimates, single-subject ICA often produces noisy results. Template ICA is a hierarchical ICA model using empirical population priors to produce more reliable subject-level estimates. However, this and other hierarchical ICA models assume unrealistically that subject effects are spatially independent. Here, we propose spatial template ICA (stICA), which incorporates spatial priors into the template ICA framework for greater estimation efficiency. Additionally, the joint posterior distribution can be used to identify brain regions engaged in each network using an excursions set approach. By leveraging spatial dependencies and avoiding massive multiple comparisons, stICA has high power to detect true effects. We derive an efficient expectation-maximization algorithm to obtain maximum likelihood estimates of the model parameters and posterior moments of the latent fields. Based on analysis of simulated data and fMRI data from the Human Connectome Project, we find that stICA produces estimates that are more accurate and reliable than benchmark approaches, and identifies larger and more reliable areas of engagement. The algorithm is computationally tractable, achieving convergence within 12 hr for whole-cortex fMRI analysis. Supplementary materials for this article are available online.


Introduction
The functional organization of the human brain is a subject of intense ongoing scientific investigation. In recent decades, the use of functional magnetic resonance imaging (fMRI) data has rapidly advanced this effort, given its noninvasive nature, relatively high spatial and temporal resolution, and rapid technological evolution. Initial fMRI studies nearly exclusively focused on identifying regions of the brain associated with specific tasks or stimuli. Task fMRI data can be analyzed in a simple regression framework, with the use of a design matrix based on the task paradigm. However, in the past 15 years there has been a shift toward studying the functional organization of the brain more holistically, in the absence of a particular stimulus paradigm. These so-called resting-state fMRI studies are more challenging to analyze due to the absence of any information about the subject's mental state during the scan on which to base a design matrix. Therefore, the analysis of resting-state fMRI data often employs blind source separation algorithms. In particular, independent component analysis (ICA) has been established as a standard and reliable method of analysis for resting-state fMRI at the group level (Calhoun et al. 2001; Beckmann et al. 2005; Damoiseaux et al. 2006). ICA aims to separate fMRI data into a set of spatially independent components (ICs), which are images of the brain representing a spatial source signal, and a set of corresponding time courses that represent the activity of each IC over time. ICs may represent functional brain networks, regions of the brain that tend to activate together, or may be of artifactual origin. One challenge is that ICA applied to single-subject fMRI data tends to result in highly noisy IC estimates due to high noise levels and limited scan duration.
Recently, several studies collecting large amounts of fMRI data on a small number of "highly sampled" subjects have been able to accurately estimate individual features of functional topology (Laumann et al. 2015;Braga and Buckner 2017;Gordon et al. 2017). However, in most research and clinical settings, collecting long and/or numerous fMRI sessions is not practical or possible. Furthermore, ICs estimated using only data from individual subjects are not easy to match across subjects. Therefore, many studies apply ICA to group-level fMRI data combined across many subjects (Calhoun et al. 2001), and single-subject IC estimates are typically based on post-ICA reconstruction procedures (Beckmann et al. 2009;Erhardt et al. 2011). These procedures are computationally convenient but are not model-based and as such tend to have suboptimal estimation efficiency, particularly when applied to fMRI sessions of short or moderate duration, and do not provide an inferential framework.
In recent years, hierarchical ICA models have been proposed as a way to simultaneously estimate subject- and group-level ICs (Guo and Tang 2013; Shi and Guo 2016). These are part of a growing body of literature that uses likelihood-based procedures to perform ICA in a model-based framework (Chen et al. 2006; Eloyan, Crainiceanu, and Caffo 2013; Li et al. 2016; Risk, Matteson, and Ruppert 2019). There have also been efforts to use existing sets of group-level ICs or "templates" to estimate single-subject ICs based on either an optimization framework (Lin et al. 2010; Du and Fan 2013) or a statistical one (Mejia et al. 2020a). These hierarchical and template-based ICA models provide a principled approach to single-subject IC estimation, and have been shown to produce more accurate results. Formal statistical frameworks such as those proposed by Shi and Guo (2016) and Mejia et al. (2020a) also make inference on subject-level ICs possible and allow estimation and inference of "subject effects", deviations of individual subjects' brain networks from group average brain networks. One drawback of these existing models, however, is that they assume independence of the latent spatial effects across brain locations. While helpful from a computational perspective, this assumption does not hold up in practice, as brain networks and subject effects tend to be spatially smooth. Ignoring information shared across neighboring brain locations may result in loss of efficiency in estimated brain networks, as well as reduced power to detect engaged regions in ICs or subject effects.
Indeed, spatial dependencies are nearly universally ignored when modeling fMRI data in general. While spatial smoothing of the fMRI data or of effect estimates is common in practice and can improve estimation efficiency, its downsides include the need to specify the degree of smoothing and a lack of power compared with formal spatial models. Historically, due to the high-dimensional nature and spatial structure of fMRI data, spatial models would have been virtually impossible to estimate without overly drastic simplifications and restrictive assumptions. Yet in recent years, advances in various domains have made spatial modeling of fMRI feasible. These include increased computing power, advances in Bayesian computation, and advances in spatial statistics. Additionally and perhaps most importantly, "cortical surface fMRI" (cs-fMRI) has grown in popularity and availability since the emergence of the Human Connectome Project (Glasser et al. 2013;Van Essen et al. 2013). Historically, one of the major challenges to spatial models for fMRI has been the complex spatial dependence structure within the volume of the brain induced by cortical folding and the presence of multiple tissue classes. In cs-fMRI, on the other hand, the cortical gray matter, which contains much of the signal of interest, is projected to a two-dimensional surface manifold (Fischl, Sereno, and Dale 1999). In this format, the spatial dependence structure is greatly simplified and can be reasonably modeled as a decreasing function of the geodesic distance along the surface (Fischl, Sereno, and Dale 1999). Additionally, the number of cortical locations is greatly reduced relative to the number of volume elements (voxels) in the brain, and the data can be resampled to a lower resolution without substantial loss of information. This represents a major computational advantage for analysis of cs-fMRI compared with traditional volumetric fMRI.
Due to the high-dimensional nature of fMRI data, a suitable spatial prior must have a sparse precision structure for computational tractability. Gaussian Markov random field (GMRF) priors are a popular choice due to their convenient parametric form and simple precision structure (Rue and Held 2005). While many GMRF priors are built on a regular lattice and therefore would not be directly applicable to cs-fMRI data, the somewhat recently developed stochastic partial differential equation (SPDE) GMRF priors are built on a triangular mesh, the format of cs-fMRI data (Lindgren, Rue, and Lindström 2011; Bolin and Lindgren 2011). These priors have been used to model task fMRI data (Mejia et al. 2020b; Sidén et al. 2021), but have never been applied in resting-state fMRI analysis.
Finally, a goal in many ICA-based analyses of resting-state fMRI is to obtain thresholded IC maps representing a set of brain networks. A common approach to thresholding is to retain locations exceeding a certain z-score or surviving a group-level hypothesis test. While this may be helpful for visualization purposes, it cannot be used to quantify the size and location of each subject-level brain network with any statistical certainty. Using a statistical framework such as those proposed by Shi and Guo (2016) or Mejia et al. (2020a), one can obtain the marginal posterior variance of each IC at each location and perform the equivalent of a hypothesis test at each location for each IC. However, since a test is performed at every location in the brain, multiple comparisons correction should be performed to control the false positive rate at a desired level. Since spatial dependencies are not accounted for in this framework, power to detect areas engaged in each IC can suffer. In a spatial Bayesian context, the joint posterior distribution is in theory available and could be used to determine areas of engagement while accounting for spatial dependencies. This poses some computational challenges, so spatial Bayesian models for task fMRI have traditionally resorted to using the marginal posterior distribution at each location in the brain, followed by multiple comparisons correction. Recently, Bolin and Lindgren (2015) proposed an approach to identification of excursion regions based on the joint posterior distribution, which was adopted by Mejia et al. (2020b) to identify activations in task fMRI analysis. By leveraging information shared across neighboring locations, this approach tends to have greater power to detect true effects, while maintaining strict false positive control.
Here, we propose spatial template ICA (stICA), a novel spatial modeling approach for estimating subject-level ICs based on a set of existing templates (population priors). Estimating the unknown parameters and latent variables in stICA presents major computational challenges, since the model does not fully factorize over locations or independent components. We develop a computational approach based on expectation-maximization (EM) to obtain maximum likelihood estimates (MLEs) of model parameters and posterior mean and variance estimates of the latent fields. We also incorporate an excursions set approach to identify areas of engagement in each IC and areas of deviation (from the group average) in each subject effect map. We address several computational challenges arising from the large scale of the problem: a typical number of brain networks of interest, L, may range from 10 to 100, and the number of cortical locations V, based on a reasonable amount of resampling, is 10,000 or more. This results in computations involving very large (LV × LV) matrices. As such computations are only tractable with sparse matrices, we must take care when performing the necessary steps to obtain the MLEs and posterior quantities of interest. One additional challenge is the need for numerical optimization to obtain the MLE of some model parameters, which could be very slow without a careful approach to computation. We describe our computational approach in detail below, and find through both simulation studies and real fMRI data analysis that time and memory requirements tend to be reasonable.
The remainder of this article is organized as follows. Section 2 describes the stICA model, the EM-based estimation procedure, and the excursions set approach used to identify areas of engagement. Section 3 presents a simulation study designed to assess the accuracy of the proposed methods and their power to detect engaged regions. We compare the performance of stICA with that of two baseline methods: template ICA with a spatial independence assumption (tICA) and dual regression, a popular but ad-hoc method to obtain subject-level ICs based on a set of existing group ICs. Section 4 presents a study of resting-state fMRI data from the Human Connectome Project (HCP). We apply the proposed and benchmark methods to two sessions of data from one randomly selected subject and assess the agreement of the estimated subject-level features across sessions using each method. We also present the results visually, focusing on one IC representing an attention brain network, to provide a qualitative illustration of the benefits of stICA over existing methods. Finally, we conclude with a brief discussion in Section 5.

Methods
Consider a set of L functional brain networks of interest. These are identified by the researcher a-priori by applying ICA with a specified model order ≥ L to data from a group of subjects, then identifying the ICs that represent brain networks of interest. This process requires scientific expertise and often involves adjusting the model order to achieve a desired granularity of the brain networks. Since we are able to estimate these brain networks with a high degree of accuracy using data from a large group of subjects, we assume the population average ICs to be known. The between-subject variance within each IC can also be estimated accurately using data from a large group of subjects as in Mejia et al. (2020a); we therefore also assume the between-subject variance of each IC at each brain location to be known. In some cases a relatively small sample may be available for template estimation, such as when a population-specific template is estimated using a holdout set of subjects. Practical measures to improve estimation accuracy include using modern cortical surface registration techniques to better align subjects and applying a small degree of spatial smoothing to the mean and variance estimates. We refer to the set of known population mean and between-subject variance maps as a template.
Here, we describe spatial template ICA (stICA), which models subject-specific ICs as random deviations from the known population averages, and assumes spatial priors on these deviations to leverage their inherent spatial dependence. Below, we present the stICA model, detail the proposed EM-based estimation approach, and finally describe the excursions set approach used to identify regions of "engagement" in each IC or "deviation" in the corresponding subject effect maps. Note that subject-level fMRI data may also contain additional spatial source signals of neural or artifactual origin, which combine with the template brain networks to form the observed data; these "nuisance" signals are accounted for, as described in Section 2.4. Let Ẏ = [ẏ(1), . . . , ẏ(V)] (T × V) represent the preprocessed fMRI data for an individual subject (see Section 2.4 for details on preprocessing), where T ≥ L is the number of time points or volumes collected and V is the number of brain locations (e.g., vertices). We first perform dimension reduction based on the SVD of Ẏ. Briefly, let U be the T × L matrix containing the first L left singular vectors, let d_ℓ be the ℓth singular value, and let D = diag{d_1, . . . , d_L}. As described by Beckmann, Jenkinson, and Smith (2003), dimension reduction in probabilistic ICA is achieved by pre-multiplying Ẏ by the L × T matrix H formed from U and D; the residual variance ν₀² is estimated from the mean of the remaining T − L singular values.
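As a concrete illustration, the SVD-based reduction can be sketched in Python/NumPy (an illustrative sketch, not the authors' R implementation; the exact form of H and the scaling of the noise-variance estimate here follow one common convention and are assumptions):

```python
import numpy as np

def reduce_dimension(Y_dot, L):
    """Illustrative SVD-based dimension reduction (hypothetical helper).
    Y_dot is the T x V preprocessed fMRI data; L is the number of ICs."""
    T, V = Y_dot.shape
    U, d, Vt = np.linalg.svd(Y_dot, full_matrices=False)
    U_L, d_L = U[:, :L], d[:L]
    # Residual noise variance from the remaining T - L singular values
    # (up to a scaling convention -- this normalization is an assumption).
    nu0_sq = np.mean(d[L:] ** 2) / T
    # One simple choice of the L x T reduction matrix H built from U and D.
    H = np.diag(1.0 / d_L) @ U_L.T
    return H @ Y_dot, nu0_sq

rng = np.random.default_rng(0)
Y_dot = rng.standard_normal((100, 500))   # T = 100 time points, V = 500 locations
y, nu0_sq = reduce_dimension(Y_dot, L=10)
```

The reduced data y is L × V, so all subsequent model quantities scale with the number of ICs rather than the scan length.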

The Spatial Template ICA Model
For an individual subject, let s_ℓ(v) denote the intensity of the ℓth IC at brain location v, v = 1, . . . , V, which we wish to estimate. Let s_0ℓ(v) and σ²_ℓ(v) denote the known template mean and variance, respectively, for IC ℓ at location v. Letting s(v) = [s_1(v), . . . , s_L(v)]′, the first level of the model at location v is given by ẏ(v) = Ṁ s(v) + ė(v), where Ṁ is the T × L mixing matrix that describes how each spatial source signal activates over time to result in the overall composite neural activity observed during the fMRI scan. Following dimension reduction as described above, let y(v) = Hẏ(v), M = HṀ, e(v) = Hė(v), and C = HH′, which gives y(v) = M s(v) + e(v). The standard template ICA (tICA) model proposed by Mejia et al. (2020a) assumes that each subject-level IC ℓ = 1, . . . , L is related to the corresponding population IC as s_ℓ(v) = s_0ℓ(v) + δ_ℓ(v), where δ_ℓ(v) ∼ N(0, σ²_ℓ(v)), and s_0ℓ(v) and σ²_ℓ(v) are known via the template. The deviations δ_ℓ(v) are assumed to be independent over all locations v = 1, . . . , V. While computationally convenient, this assumption ignores the inherent spatial dependence of these deviations, resulting in reduced estimation efficiency and power to detect significant areas of deviation (δ_ℓ(v) ≠ 0) or engagement (s_ℓ(v) ≠ 0).
The spatial template ICA (stICA) model instead assumes that each latent field of deviations δ_ℓ = (δ_ℓ(1), . . . , δ_ℓ(V))′ exhibits spatial dependence. The prior distribution on δ_ℓ is a combination of the template, which encodes the variance, and an SPDE spatial process, which encodes the spatial dependence. Specifically, the prior covariance of δ_ℓ has the form D_ℓ R_ℓ D_ℓ, where D_ℓ = diag{σ_ℓ(v) : v = 1, . . . , V} and R_ℓ is a correlation matrix arising from an SPDE spatial process with unit variance. Note that D_ℓ is known through the template, while R_ℓ must be estimated. Since R_ℓ is a large dense matrix, in the algorithm described below we avoid its explicit computation in favor of R_ℓ⁻¹, which is sparse. Letting s_ℓ = (s_ℓ(1), . . . , s_ℓ(V))′ and s_0ℓ = (s_0ℓ(1), . . . , s_0ℓ(V))′, the second level of the stICA model for IC ℓ = 1, . . . , L is given by s_ℓ = s_0ℓ + δ_ℓ, where δ_ℓ ∼ N(0, D_ℓ R_ℓ D_ℓ). More details on the spatial dependence structure encoded in R_ℓ are provided in the next section.

Spatial Process Priors
We assume that each δ_ℓ, ℓ = 1, . . . , L, is generated from a stochastic partial differential equation (SPDE) spatial process with unit variance (Lindgren, Rue, and Lindström 2011; Bolin and Lindgren 2011). SPDE priors are a class of GMRF priors, which are popular for high-dimensional spatial problems due to their sparse precision structure. They are constructed as discretizations of continuously indexed Matérn Gaussian fields, and as such have parameters with clear interpretations that are invariant to the spatial resolution of the discretization. Specifically, the Matérn random field X(s) is defined as the solution to (κ² − Δ)^(α/2)(τX) = W, where the equation is posed on the spatial domain R^d, W is Gaussian white noise, and Δ is the Laplacian. The positive parameters α, κ, and τ control the smoothness, correlation range, and variance of X, respectively. To produce a GMRF, the problem is restricted to a bounded domain D, and a finite element discretization is constructed using a basis defined through a triangular mesh on D. In this work, we fix α = 2, as spectral theory shows that α must take an integer value to yield a Markov field. The variance of X for d = 2 and α = 2 is σ² = (4πκ²τ²)⁻¹. Both the model and the discretization method can be formulated on quite general domains. Therefore, SPDE priors are applicable to both cortical surface fMRI data, which has an inherent triangular mesh structure, as well as cerebellar and subcortical volumetric regions, where a mesh can be constructed based on the voxels. For example, the delaunayn function of the geometry package in R can be used to create a tessellation of the convex hull of a set of points in R^d subject to certain constraints (Habel et al. 2019). For voxels in R³, the resulting tessellation is a tetrahedral mesh, which is the three-dimensional analogue to a two-dimensional triangular mesh.
Because of the possibility of extending the SPDE to generate more general models, and because of the flexibility in how the mesh is constructed, the SPDE priors are more flexible than many other classes of GMRFs built on a regular lattice. SPDE priors have been previously used successfully for modeling spatial dependence of activation fields in task fMRI studies in both volumetric and cortical surface fMRI data (Mejia et al. 2020b;Sidén et al. 2021).
Let B be the number of distinct brain regions we wish to analyze (e.g., left cortical surface, right cortical surface, left cerebellum, right cerebellum). Let V_b be the number of brain locations in region b. Given their spatial segregation, we consider each brain region to be spatially independent of other regions. Assume that the V = Σ_b V_b brain locations are grouped by regions. For each region b = 1, . . . , B, we have a triangular mesh consisting of N_b ≥ V_b vertices representing the original V_b data locations, plus possible additional locations added to satisfy certain constraints and to control boundary effects. Let x_ℓ = (x′_ℓ,1, . . . , x′_ℓ,B)′, where each x_ℓ,b follows a zero-mean, unit-variance SPDE prior defined on the mesh for region b. The precision matrix for an SPDE process is sparse and takes the form τ²_ℓ(κ⁴_ℓ F + 2κ²_ℓ G + GF⁻¹G), where F is a known diagonal matrix and G is a known sparse matrix, which contains nonzero entries in cells corresponding to neighboring vertices in the triangular mesh. Since x_ℓ,b is assumed to have unit variance, τ_ℓ is determined by κ_ℓ through the relationship τ²_ℓ = (4πκ²_ℓ)⁻¹, leaving κ_ℓ as the only free parameter. Letting A denote the matrix that maps the mesh vertices to the original data locations across regions, the second level of the stICA model for each IC ℓ = 1, . . . , L can be re-expressed as s_ℓ = s_0ℓ + D_ℓ A x_ℓ, where s_0ℓ, D_ℓ, and A are known. The spatial precision matrix of δ_ℓ is therefore given by D_ℓ⁻¹ R_ℓ⁻¹ D_ℓ⁻¹, where R_ℓ is the correlation matrix of A x_ℓ. Note that R_ℓ⁻¹ is sparse and easily computable using a partitioned matrix inverse approach (see Appendix A, supplementary materials).
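To make the precision structure concrete, the following Python/SciPy sketch builds the sparse precision τ²_ℓ(κ⁴_ℓ F + 2κ²_ℓ G + GF⁻¹G) with the unit-variance constraint τ²_ℓ = (4πκ²_ℓ)⁻¹. The toy F and G for a three-vertex chain mesh are illustrative stand-ins, not outputs of a real finite element mesh construction:

```python
import numpy as np
import scipy.sparse as sp

def spde_precision(kappa, F_diag, G):
    """Sparse SPDE precision Q = tau^2 (kappa^4 F + 2 kappa^2 G + G F^{-1} G)
    for d = 2, alpha = 2, with tau^2 = 1 / (4 pi kappa^2) so the process has
    unit variance. F_diag is the diagonal of F; G is a known sparse matrix."""
    F = sp.diags(F_diag)
    F_inv = sp.diags(1.0 / F_diag)
    tau_sq = 1.0 / (4.0 * np.pi * kappa ** 2)
    Q = tau_sq * (kappa ** 4 * F + 2.0 * kappa ** 2 * G + G @ F_inv @ G)
    return Q.tocsc()

# Toy three-vertex chain "mesh": F from vertex weights, G Laplacian-like.
F_diag = np.ones(3)
G = sp.csc_matrix(np.array([[1.0, -1.0, 0.0],
                            [-1.0, 2.0, -1.0],
                            [0.0, -1.0, 1.0]]))
Q = spde_precision(kappa=1.0, F_diag=F_diag, G=G)
```

Because G is nonzero only for neighboring mesh vertices, Q inherits that sparsity pattern, which is what makes the high-dimensional computations below feasible.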

Full Vector Form of Model
Given that the model factorizes over locations v at the first level and over ICs ℓ at the second level, to obtain posterior estimates of the ICs in s we first need to re-express both levels in full vector form. Let s = (s′_1, . . . , s′_L)′, y = (y(1)′, . . . , y(V)′)′, and e = (e(1)′, . . . , e(V)′)′. Note that s is grouped by ICs while y and e are grouped by locations. We can regroup the elements of s by location using a permutation matrix P. Letting M_⊗ = I_V ⊗ M and C_⊗ = I_V ⊗ C, where ⊗ denotes the Kronecker product, the first level of the stICA model is given by y = M_⊗ P s + e, e ∼ N(0, ν₀² C_⊗), and the second level is given by s = s_0 + δ, δ ∼ N(0, DRD), where s_0 = (s′_01, . . . , s′_0L)′, D = diag{D_1, . . . , D_L}, and R = diag{R_1, . . . , R_L}.
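The role of the permutation matrix P can be illustrated with a small sketch (hypothetical sizes L = 3, V = 4): P maps the IC-grouped vector s to a location-grouped vector whose vth block is (s_1(v), . . . , s_L(v))′, so that the Kronecker-structured first level applies block-wise over locations:

```python
import numpy as np

L, V = 3, 4
# s stacked by IC: element l*V + v holds s_l(v).
s_by_ic = np.arange(L * V, dtype=float)
# Build P so that block v of (P s) holds (s_1(v), ..., s_L(v)).
P = np.zeros((L * V, L * V))
for v in range(V):
    for l in range(L):
        P[v * L + l, l * V + v] = 1.0
s_by_loc = P @ s_by_ic
# Location v = 2: the block should be (s_1(2), s_2(2), s_3(2)).
v = 2
assert np.allclose(s_by_loc[v * L:(v + 1) * L],
                   [l * V + v for l in range(L)])
```

Since P is a permutation, P⁻¹ = P′, so regrouping in either direction is cheap; in practice P is never formed densely as here but applied as an index permutation.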

EM Algorithm
In this section, we derive an expectation-maximization (EM) algorithm to iteratively obtain maximum a posteriori (MAP) estimates of the latent variables s and maximum likelihood estimates (MLEs) of the model parameters Θ = {M, {κ_ℓ : ℓ = 1, . . . , L}}. Recall that M is the mixing matrix encoding the temporal activation profiles of each IC and κ_ℓ controls the spatial correlation range of each IC s_ℓ. We will use the estimate of the observation variance ν₀² obtained through dimension reduction as described above and hence treat it as known.
Below, we derive the expected log-likelihood and the posterior moments of the latent variables (E-step), which have an explicit closed form. We then derive the maximum likelihood estimator for M and present a numerical optimization approach for maximum likelihood estimation of the κ_ℓ (M-step). Although many of the matrices involved are very large (LV × LV, where typically L is between 10 and 100 and V is between 10,000 and 100,000), all necessary quantities can be computed efficiently using linear algebra operations and taking advantage of sparsity.

Expected Log-likelihood (E-Step)
The complete-data likelihood can be written as f(y, s | Θ) = g(y; M_⊗Ps, ν₀²C_⊗) g(s; s_0, DRD), where g(· ; μ, Σ) is the multivariate Gaussian density with mean vector μ and covariance Σ. Given the parameter estimates at iteration k of the EM algorithm, Θ̂^(k), the expected log-likelihood (with the expectation taken over the posterior of s given y) decomposes as Q(Θ | Θ̂^(k)) = Q₁(Θ | Θ̂^(k)) + Q₂(Θ | Θ̂^(k)). Since Q₁ only involves M and Q₂ only involves the κ_ℓ (via R_ℓ), the two terms can be maximized separately; both can be written in terms of posterior moments of s and traces of sparse-matrix products, where Tr(·) denotes the trace operator. See Appendix B, supplementary materials for details.

Posterior Moments of Latent Variables (E-Step)
To maximize the expected log-likelihood and determine the MLEs of the unknown parameters, we must first determine the form of the posterior moments appearing in Q₁ and Q₂. We therefore obtain the posterior distribution of s, conditional on the current estimates of the parameters, Θ̂^(k), which is Gaussian (see Appendix C, supplementary materials for details).
Fixing the parameters in Θ̂^(k), we can compute the posterior precision Ω and the vector m satisfying μ_s|y = Ω⁻¹m, since D, P, and C_⊗ are known sparse matrices, and R⁻¹ and M_⊗ are fixed sparse matrices given Θ̂^(k). While Ω is too large to invert directly, we can compute Ω⁻¹m by solving for x in the system of linear equations Ωx = m to obtain μ_s|y. We can obtain the posterior precision Ω = Σ_s|y⁻¹ but avoid the infeasible computation of Σ_s|y, which appears in the expected log-likelihood but is not strictly required to obtain the MLEs of the parameters.
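The "solve rather than invert" strategy can be illustrated as follows, assuming SciPy's sparse solver; the toy Ω and m are stand-ins (in the model Ω is LV × LV and sparse):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Toy sparse symmetric positive definite "posterior precision" Omega and
# vector m (illustrative stand-ins for the model quantities).
n = 200
A = sp.random(n, n, density=0.02, random_state=0)
Omega = (A @ A.T + sp.identity(n) * n).tocsc()   # SPD by construction
m = np.ones(n)

# Posterior mean: solve Omega x = m instead of forming Omega^{-1},
# which would be dense and infeasible at LV x LV scale.
mu = spsolve(Omega, m)
```

For repeated solves with the same Ω (as in an EM iteration), a sparse Cholesky factorization reused across right-hand sides is typically preferable to calling the solver from scratch each time.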
The first and second posterior moments of s appearing in Q₁ are E[s | y, Θ̂^(k)] = μ_s|y and E[ss′ | y, Θ̂^(k)] = Σ_s|y + μ_s|y μ′_s|y. The posterior moments of s_ℓ appearing in Q₂ can be obtained by defining E_ℓ as a V × VL matrix that equals zero everywhere except the ℓth column block of size V, which equals I_V, so that s_ℓ = E_ℓ s. In the next section, we derive MLEs of each of the parameters and describe the computational strategy for terms involving large matrix inverses.

MLEs of Parameters (M-Step)
Mixing matrix. First, note that Q₁ factorizes over locations v, since M_⊗ = I_V ⊗ M and C_⊗ = I_V ⊗ C. Let t = P E[s | y, Θ̂^(k)] and T = P E[ss′ | y, Θ̂^(k)] P′, where t(v) denotes the vth block of size L of t, and T(v, v) denotes the vth diagonal L × L block of T; that is, t(v) and T(v, v) contain those elements corresponding to location v. Then we can write Q₁(Θ | Θ̂^(k)) = −(1/(2ν₀²)) Σ_v [y(v)′C⁻¹y(v) − 2y(v)′C⁻¹M t(v) + Tr(M′C⁻¹M T(v, v))] + const. Setting the derivative with respect to M to zero, we have M̂ = [Σ_v y(v)t(v)′][Σ_v T(v, v)]⁻¹. Note that the second term in M̂ involves Σ_s|y, but only a small number of the terms in Σ_s|y are actually required, namely the V small L × L diagonal blocks, which can be computed efficiently. See Appendix D, supplementary materials for details. SPDE Parameters. Turning our attention to the estimation of the κ_ℓ, we can decompose the portion of Q₂ involving κ_ℓ into three terms, which we denote Q₂^A(Θ̂^(k)), Q₂^B(Θ̂^(k)), and Q₂^C(Θ̂^(k)). Note that, for a given value of κ_ℓ, Q₂^A(Θ̂^(k)) and Q₂^C(Θ̂^(k)) are easily computable, since R_ℓ⁻¹ is a sparse matrix (see Appendix A, supplementary materials for details on computation of R_ℓ⁻¹), and the posterior mean of s is available from the E-step. Regarding Q₂^B(Θ̂^(k)), we avoid computation of E[s_ℓ s′_ℓ | y, Θ̂^(k)] by noting that it appears inside of a trace as a product with a sparse matrix. Since only the diagonals of the product are required for the trace (which for symmetric matrices are equal to the column sums of the element-wise product), we only need to compute the entries corresponding to the nonzero elements of R_ℓ⁻¹. Letting Ω̂ and m̂ represent, respectively, the values of Ω and m (defined in Section 2.2.2) given the current parameter estimates Θ̂^(k), and defining Ŵ = Ω̂⁻¹m̂m̂′Ω̂⁻¹, we can write the trace terms in Q₂^B(Θ̂^(k)) in terms of [B]_ℓ,ℓ, the ℓth V × V diagonal block of a VL × VL matrix B. The computational strategy for both trace terms in Q₂^B(Θ̂^(k)) is detailed in Appendix E, supplementary materials. Therefore, for each κ_ℓ, ℓ = 1, . . . , L, the function to be maximized is f(κ_ℓ | Θ̂^(k)) = Q₂^A(Θ̂^(k)) + Q₂^B(Θ̂^(k)) + Q₂^C(Θ̂^(k)). We use the optimize function from the stats package in R to numerically find the value of κ_ℓ that maximizes f(κ_ℓ | Θ̂^(k)).
We search within the range (0, 5], since we have empirically observed that κ_ℓ typically falls well within these bounds in neuroimaging applications.
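As an illustration of the two M-step computations, the following sketch (in Python with hypothetical dimensions; the paper's implementation is in R) verifies the closed-form mixing-matrix update on noise-free toy moments, and performs a bounded one-dimensional search analogous to stats::optimize. The objective f_toy is a stand-in, not the model's f(κ_ℓ | Θ̂^(k)):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def update_M(y_blocks, t_blocks, T_blocks):
    """Closed-form mixing-matrix update (sketch): with t(v) = E[s(v)|y] and
    T(v,v) = E[s(v)s(v)'|y], sum moments over locations v and solve."""
    num = sum(np.outer(y, t) for y, t in zip(y_blocks, t_blocks))
    den = sum(T_blocks)
    return num @ np.linalg.inv(den)

# Noise-free sanity check: if y(v) = M s(v) exactly, the update recovers M.
rng = np.random.default_rng(0)
M_true = rng.standard_normal((6, 2))            # hypothetical T' x L
s_blocks = [rng.standard_normal(2) for _ in range(20)]
y_blocks = [M_true @ s for s in s_blocks]
T_blocks = [np.outer(s, s) for s in s_blocks]
M_hat = update_M(y_blocks, s_blocks, T_blocks)

# Bounded 1-D maximization over (0, 5], analogous to stats::optimize in R.
f_toy = lambda k: -((k - 1.3) ** 2)             # stand-in, maximized at 1.3
res = minimize_scalar(lambda k: -f_toy(k), bounds=(1e-6, 5.0), method="bounded")
```

Because each κ_ℓ enters the objective only through R_ℓ⁻¹, the L scalar searches are independent and can be run in parallel.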
To accelerate the convergence of the EM algorithm, we adopt the squared iterative method (SQUAREM) proposed by Varadhan and Roland (2008), which is implemented in the R package SQUAREM version 2020.2 with the default control parameters (Du and Varadhan 2020). The algorithm is repeated until the Euclidean norm of the change in all model parameters falls below a given tolerance.

The Common Dependence Model and EM Algorithm
It may be reasonable to assume that the L spatial source signals have similar degrees of spatial dependence, controlled by the κ_ℓ. Each source signal represents a functional brain area, and such areas tend to be similar in their spatial distribution at a given ICA dimensionality L. In a standard ICA setting, this may not be a reasonable assumption due to the presence of noise-related source signals capturing variability in the data related to subject head motion, respiration, and other types of artifacts. The spatial properties of such components are quite distinct from those of functional brain areas. Here, however, such nuisance signals are not included in the model but are instead estimated and removed from the data a-priori as described in Section 2.4. Further, we observe empirically that the MLEs for κ_ℓ obtained through the EM algorithm described above tend to be similar across ICs. Therefore, we may consider a simplified version of the model assuming a common κ parameter across the L ICs, which can be estimated more efficiently.
The spatial ICA model with common dependence is given by equations (4)-(5), except that now all R_ℓ ≡ R*. The latent variables and unknown model parameters in this model, now just Θ = {M, κ}, can again be estimated through EM. The expected log-likelihood given in (8) and the posterior moments of the latent variables given in (11) are unchanged, except that each instance of R_ℓ is replaced by R*. The MLE of M is unchanged, and the MLE of κ is now the value that maximizes the sum over ℓ = 1, . . . , L of the objective functions f(κ_ℓ | Θ̂^(k)) evaluated at the common value κ.

Identifying Regions of Engagement and Deviation
After the model parameters have been estimated, the next step is to identify brain locations in each IC that are engaged (s_ℓ(v) ≠ 0) or deviate from the population mean (δ_ℓ(v) ≠ 0). The simplest way of testing whether a given location is engaged or deviates would be to use the marginal posterior distribution of s_ℓ(v) or δ_ℓ(v) to essentially perform a hypothesis test. However, this would introduce a massive multiple comparisons problem, correcting for which may result in a loss of power. In the spatial Bayesian framework, instead, we can use the joint posterior distribution of s_ℓ or δ_ℓ, which accounts for spatial dependence and avoids multiple comparisons. We adopt the excursions set method proposed by Bolin and Lindgren (2015), which efficiently estimates the set of locations in a field f exceeding a threshold γ with joint posterior probability at least 1 − α.
In the case when f is an indirectly observed random field, such as s_ℓ or δ_ℓ, we define the excursion set as a region where, with some (high) probability, the field exceeds the threshold at every location. Specifically, the positive excursion set with probability α, E⁺_γ,α(s_ℓ), is defined as E⁺_γ,α(s_ℓ) = argmax_D {|D| : P(s_ℓ(v) > γ for all v ∈ D) ≥ 1 − α}. That is, it is the largest set for which the level γ is exceeded at all locations in the set with probability 1 − α. Bolin and Lindgren (2015) proposed a computationally efficient approach for finding these sets, which is implemented in the excursions R package (Bolin and Lindgren 2018). In our context, we have multiple random fields s_ℓ, and can thus compute an excursion set for each of these fields in one of two ways. One approach is to compute a joint excursion set for all fields, so that the joint probability over all fields is maintained at 1 − α. This set can be computed based on the joint posterior distribution for the s_ℓ fields. An alternative approach is to compute the excursion set for each s_ℓ separately, based on the corresponding marginal posterior. We adopt the latter approach, which means that for each latent field, the probability that not all identified locations are engaged is controlled at α. These computations can be performed efficiently in excursions without first marginalizing out each s_ℓ, which would be computationally expensive.
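The joint-probability logic behind excursion sets can be conveyed with a simplified Monte Carlo sketch: sample from a Gaussian posterior, rank locations by marginal exceedance probability, and grow the set only while the joint exceedance probability stays at least 1 − α. This greedy sampling scheme is purely an illustration of the definition; the excursions package uses a far more efficient sequential integration method:

```python
import numpy as np

def excursion_set_mc(mu, Sigma, gamma=0.0, alpha=0.05, n_samp=20000, seed=1):
    """Monte Carlo sketch of E+_{gamma,alpha}: a large set of locations whose
    values jointly exceed gamma with posterior probability >= 1 - alpha."""
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(mu, Sigma, size=n_samp)   # posterior draws
    exceed = X > gamma
    order = np.argsort(-exceed.mean(axis=0))              # by marginal prob.
    joint = np.ones(n_samp, dtype=bool)                   # joint-exceedance mask
    chosen = []
    for v in order:
        cand = joint & exceed[:, v]
        if cand.mean() >= 1 - alpha:                      # joint prob. still OK
            joint = cand
            chosen.append(v)
    return sorted(chosen)

# Four locations: two strongly positive, one null, one negative.
mu = np.array([3.0, 2.5, 0.0, -2.0])
Sigma = 0.25 * np.eye(4)
E = excursion_set_mc(mu, Sigma)
```

Here only the two strongly positive locations survive: adding the null location would drag the joint exceedance probability below 1 − α, which is exactly how the excursion set avoids a per-location multiplicity correction.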

Preprocessing
Several preprocessing steps must be performed on the data prior to estimation of the stICA model: centering, scaling, and removal of nuisance ICs. We first center the fMRI data across both time and space to remove the mean image and global signal. This is required for dual regression (Beckmann et al. 2009), a popular but ad hoc method for subject-level IC estimation, which we use to obtain parameter starting values and for template estimation. As fMRI data is unit-less, we also introduce standard units by scaling by the global image standard deviation, defined as the square root of variance across the image, averaged over time.
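The centering and scaling steps can be sketched as follows (an illustrative helper following the description above, not the authors' code):

```python
import numpy as np

def center_and_scale(Y):
    """Center T x V fMRI data across time and space, then scale to standard
    units by the global image standard deviation (sketch of the text's steps)."""
    Y = Y - Y.mean(axis=0, keepdims=True)   # remove mean image (per location)
    Y = Y - Y.mean(axis=1, keepdims=True)   # remove global signal (per time)
    # Global image SD: sqrt of the variance across the image, averaged over time.
    sd = np.sqrt(Y.var(axis=1).mean())
    return Y / sd

rng = np.random.default_rng(0)
Z = center_and_scale(rng.standard_normal((20, 30)) * 5.0 + 2.0)
```

After these steps the data have zero mean over both time and space and unit global image variance, so template and subject maps share a common scale.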
We next remove nuisance ICs, defined as ICs that are present in the subject-level data but are not found in the template. We adapt an approach proposed in Mejia et al. (2020a): first obtain an estimate of the template ICs from the data and remove them from the data; second, estimate the nuisance ICs from the residual data and remove them from the original data. This results in "nuisance-free" data, to which the spatial template ICA algorithm can be applied. Specifically, we use dual regression to obtain an initial estimate of the template ICs along with their associated mixing matrix, which are multiplied and subtracted from the data. We estimate the number of remaining, or nuisance, ICs using the penalized semi-integrated likelihood (PESEL) method with homogeneous singular values proposed by Sobczyk, Bogdan, and Josse (2017), implemented in the pesel R package (Sobczyk, Josse, and Bogdan 2020). In principle, we would then estimate the nuisance ICs and their associated mixing matrix, which can be multiplied and subtracted from the original centered and scaled data; this can be done using any standard ICA implementation, such as the infomax ICA algorithm of Bell and Sejnowski (1995) as implemented in the ica R package (Helwig 2018). However, most ICA implementations obtain ICs by first estimating an equal number of principal components (PCs), then estimating the ICs as a rotation of the PCs. The product of the ICs and their associated mixing matrix is therefore mathematically equivalent to the product of the leading principal components and their associated scores. We therefore simply estimate these principal components and scores using the singular value decomposition (SVD), which are then multiplied and subtracted from the original centered and scaled data. If adequate computation time is available, this process can be repeated after applying spatial template ICA to obtain a more accurate estimate of the template ICs before nuisance IC estimation and removal.
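The two-stage nuisance-removal procedure can be sketched as follows. The number of nuisance components is hardcoded here where PESEL would supply it, and the function name and interfaces are hypothetical:

```python
import numpy as np

def remove_nuisance(Y, template_est):
    """Sketch of nuisance-IC removal (assumed interfaces, not the paper's code).

    Y:            T x V centered and scaled data matrix.
    template_est: L x V initial template-IC estimates (e.g., from dual regression).
    """
    # 1. Regress the data on the template ICs and subtract the fitted signal.
    S = template_est                                   # L x V
    M = Y @ np.linalg.pinv(S)                          # T x L estimated mixing matrix
    resid = Y - M @ S
    # 2. Estimate the nuisance subspace from the residual. The product of the
    #    nuisance ICs and their mixing matrix equals the product of the leading
    #    principal components and scores, so a truncated SVD suffices.
    L_nuis = 2                                         # would come from PESEL
    U, d, Vt = np.linalg.svd(resid, full_matrices=False)
    nuisance = U[:, :L_nuis] * d[:L_nuis] @ Vt[:L_nuis]
    # 3. Subtract the estimated nuisance signal from the original data.
    return Y - nuisance
```

Note that any nuisance variation lying within the span of the template ICs is not removed by this procedure; the residual step only captures the nuisance signal orthogonal to the template fit.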

Simulation Study
We perform a simulation study to quantify the accuracy of the proposed spatial template ICA (stICA) approach and to compare its performance with two alternatives: standard template ICA (tICA) without spatial priors (Mejia et al. 2020a) and dual regression, a fast but ad hoc method that is commonly used in practice (Beckmann et al. 2009).
First, we give a brief overview of the simulation design before going into detail below. We consider L = 3 two-dimensional brain network maps. For each brain network, we use a generating mean and variance map to produce random draws. Because these draws are independent across space, the random subject effects (differences from the population mean) are then spatially smoothed to produce the L = 3 subject-level ICs for each subject. Based on the ICs from a set of training subjects, we produce templates using Monte Carlo simulation. We then simulate fMRI timeseries for 50 test subjects and apply each method (i.e., dual regression, tICA, and stICA) to obtain estimates of the subject-level ICs and of the mixing matrix representing the temporal activation profile of each IC. For stICA and tICA, we also identify areas of engagement. Finally, as the "functional connectivity" between the spatial source signals is also of interest, we compute the correlation of the temporal activation profiles of each pair of ICs to produce an L × L functional connectivity matrix. We evaluate the performance of stICA relative to tICA and dual regression based on the accuracy of the estimated ICs, of the functional connectivity matrices, and of the areas of engagement.
Subject-level ICs. Each IC is a two-dimensional image consisting of 46 × 55 isotropic 1 mm pixels, for a total of V = 2530 data locations. The generating mean and variance maps are shown in Figure 1. Each mean map is generated by placing a point mass of a given amplitude at a certain location and convolving it with a two-dimensional Gaussian kernel. The point masses for the three ICs are located at (12, 15), (35, 40), and (15, 40), respectively, and the full width at half maximum (FWHM) of the Gaussian kernel is 30, 40, and 45. For each IC, the generating variance is proportional to the corresponding mean, reflecting the observation that the maps representing subjects' functional brain networks tend to vary the most across subjects in the region where that network is "engaged," and tend to vary little in "background" areas.
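Because convolving a single point mass with a Gaussian kernel yields a Gaussian bump in closed form, the generating mean maps can be sketched directly. The amplitude, the variance-proportionality constant, and the row/column orientation below are illustrative assumptions:

```python
import numpy as np

def mean_map(shape, center, amplitude, fwhm):
    """Population mean map: a point mass convolved with a 2D Gaussian kernel.

    For a single point mass the convolution has a closed form: a Gaussian bump
    centered at the point, with sigma = fwhm / (2 * sqrt(2 * ln 2)).
    center is given as (x, y), i.e., (column, row) -- an assumed convention.
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (x - center[0]) ** 2 + (y - center[1]) ** 2
    return amplitude * np.exp(-d2 / (2.0 * sigma ** 2))

# IC 1 from the simulation: point mass at (12, 15), FWHM 30.
# Amplitude 4.0 and the 55 x 46 (rows x columns) orientation are assumptions.
ic1_mean = mean_map((55, 46), center=(12, 15), amplitude=4.0, fwhm=30)
ic1_var = 0.25 * ic1_mean    # variance proportional to mean (constant assumed)
```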
Subject effects represent deviations of the subject ICs from the corresponding population mean ICs. The spatial template ICA model assumes that these deviations are smooth, as they tend to be in practice. For each subject, we first generate spatially independent (i.e., unsmoothed) subject effects from a mean-zero Normal distribution with variance given by the generating variance at each pixel, displayed in the second row of Figure 1. We then smooth each subject effect map by convolving it with a Gaussian kernel with FWHM equal to 5 mm, equivalent to a standard deviation of 2.12 mm. We apply a global rescaling factor to the smoothed effect maps to match the generating variance at the peak pixel. Subject ICs are finally obtained by adding the smoothed and rescaled subject effects to the population mean maps (Figure 1). Subject effect maps and ICs for one example subject are shown in Figure 2. Based on 1000 Monte Carlo replicates, the templates are defined by the Monte Carlo mean and variance at each location v for each IC, denoted {s₀(v) : v = 1, . . . , V} and {σ²(v) : v = 1, . . . , V}, respectively. The resulting templates are shown in Figure 3.
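The subject-effect generating process (draw, smooth, rescale) can be sketched as follows. The per-draw rescaling here only approximates the across-subject rescaling described above, and the separable 1D convolutions stand in for a full 2D Gaussian smoother:

```python
import numpy as np

def gaussian_kernel1d(fwhm, radius=8):
    """Normalized 1D Gaussian kernel; sigma = fwhm / (2 * sqrt(2 * ln 2))."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def simulate_subject_effect(var_map, fwhm=5.0, rng=None):
    """Smoothed, rescaled subject effect (illustrative sketch).

    1. Draw spatially independent N(0, var(v)) deviations.
    2. Smooth with a separable Gaussian kernel (FWHM 5 mm ~ sigma 2.12 mm).
    3. Rescale so the variability matches the generating variance at the
       peak pixel (approximated per-draw here, not across subjects).
    """
    if rng is None:
        rng = np.random.default_rng()
    raw = rng.normal(0.0, np.sqrt(var_map))
    k = gaussian_kernel1d(fwhm)
    sm = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, raw)
    sm = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, sm)
    peak = np.unravel_index(np.argmax(var_map), var_map.shape)
    scale = np.sqrt(var_map[peak]) / np.std(sm) if np.std(sm) > 0 else 1.0
    return sm * scale
```

The rescaling step is needed because convolution shrinks the pointwise variance of an independent field, so without it the smoothed effects would understate the intended between-subject variability.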
fMRI Timeseries. Using the subject-level ICs resulting from the process described above, we generate fMRI timeseries for n = 50 test subjects based on the model in Equation (1) as follows. We set T = 800, representing approximately 10 min assuming a temporal resolution of 0.72 sec, as in the Human Connectome Project (Van Essen et al. 2013). This duration and temporal resolution are fairly typical of modern resting-state fMRI acquisitions. We first generate the temporal activation profiles of each IC, which form the T × L mixing matrix, and a set of white noise residuals generated as independent N(0, ν₀²) samples. We set ν₀ = 11.2 to yield a signal-to-noise ratio (SNR) of 0.5, similar to real fMRI data. For the temporal activation profiles, we draw without replacement from a set of 16 mean-zero timecourses that are based on real fMRI data from the Human Connectome Project (see Figure 4). Each timecourse is normalized to have mean zero and standard deviation 1.
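The generative step can be sketched as Y = M S + E. Here white-noise columns stand in for the realistic HCP-derived timecourses, and the function name is hypothetical:

```python
import numpy as np

def simulate_fmri(S, T=800, noise_sd=11.2, rng=None):
    """Generate a T x V timeseries from L subject ICs (model sketch: Y = M S + E).

    S: L x V subject-level IC maps. Each column of the mixing matrix M is a
    mean-zero, unit-variance timecourse; white noise is used here in place of
    the realistic timecourses drawn in the actual simulation.
    """
    if rng is None:
        rng = np.random.default_rng()
    L, V = S.shape
    M = rng.normal(size=(T, L))
    M = (M - M.mean(axis=0)) / M.std(axis=0)     # center and scale each timecourse
    E = rng.normal(0.0, noise_sd, size=(T, V))   # independent N(0, nu0^2) residuals
    return M @ S + E, M
```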
Note that while the mixing matrix itself may not be of scientific interest, what is often of interest is the pairwise similarity or coherence (i.e., Pearson correlation) of the timecourses, known as the functional connectivity (FC) between the brain networks represented by each IC (Joel et al. 2011). We therefore assess each method's ability to accurately estimate the FC matrix for each subject, in addition to the IC maps.
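For concreteness, the FC matrix is simply the correlation matrix of the columns of the mixing matrix:

```python
import numpy as np

def fc_matrix(M):
    """L x L functional connectivity matrix: Pearson correlations between
    the L timecourses stored as columns of the T x L mixing matrix M."""
    return np.corrcoef(M, rowvar=False)

rng = np.random.default_rng(2)
M = rng.normal(size=(800, 3))
FC = fc_matrix(M)
```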
Model Estimation. Based on the original 2530 data locations, we construct a mesh with the inla.mesh.2d function from the INLA R package (Lindgren and Rue 2015). The triangular mesh, consisting of 4214 vertices, is shown in Figure 5(a). It features a middle square of smaller triangles connecting vertices corresponding to the observed data locations, surrounded by two boundary layers of larger triangles, which are added to improve performance along the data boundary. To estimate the ICs and mixing matrix of each simulated test subject using stICA, we apply the EM algorithm with the common dependence assumption, as described in Section 2.2. We run the algorithm until convergence with a tolerance of 0.001, allowing up to 100 parameter updates. The number of updates and computation time for each subject are displayed in Figure 5(b). The median computation time for model estimation is 32 min over 18 parameter updates. After model estimation, we identify areas of engagement for each subject and IC via the excursions set approach described in Section 2.3. We use an engagement threshold of γ = 1 (this avoids very small values induced by the smoothing used to generate the subject ICs) and a significance level of α = 0.1. For comparison purposes, we use the variance estimates produced by tICA to identify areas of engagement by performing a one-sided t-test at each location, obtaining engagement maps based on Bonferroni correction for familywise error rate (FWER) control at α = 0.1. Note that the probability of not committing a familywise error (a familywise error being at least one false positive pixel) is analogous to the probability that all locations labeled as significant are truly engaged, which is the basis of the excursions set approach used in stICA. The difference is that stICA employs the joint posterior distribution of the latent fields, while tICA employs the marginal posterior distribution at each location.
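The benchmark tICA engagement map (a marginal test at each location followed by Bonferroni correction) can be sketched as follows, using a normal reference distribution in place of the exact t distribution and hypothetical names throughout:

```python
import numpy as np
from math import erfc, sqrt

def engagement_ttest_map(est, se, gamma=1.0, alpha=0.1):
    """Benchmark engagement map sketch: a one-sided test of est(v) > gamma at
    each of V locations, with Bonferroni correction for FWER control at alpha.
    A large-df normal approximation stands in for the t distribution."""
    V = est.size
    z = (est - gamma) / se
    p = np.array([0.5 * erfc(zi / sqrt(2.0)) for zi in z])  # one-sided p-values
    return p < alpha / V                                    # Bonferroni threshold

# Toy example: only the first location clearly exceeds the threshold
est = np.array([5.0, 1.0, 1.2])
se = np.array([0.5, 0.5, 0.5])
engaged = engagement_ttest_map(est, se)
```

Because the Bonferroni threshold α/V shrinks with the number of locations, this marginal approach becomes increasingly conservative for whole-cortex analyses, which is the motivation for the joint excursions set approach in stICA.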
As such, with stICA we would expect to discover larger areas of engagement than with tICA, since the positive dependence between neighboring locations is accounted for.
Performance Measures. We compute the following measures to compare the performance of stICA with tICA and dual regression: mean squared error (MSE) maps of the estimated ICs; correlation between the estimated and true ICs for each subject; the accuracy of engagement maps in terms of false positive rate (percentage of truly nonengaged pixels labeled as engaged) and power (percentage of truly engaged pixels labeled as engaged); and the MSE of functional connectivity.
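The engagement-map accuracy measures can be computed as follows (a straightforward sketch of the stated definitions):

```python
import numpy as np

def engagement_accuracy(true_engaged, est_engaged):
    """False positive rate and power for binary engagement maps.

    FPR:   share of truly nonengaged pixels labeled as engaged.
    Power: share of truly engaged pixels labeled as engaged.
    """
    tn = (~true_engaged).sum()
    tp = true_engaged.sum()
    fpr = (est_engaged & ~true_engaged).sum() / tn if tn else 0.0
    power = (est_engaged & true_engaged).sum() / tp if tp else 0.0
    return fpr, power

true_e = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
est_e = np.array([1, 1, 0, 0, 0, 1], dtype=bool)
fpr, power = engagement_accuracy(true_e, est_e)   # fpr = 1/3, power = 2/3
```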

Simulation Results
In this section, we present the performance of stICA alongside two competing methods, tICA and dual regression, in accurately recovering the true subject-level ICs and FC matrices and identifying areas of engagement in each IC. For FC matrix estimation, we also include an oracle regression estimator, which is based on regressing the true subject-level IC maps against the fMRI timeseries data. This represents the best possible performance of dual regression for FC matrix estimation, since dual regression is based on such an approach, but without access to the true, unknown ICs.
Figure 4. Sixteen realistic timecourses, from which three were sampled to create the mixing matrix for each subject in the simulation study. Each timecourse was based on the fMRI signal associated with an ICA-based brain network in one subject from the Human Connectome Project. As fMRI data is collected in arbitrary units, no y-axis values are displayed. For mixing matrix construction, each timecourse was centered and scaled to have mean zero and standard deviation one.

We first consider the ability of each method to estimate the true subject-level IC maps. Figure 6 displays the true values and estimates of IC 1 for the first three simulation subjects. The stICA estimates appear highly similar to the true ICs. The tICA estimates are similarly accurate in background areas, but are slightly more noisy in areas of engagement. This is because while stICA incorporates spatial priors and empirical population priors, tICA only uses population priors, which tend to have higher variance in areas of engagement. Dual regression is less accurate across the image compared with tICA or stICA. Figure 7 displays two measures of IC accuracy: mean squared error (MSE) across subjects (a) and Pearson correlation across pixels (b). To compute MSE, the estimated maps from stICA and tICA were first rescaled to match the scale of the true ICs, since when the ICs and mixing matrix are simultaneously estimated, the ICs are only identifiable up to a scaling factor. Panel (a) shows that stICA yields highly accurate IC estimates across the entire image. tICA outperforms dual regression across the image but is less accurate than stICA in the areas of engagement for each IC. This illustrates the benefit of including both population and spatial priors in the stICA model. Dual regression, which is commonly used in practice, results in highly noisy IC estimates. Panel (b) displays two Pearson correlation-based measures for each subject and IC: correlation across all pixels (left) and correlation-at-the-top (CAT) (right), which is computed based only on locations with true IC value over 1, the threshold assumed for identification of areas of engagement.
CAT therefore conveys the ability of each method to accurately estimate the areas of most scientific interest. In terms of both correlation and CAT, stICA strongly outperforms tICA, and both stICA and tICA outperform dual regression. This again illustrates the benefit of incorporating spatial priors into the stICA model, in addition to the empirical population priors used in tICA. Figure 8 displays true and estimated areas of engagement for IC 1 for the first three simulation subjects. The stICA areas of engagement are based on the joint posterior distribution of each IC, as described in Section 2.3. The tICA areas of engagement are based on performing a t-test at every location and correcting for multiple comparisons through Bonferroni correction to control the FWER. The stICA areas are somewhat larger and more similar to the true areas, while the FWER-corrected tICA areas are more conservative. Figure 9 displays the false positive rate (FPR) of both methods, averaged over subjects, along with the power to detect true engagements. Both stICA and tICA enforce stringent false positive control, but stICA has substantially higher power to detect true effects, achieving over 80% mean power for each IC.
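As a concrete illustration of the correlation-based measures: CAT restricts the Pearson correlation to pixels whose true value exceeds the engagement threshold, and correlations are Fisher z-transformed before summarizing (the array values below are illustrative):

```python
import numpy as np

def cat(true_ic, est_ic, threshold=1.0):
    """Correlation-at-the-top: Pearson correlation restricted to pixels whose
    true IC value exceeds the engagement threshold."""
    top = true_ic > threshold
    return np.corrcoef(true_ic[top], est_ic[top])[0, 1]

def fisher_z(r):
    """Fisher z-transform, z(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * np.log((1.0 + r) / (1.0 - r))

# Toy maps: only the last three pixels exceed the threshold of 1
true_ic = np.array([0.0, 0.5, 2.0, 3.0, 4.0])
est_ic = np.array([0.1, 0.4, 2.0, 2.5, 4.5])
r_top = cat(true_ic, est_ic)
```

The Fisher z-transform stretches the scale near r = 1, which makes differences between highly accurate methods visible in summary plots.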
Finally, we assess the accuracy of stICA, along with competing methods, for estimating functional connectivity (FC). Figure 10 displays the MSE of FC between each pair of ICs. The most accurate FC estimates are achieved with stICA. Notably, stICA and tICA both strongly outperform oracle regression, which is based on regression of the true ICs against the fMRI timeseries data, and both result in substantially more accurate estimation of FC compared with dual regression.

Data Analysis
We apply the proposed and benchmark methods to resting-state fMRI data of one randomly selected subject (645450) from the Human Connectome Project (HCP) 1200-subject release. The HCP data are publicly available for download (http://humanconnectome.org) and include fully processed fMRI data on the cortical surface (*.dtseries.nii) and corresponding surface model files (*.surf.gii). We use the midthickness surface model, which represents the midpoint in the cortical gray matter between the white matter (interior) surface and the pial (exterior) surface. We perform all computations in R (R Core Team 2020) using the templateICAr package, which for spatial modeling depends on the R-INLA package (Lindgren and Rue 2015) and the PARDISO parallel computation library (Schenk and Gärtner 2004; van Niekerk et al. 2021). We use the Connectome Workbench (Marcus et al. 2011) (available for download at https://www.humanconnectome.org/software/connectome-workbench) via the ciftiTools R package (Pham, Muschelli, and Mejia 2022) to resample the surface models and fMRI data from their original 32,000-vertex resolution to approximately 6000 vertices per hemisphere. This greatly improves computational speed and feasibility with minimal loss of spatial resolution, as the surface area represented by each vertex remains much smaller than the spatial areas represented by different brain networks. The triangular mesh representing this surface is shown in Figure 11.
The fMRI data was acquired using left-to-right (LR) phase encoding, with a repetition time (TR) of 0.72 sec between volumes and a total of T = 1200 volumes collected over approximately 15 min. In addition to the minimal processing described in Glasser et al. (2013), the fMRI data we analyze has been high-pass filtered and denoised using the ICA-FIX procedure (Griffanti et al. 2014). This version of the data is available as part of the HCP data release. Prior to model fitting, we center the fMRI data across time and space, then scale the data by the square root of the average image variance. We apply the proposed and benchmark methods to the full data, as well as to a shorter version of the data created by truncating each session after the first 400 volumes (5 min), in order to determine the influence of scan duration on the results. To assess the ability of stICA and the benchmark methods to produce reliable results, we repeat the analyses on a second session of data from the same subject, acquired and processed using the same techniques.

Figure 6. True and estimated maps of IC 1 for three randomly selected simulation subjects. The stICA estimates appear very similar to the true ICs. The tICA estimates are similarly accurate in background areas but are slightly more noisy in engaged areas. In contrast to stICA, which incorporates spatial priors and empirical population priors, tICA only uses population priors, and those priors tend to have higher variance in engaged areas. Dual regression is more noisy across the image compared with tICA or stICA.
For template estimation, we use resting-state fMRI data (processed in the same way as the focal dataset described above) from 500 randomly sampled subjects in the HCP 1200-subject release, following the procedure described in Mejia et al. (2020a). Briefly, we select 16 ICs from the 25-component group ICA maps included in the original HCP 500-subject release, which correspond to established brain networks. The exclusion of nine group ICs also tests the ability of stICA and tICA to account for the presence of nuisance ICs, which may include these excluded group ICs along with subject-specific sources of neuronal and artifactual origin that were not effectively removed by ICA-FIX. The estimated templates consist of a map of the mean and between-subject variance for each of the 16 ICs.

Figure 7. Accuracy of estimated ICs in the simulation study. (a) MSE of estimated ICs using stICA, tICA, and dual regression. stICA yields highly accurate IC estimates across the entire image. tICA outperforms dual regression across the image but is less accurate than stICA in the areas of engagement for each IC. This illustrates the benefit of including both population and spatial priors in the stICA model. Dual regression, which is commonly used in practice, results in highly noisy IC estimates. (b) Pearson correlation and correlation-at-the-top (CAT) of the estimated and true ICs. Each line represents an individual simulation subject, and boxplots are displayed to summarize the distribution across subjects. Correlations are Fisher z-transformed (i.e., z(r) = ½ ln((1 + r)/(1 − r))) to enable better comparison of high values. These results show that stICA strongly outperforms tICA, and both stICA and tICA outperform dual regression. This again illustrates the benefit of incorporating both population and spatial priors into the stICA model.
Figure 12 shows the mean and variance maps for one IC representing part of the attention network; Appendix Figures G.2 and G.3, supplementary materials show corresponding information for two additional ICs representing part of the default mode network (DMN) and motor network, respectively.
We apply the proposed EM algorithm to estimate the subject ICs and subject effects for each of the 16 template ICs. We treat both hemispheres of the brain as separate regions (i.e., B = 2), since the meshes representing their neighborhood structure do not overlap. We exclude medial wall vertices containing missing data to improve computational efficiency, since we observe minimal differences in estimates and areas of engagement when they are included. See Appendix F and Figure F.1, supplementary materials for a comparison. We run the algorithm until convergence to a tolerance of 0.01. For comparison, we also obtain estimates using standard tICA without spatial priors and dual regression. We then apply the excursions set approach described in Section 2.3 to identify areas of engagement in each IC, as well as areas of positive and negative deviation in each subject effect map. These deviations represent differences between the individual subject's IC maps and the group average. Positive  deviations indicate that a particular area is more engaged in a subject's brain network relative to the group, while negative deviations indicate that a particular area is less engaged relative to the group. For comparison, we also identify areas of engagement and deviation using tICA by performing a t-test at every vertex separately, then correcting for multiple comparisons using Bonferroni correction to control the family-wise error rate (FWER) at α = 0.01. This is analogous to achieving a joint posterior probability of 1 − α as in the stICA framework.
We use an engagement threshold of γ = 0 in each case. All computations were performed on a Mac Pro computer with 28 processors. Computation time for stICA was approximately 11 hr. Computation time for identifying areas of engagement or deviation with stICA was approximately 2 hr for a single IC or subject effect map.
We quantify the reliability of subject-level features, namely the subject effect maps and areas of positive and negative deviation, rather than of the ICs themselves, since the reliability of IC maps could be spuriously inflated by shrinkage to the template mean. To quantify the reliability of the subject effect estimates, we compute the Pearson correlation between the maps based on each session of data. To quantify the reliability of the areas of positive and negative deviation, we compute the overlap between the areas identified using each session of data. We consider two measures of overlap: the size (number of vertices) and the Dice coefficient, defined as the size of the overlap divided by the average size of the two areas individually.

Figures 13 and 14 show results using stICA, tICA, and dual regression for one IC representing an attention network. Figure 13 displays estimates, marginal standard deviations (SD), and areas of engagement for the ICs. Figure 14 displays estimates and areas of deviation for the subject effects. In both figures, the estimates produced using stICA are noticeably smoother than those produced using tICA. Dual regression does not produce deviations or variance estimates, so the only mode of comparison is the IC estimates themselves, which appear noisier than those produced using either tICA or stICA. Comparing the areas of engagement and deviation, those based on stICA appear larger and include more subtle deviations, whereas tICA is only able to identify the most intense deviations, seen in yellow (positive) and turquoise (negative). As both stICA and tICA provide similar control over false positives, the difference in power is due both to the smaller marginal variance estimates, shown in the second row of Figure 13, and to accounting for spatial dependencies.

Figure 13. Estimates, marginal standard deviations (SD), and areas of engagement for one IC. Areas of engagement are determined in stICA based on the excursions set approach described in Section 2.3. In tICA, they are determined using the marginal posterior variance at each location to perform a t-test, followed by multiple comparisons correction using Bonferroni correction to control the FWER. An engagement threshold of γ = 0 and a significance level of α = 0.01 was used for each procedure. While both stICA and tICA provide stringent false positive control, the areas of engagement based on stICA are larger than those based on tICA due to the smaller variances shown in the second row and the incorporation of spatial dependencies.

Figure 14. Estimates and areas of deviation. Deviations represent differences between the subject IC and the template mean. Areas of deviation are determined in stICA through the joint posterior distribution of the deviations, based on the excursions set approach described in Section 2.3. In tICA, they are determined using the marginal posterior variance of the deviations at each location to perform a t-test, followed by multiple comparisons correction using Bonferroni correction to control the FWER. For each procedure, a threshold of γ = 0 and a significance level of α = 0.01 was used. The areas of deviation identified using stICA are larger and include more subtle deviations, whereas tICA only has sufficient power to identify the most intense deviations, seen in yellow (positive) and turquoise (negative).

Figures 15 and 16 show measures of the reliability of subject effects produced using stICA and tICA. Figure 15 displays the correlation across sessions of the subject effect estimates produced by stICA and tICA, by scan duration. Each point represents one of the 16 ICs, and boxplots display the distribution over all ICs. For shorter scans, stICA shows substantially more reliable subject effect estimates than tICA. As the scan duration increases, the methods begin to converge, with stICA still showing a slight improvement over tICA at T = 1200. This illustrates that the use of spatial priors improves the accuracy of estimates the most when sample size is limited.
If more data is available, template ICA without spatial priors may be sufficient to produce reliable estimates of subject effects. Figure 16 displays the overlap across sessions of the areas of positive and negative deviation produced by tICA and stICA, by scan duration. Each point represents a single IC and direction of deviation (positive or negative), and boxplots represent the distribution over all ICs in both directions. Figure 16(a) illustrates that stICA identifies substantially larger reliable areas of positive and negative deviation, regardless of scan duration. This suggests that stICA has more power to detect true deviations than tICA, as observed in the simulation study. Considering Figure 16(b), first note that the denominator of the Dice coefficient is the average size of the two areas being considered; hence, it becomes more difficult to achieve high Dice overlap as areas become larger to include more subtle deviations. Yet for short scan durations, stICA achieves greater Dice overlap, even while identifying much larger areas of deviation than tICA. For longer scan durations, the two methods appear to converge in terms of Dice overlap, although this is a greater achievement for stICA since the areas of deviation are larger, as seen in Figure 16(a).

Figure 15. Scan-rescan reliability of deviation estimates. We assess the reliability of the estimates of deviations. Deviations, or subject effects, represent differences between the subject-level ICs and the group average ICs. We consider the effect of sample size by varying scan duration from T = 400 volumes (5 min) to T = 1200 volumes (15 min). Each point represents one of the 16 ICs, and boxplots represent the distribution over all ICs. For shorter scans, stICA shows substantially more reliable deviation estimates than tICA. As the scan duration increases, the methods begin to converge, with stICA showing a slight improvement over tICA. This illustrates that the spatial priors improve the accuracy of estimates the most when sample size is limited. If more data is available, tICA may be sufficient to produce reliable estimates of subject effects.
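The two overlap measures used above can be sketched as follows:

```python
import numpy as np

def dice_and_size(area1, area2):
    """Scan-rescan overlap of two binary deviation maps.

    Returns the overlap size (number of vertices) and the Dice coefficient,
    defined as the overlap size divided by the average size of the two areas.
    """
    overlap = (area1 & area2).sum()
    denom = 0.5 * (area1.sum() + area2.sum())
    return int(overlap), overlap / denom if denom else 0.0

# Toy session-1 and session-2 deviation maps over 5 vertices
a = np.array([1, 1, 1, 0, 0], dtype=bool)
b = np.array([1, 1, 0, 1, 0], dtype=bool)
size, dice = dice_and_size(a, b)   # overlap of 2 vertices, Dice = 2/3
```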

Discussion
In this article, we have proposed a spatial template independent component analysis (stICA) model for subject-level brain network estimation with fMRI data. Estimation can be performed through a computationally efficient expectation-maximization (EM) algorithm. Joint posterior inference is possible using an efficient excursions set approach, which avoids multiple comparisons and has greater power to detect areas of engagement and deviation. We have validated stICA through both simulation studies and a real fMRI data analysis.
The proposed stICA approach has several benefits. First, accounting for spatial dependencies results in greater estimation efficiency and power to detect true effects, as demonstrated in the simulation studies and fMRI data analysis. Second, stICA can be used to directly investigate "subject effects," which represent differences between a subject's brain network and the population average. These subject effects may be used to investigate any number of phenomena, including effects of disease progression and treatment, typical and atypical development, aging and cognitive decline, and changes in brain state. Such investigation of subject effects is not possible with commonly used ad hoc approaches like dual regression. Third, stICA can produce thresholded maps in a principled manner, which is often of interest in practice. Finally, the proposed EM algorithm is quite computationally feasible.
In our simulation studies and real data analysis, we have compared the performance of stICA with two benchmark approaches: standard template ICA (tICA) without spatial priors and dual regression. Both stICA and tICA clearly outperform dual regression, but the difference in performance between standard tICA and stICA is more nuanced. In the simulation studies and fMRI analysis with shorter scan duration (approximately 5 min), stICA outperforms tICA in all measures. But in the fMRI analysis with longer scan duration (approximately 15 min), stICA and tICA show somewhat similar performance in terms of producing reliable subject effects and areas of deviation, though stICA still exhibits greater power to detect effects. This suggests that the use of spatial priors may be most helpful when data quantity is limited, while tICA may be sufficient if more data is available. This is convenient, since the computational demands of stICA grow with scan duration and may become very burdensome for very long scans, while tICA is quite fast even for large datasets. If, however, one wants to maximize power to detect true effects, stICA should always be used, since by accounting for spatial dependencies it has greater power to detect engagements and deviations, regardless of scan duration.
Several limitations and future directions should also be noted. First, the proposed EM algorithm is an empirical Bayes approach and does not account for uncertainty in the unknown model parameters. This could result in underestimation of the posterior variances and lead to inflated false positive rates when identifying areas of engagement or deviation. While we do not observe any evidence of this in our simulation studies, it could be more likely to occur in challenging settings where parameter estimates may be less accurate. A possible future topic of research is to develop a fully Bayesian approach with priors on the model parameters to better account for uncertainty. Second, our model assumes stationarity of the spatial fields, which may not be strictly satisfied. In volumetric fMRI data, spatial fields representing activation or engagement are known to exhibit nonstationary features due to cortical folding, the presence of multiple tissue classes, and spatially varying amplitudes of activation or engagement. However, our model is applied to the cortical surface, where the assumption of stationarity is more appropriate. Some nonstationarity may persist, as "background" areas tend to be more smooth than areas of engagement. This can lead to slight oversmoothing of areas of engagement and attenuation of peak signals. Our model could be improved by allowing the spatial dependence parameter κ to depend on location (Lindgren, Rue, and Lindström 2011). However, this would come at a cost of computational and statistical efficiency, in part because it would be necessary to relax the common dependence assumption described in Section 2.2.4.
Third, here we apply the stICA model to data on the cortical surface; we do not analyze subcortical and cerebellar volumetric gray matter, though they may be of interest in some applications. The stICA model is applicable in theory to an arbitrary number of brain regions, including cerebellar and subcortical regions. As described in Section 2.1.1, it is possible to generate a tetrahedral mesh to represent volumetric structures such as the cerebellum or thalamus. However, in practice computational considerations may present limitations: the number of vertices required to accurately represent the larger subcortical structures as a tetrahedral mesh is approximately equal to the number of voxels, which is over 10,000 for the largest regions at 2 mm isotropic resolution. More importantly, in a tetrahedral mesh the number of edges is typically much larger than in a triangular mesh with the same number of vertices, reducing the level of sparsity in the spatial correlation matrix and increasing the computational burden of model estimation. Excluding additional boundary vertices would reduce the computational demands, particularly because marginalizing out these locations substantially reduces the sparsity of the spatial correlation matrix. We found that excluding these vertices had a relatively minor detrimental effect on our results, but for small regions where the data boundary represents a larger proportion of the total area the effect may be more serious. An alternative is to stratify model estimation across regions or to focus analysis only on specific regions of interest. A true whole-brain analysis including both cortical surfaces as well as subcortical locations, with additional boundary vertices for all regions to improve performance, remains an important topic of future computational research. One promising option for inclusion of volumetric regions is to employ the approach of Sidén et al. (2021), replacing the exact Cholesky-based computational methods with approximate, fast iterative methods.

Figure 16. Reliability of areas of deviation. We assess the scan-rescan reliability of the areas of positive and negative deviation identified with stICA and tICA. Deviations or subject effects represent differences between the subject-level ICs and the group average ICs. Reliability of the areas of deviation across the two sessions is quantified in two ways: the number of overlapping vertices (a), and the Dice coefficient of the overlap, which ranges from 0 (no overlap) to 1 (perfect overlap) (b). Each point represents the positive or negative deviations associated with one of the 16 ICs, and boxplots represent the distribution over all ICs in both directions. We consider the effect of sample size by varying scan duration from T = 400 volumes (5 min) to T = 1200 volumes (15 min). As seen in panel (a), stICA identifies substantially larger overlapping areas of positive and negative deviation, regardless of scan duration. This reflects the fact that stICA has more power to detect true deviations than tICA. Considering panel (b), first note that the denominator of the Dice coefficient is the average size of the two areas being considered; hence, it becomes more difficult to achieve high Dice overlap as areas become larger to include more subtle deviations. Yet for short scan durations, stICA achieves greater Dice overlap, even while the areas of deviation are much larger than those identified with tICA. For longer scan durations, the two methods appear to converge in terms of Dice overlap, although this is a greater achievement for stICA since the areas being identified are larger (a).
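For reference, the two reliability metrics reported in Figure 16 (number of overlapping vertices and Dice coefficient) can be computed directly from a pair of binary deviation maps. A minimal sketch in Python; the function and variable names are illustrative, not part of our implementation:

```python
import numpy as np

def overlap_metrics(map1, map2):
    """Scan-rescan reliability of two binary maps over the same vertices.

    Returns the number of overlapping vertices and the Dice coefficient,
    2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect overlap).
    """
    a = np.asarray(map1, dtype=bool)
    b = np.asarray(map2, dtype=bool)
    n_overlap = int(np.sum(a & b))
    denom = int(a.sum()) + int(b.sum())
    dice = 2 * n_overlap / denom if denom > 0 else float("nan")
    return n_overlap, dice

# Example: areas of deviation from two sessions over 8 vertices
session1 = [1, 1, 1, 0, 0, 1, 0, 0]
session2 = [1, 1, 0, 0, 1, 1, 0, 0]
print(overlap_metrics(session1, session2))  # → (3, 0.75)
```

The example illustrates the point made about panel (b): the raw overlap (3 vertices) and the Dice coefficient respond differently as the identified areas grow, since Dice normalizes by the combined area size.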
Finally, our focus in this work is the estimation of functional topology via the spatial component of the ICA model. However, of equal scientific interest and importance are the temporal activation patterns associated with different ICs, since these form the basis of "functional connectivity", that is, the correlation or similarity of temporal activity between ICs. Since accurate estimation of the ICs themselves is a necessary prerequisite for accurate estimation of their corresponding temporal activity, this work advances the use of ICA for estimation of functional connectivity. However, a more explicit focus on the temporal aspect of the ICA model may lead to further improvements in accuracy. In the future, we plan to develop a template ICA framework incorporating empirical population priors on the covariance of the temporal mixing matrix M, which will more fully leverage population information to provide maximally accurate ICA-based estimates of functional connectivity.
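As a simple illustration of the quantity at stake, ICA-based functional connectivity is commonly summarized as the matrix of pairwise correlations between the columns of the temporal mixing matrix M (one time course per IC). A minimal sketch with simulated data standing in for estimated time courses; the dimensions and induced correlation are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated temporal mixing matrix M: T time points x Q ICs.
# In practice these columns would be estimated IC time courses.
T, Q = 300, 4
M = rng.standard_normal((T, Q))
M[:, 1] += 0.8 * M[:, 0]  # induce shared activity between ICs 0 and 1

# Functional connectivity: Q x Q matrix of pairwise correlations
# between IC time courses (rowvar=False treats columns as variables).
fc = np.corrcoef(M, rowvar=False)

print(fc.shape)       # → (4, 4)
print(fc[0, 1] > 0.4) # strong positive connectivity between ICs 0 and 1
```

Errors in the estimated spatial ICs propagate into these time courses and hence into fc, which is why more accurate subject-level IC estimation supports more accurate connectivity estimates.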

Supplementary Materials
The supplementary materials include detailed derivations, further details on and sensitivity analyses of the computational approach, and additional data analysis results.