Permutation-based inference for function-on-scalar regression with an application in PET brain imaging

The density of various proteins throughout the human brain can be studied through the use of positron emission tomography (PET) imaging. We report here on data from a study of serotonin transporter (5-HTT) binding. While PET imaging data analysis is most commonly performed on data that are aggregated into several discrete a priori regions of interest, in this study, primary interest is on measures of 5-HTT binding potential that are made at many locations along a continuous anatomically defined tract, one that was chosen to follow serotonergic axons. Our goal is to characterise the binding patterns along this tract and also to determine how such patterns differ between control subjects and depressed patients. Due to the nature of our data, we utilise function-on-scalar regression modelling to make optimal use of our data. Inference on both main effects (position along the tract; diagnostic group) and their interactions are made using permutation testing strategies that do not require distributional assumptions. Also, to investigate the question of homogeneity we implement a permutation testing strategy, which adapts a ‘block bootstrapping’ approach from time series analysis to the functional data setting.


Introduction
The neurotransmitter serotonin (5-hydroxytryptamine; 5-HT) has been implicated as playing an important role in the pathophysiology of depression (Owens and Nemeroff 1994;Meyer 2007), as well as in bipolar disorder, anxiety disorder, and other mental illnesses.The most commonly prescribed treatment for many of these is the class of selective serotonin reuptake inhibitors (SSRI) that preferentially bind to the serotonin transporter (5-HTT), a protein whose function is to remove serotonin from the synapse as it is absorbed back into the neuronal cell body.
Through positron emission tomography (PET) imaging, it is possible to measure the relative density of various proteins throughout the brain.This is done by injecting a radioligand, a radioactive molecule that has affinity for the protein of interest, into the subject's bloodstream.This allows measurement of the concentration of the radioligand at each location in the brain over time.By fitting statistical models that describe the kinetic behaviour to the observed measured concentrations over time, it is possible to acquire estimates of various measures, such as binding potential, that are related to the density of the particular protein of interest.This binding potential can vary considerably throughout the brain, so this modelling is typically done separately for each location of interest.Common practice for neuroreceptor mapping studies using PET involves prespecification of some number of anatomically defined regions of interest (ROI) followed by model fitting to obtain binding potential estimates for each ROI and for each subject.
The literature on PET measurements of 5-HTT binding in depression contains disparate results.Some studies have reported that binding was lower for depressed patients compared to normal controls (Malison et al. 1998;Parsey et al. 2006), while others have found higher binding in depressed patients (Reivich et al. 2004;Cannon et al. 2007).While some of the apparent disparity may be due to choice of radiotracer or modelling techniques, we note that the selection of ROIs for these analyses was not primarily based on the anatomy of the serotonergic system.Taking that into account, it may be informative to focus attention on a continuous tract, following axons that are part of the serotonergic pathway, going from the brain stem to the frontal pole.
Each subject's estimated binding potential at each location along the tract can be regarded as a curve varying over a continuum, which suggests the potential utility of functional data analysis.We denote the measured binding potentials as the functional responses y i (t), i = 1, . . . ,n; the distance along the tract as t ∈ T for some finite interval T ⊂ R which we set to be T = [0, 1], the domain over which the functional data are observed.Each individual's profile may depend on the diagnostic group (i.e.patient or control subject), age, sex, and other factors.Collecting these subject-level variables x ij , j = 1, . . ., p, we consider a general function-on-scalar linear regression model: where β j (t) are the coefficient functions and 1 (•), . . ., n (•) are iid realizations of a stochastic process with expectation 0 and covariance structure S(s, t) = cov( i (s), i (t)), usually taken to be unknown (Ramsay and Silverman 2005;Reiss et al. 2010;Morris 2015).
It is typical to fit models such as (1) by expressing both observed curves and functional coefficients in terms of some basis function representation (often, spline functions) and to impose roughness penalties on the estimated coefficient functions to avoid overfitting.Reiss et al. (2010) cast this estimator as a penalised generalised least squares problem and proposed selection of the tuning parameter either by generalised cross-validation (GCV) or by restricted maximum likelihood (REML).Alternatives to this fitting procedure include (Shi et al. 2007;Krafty et al. 2008), among many others (see Morris 2015 for a review).
As just described, a vast literature exists on fitting and providing point estimates for function-on-scalar models, but comparatively little attention has been devoted to inference on these models.Pointwise hypothesis testing of the effect of a set of predictors can be done with a t test (Holmes et al. 1996;Cox and Lee 2008) or F test (Ramsay and Silverman 2005;Reiss et al. 2010) framework, and permutation testing may be used to account for the correlation of the tests and/or to adjust for the issue of multiple testing.While this general approach is intuitive and can provide some indication of where any difference might be, such a sequence of pointwise tests may not be as powerful as a single global test could be.Also, such an approach does not lend itself easily to providing simultaneous confidence bands.In such a framework, a global test should properly combine the test statistics at all t's to obtain an overall conclusion.
Much work on hypothesis testing for the comparison of mean curves or for determining the effect of a covariate has been done within the framework of Model (1).Here, we will first discuss the simplest case, comparison of mean curves, and subsequently we will describe testing for covariate effects more generally.
In the one-way functional ANOVA setting, the model for K groups is: and interest is in testing Fan and Lin (1998) proposed a two-sample test (K = 2) and adaptive high-dimensional ANOVA (HANOVA) (for arbitrary K) in this scenario.Further work was done by Cuevas et al. (2004), who proposed a Monte Carlo procedure to approximate the asymptotic distribution of the test statistic under the null hypothesis.Zhang and Liang (2014) later proposed to globalise the pointwise F-test.We classify such testing strategies as 'parametric' in that the procedure is based on (asymptotic) distributions of the test statistics (and thus does not require resampling).While the approaches just described are generally convenient and computationally efficient, an alternative approach is to tackle the functional ANOVA problem 'nonparametrically,' i.e. using permutation or bootstrapping procedures.While such procedures involve considerably more computational effort, they are often favoured in situations in which it is difficult to derive asymptotic distributions of the test statistic or when the required assumptions of parametric procedures are thought not to hold.Maldonado et al. (2002) proposed three permutation statistics: pooled-mean, pooled-variance and the ratio between the two.A similar procedure is discussed in Sturino et al. (2010), based on the summary of curves.Alternatively, Sirski (2012) proposed a permutation test based on pairwise comparisons between curves.Corain et al. (2014) provided new insights into some of the works mentioned above and recommended the use of permutation and combination-based approaches, which involve testing sub-hypotheses at each time point and then combining them to test the global null hypothesis.Zhang et al. (2019) conducted a new global test for one-way ANOVA that utilises the nonparametric bootstrap to approximate the null distribution and to obtain an approximate critical value.
Generalising the functional ANOVA model in (2), it is often of interest to consider the effect of covariates and interactions on functional outcomes of Model (1).Some parametric approaches to this include (Shen and Faraway 2004), which used an F-type test to compare two nested linear models and also evaluate the significance of individual covariates, e.g.
for some j ∈ {1, . . ., p}.Other methods considered a general linear hypothesis testing problem: where C ∈ R q×(p+1) is a matrix that specifies the linear combination of the coefficient functions that is of interest, q is the number of joint hypotheses, β β β(t) = (β 0 (t), . . ., β p (t)) , and c c c(t) = (c 1 (t), . . ., c q (t)) .Zhang and Chen (2007) and Zhang (2011) discussed hypothesis testing under this framework, using an L 2 -norm-based test and F-type test, respectively.As an alternative, Cardot et al. (2007) took a nonparametric approach to estimate the covariate effect on the response curves and tested the significance of a single covariate on the functional response through a permutation procedure.Two test statistics based on an adaptation of F-statistic and smoothing residuals were considered.Faraway (1997) presented a different L 2 -norm-based test and performed inference using a bootstrap technique.Reiss et al. (2010) extended the pointwise F-statistic F(t) to be tested on all t simultaneously, in a way that accounts for the multiple tests using a permutation-based method.
In addressing hypothesis testing problems such as whether mean functions or functional parameters are zero functions or not, the answer can be easily obtained by examining the simultaneous confidence bands (SCB).In addition to being a powerful visualisation tool, SCB highlights local features.Extensive studies have been done on SCB estimation for the mean function, a special case of (1), including (Johansen and Johnstone 1990;Eubank and Speckman 1993;Sun and Loader 1994;Wang and Yang 2009).Some of them involved bootstrapping (Hall and Titterington 1988;Faraway 1997;Neumann and Polzehl 1998;Cuevas et al. 2006).Wang et al. (2016) provided a good overview of SCB for mean and covariance functions.Other recent work includes that of Degras (2011) and Degras (2017), who built an SCB for a regression coefficient function that relies on the functional central limit theorem (FCLT).In the case of a small sample size or non-normal errors, a bootstrap algorithm is proposed as an alternative.Bunea et al. (2011) proposed a fully data-driven method based on thresholding least squares estimations, along with adaptive confidence bands.Cao et al. (2012) proposed a polynomial splines estimator for the mean function and asymptotically correct SCB.More generally, Chang et al. (2017) proposed a wild bootstrap method for estimating SCBs for the coefficient function in the functional linear regression model.
There are various ways to fit the model in (1), and in this paper we take the penalised least squares approach, with a cubic B-spline basis and tuning parameter selected by GCV, as implemented in the fosr() function in the refund package (Crainiceanu et al. 2012).The overall objective of the present study is to characterise the 5-HTT binding along a particular continuous tract through the brain, going from the brainstem to the frontal pole, following the axons of the serotonergic neurons.We aim to investigate the binding patterns continuously along that tract, and also to determine whether such patterns differ between patients with major depressive disorder (MDD) and normal control subjects.
In PET image analysis, it is common to focus attention on some pre-determined anatomically defined regions of interest (ROI), typically convex.An implicit assumption of such approaches is that the binding is homogeneous throughout each region, although this condition is rarely challenged.In contrast to most previous studies, the ROI defined for this study is unusually shaped as it is long, narrow, and curved.Thus, we regard the observations as being one-dimensional functional data along a continuous tract.
In addition to the unique application to PET data observed continuously along a tract, this paper examines the effect of removal of covariate effects for permutation testing in the general framework of function-on-scalar regression, and it also introduces a new circular permutation method, adapted from time series analysis, to the functional data setting.The rest of this article is organised as follows.In Section 2, we describe the motivating data.The details of our methodologies are in Section 3, with Section 3.1 describing the group difference test and interaction test on the model without covariates.Section 3.2 describes the group difference test and interaction test on the model with covariates.Section 3.3 describes the homogeneity test.We conduct simulation studies that resemble our motivating dataset, examining the validity of our methods and comparing the performance of competing methods and presenting the results in Section 4.An application of our methods on the motivating data is given in Section 5. Finally, we present concluding remarks in Section 6.

Data
The available dataset for this project is taken from PET scans from 91 subjects, including 32 normal controls and 59 patients with major depressive disorder (MDD).Each subject was imaged with the [ 11 C]DASB radiotracer, which has affinity for the serotonin transporter.The tract under investigation was selected to contain the serotonergic cell body projections (Bartlett et al. 2022).Primary interest of this analysis lies in determining patterns of binding of the serotonin transporter along this tract, and determining whether these patterns are different between the diagnostic groups.
At each point along the tract, we measured BP p , defined to be the ratio at equilibrium of specifically bound radioligand to that of total parent radioligand in plasma.We have observations at each of 56 unequally spaced points per function along the tract, where the domain is rescaled to [0, 1].
Details of data collection and modelling of the PET imaging data can be found in Ogden et al. (2007).Figure 1 shows the observed binding potential as a function of distance along a tract that follows serotonergic axons for a random subset of subjects.

Model without covariates
We begin with the simplest model: where x i1 in our application is the group indicator variable.We first examine the group effect Under the null hypothesis in (7), the model reduces simply to In this case, β 1 (t) represents the mean group difference at the point t along the tract.To test for group effect, we will apply a simple permutation method (permuting group labels), as previously considered by Maldonado et al. (2002), Reiss et al. (2010), Sturino et al. (2010), If the group effect is present, we next want to test whether the effect is a 'pure' main effect (which would be constant over t) or an interaction effect (meaning a group effect that varies with t).To test for such an interaction, our null hypothesis would specify that the only difference between groups is constant over t: Under the null hypothesis in (9), the model becomes where the (scalar-valued) parameter η represents the (vertical) difference between the group mean curves for all t.η is then estimated through fitting Model (10), using the pffr() function in the refund package (Crainiceanu et al. 2012).Given an estimate η of η, we can approximately 'remove' the main effect by If the estimate η is close to η, then This interaction testing problem is thus converted into one that is similar to the test for group difference.Given an estimate of η, we permute the group labels and perform the same algorithm previously described on all y * i (t)'s.

Model with covariates
When other covariates are present, it is necessary to account for their effects.However, the permutation testing problems are not as straightforward.In this section, we describe a method to account for the covariate effects all at once in order to test for group differences and interaction.Consider the model: where x i2 , . . ., x ip are covariates that we wish to account for.Through fitting Model (13), estimates β2 , . . ., βp , which are the effects of x i2 , . . ., x ip , can be approximately removed: The same testing problems in the previous section can now be carried out on y * i (t)'s.

Homogeneity test
In addition to testing for group difference and interaction effect, it may be of interest to test whether coefficient functions are constant across t.Considering the model in ( 13), we are interested in testing Under the hypothesis in (15), the expected value of each subject is constant along the tract.
To test this hypothesis, we employ the 'block bootstrapping' approach, a resampling method for building a new curve from randomly sampled blocks of the original curve, adapted here from the field of time series analysis to the functional data setting.The central idea is that if a function is constant across all t, then any rearrangement of the function would also be constant.
This notion of sampling blocks of time series was first put forth by Hall (1985).Carlstein (1986) proposed the non-overlapping block bootstrap (NBB), while Künsch (1989) and Liu and Singh (1992) independently proposed overlapping blocks, known as moving block bootstrap (MBB).Politis and Romano (1992) introduced a circular block bootstrap (CBB) to address the issue of boundary effects that were present in MBB.Under the necessary assumption of stationarity (constant mean and variance; covariance between any two observations depends on how far apart they are), the block bootstrapping technique preserves the underlying dependence structure asymptotically.
In our adaptation of this method to function-on-scalar regression, the order and characteristics of the curves are preserved within each block and the error functions are assumed to be independent between subjects.Prior to conducting this test and for this test only, for simplicity, each observed curve is smoothed on an equally spaced grid using cubic smoothing splines.Our algorithm, which we term 'circular permutation procedure,' of shifting blocks in the function-on-scalar setting is described below, and a visualisation of the algorithm is shown in Figure 2: Algorithm: Homogeneity test 1.Fit Model (1) to the observed data 2. Calculate where βj = 1 0 βj (t) dt, j = 0, 1, . . ., p.

Validity of group difference test
Next, we conduct simulation studies for the group difference test on Model (6) under different scenarios for testing the hypothesis in (7).These scenarios are combinations of the following options: (1) β 2 (t) and β 3 (t) are either both zero, or not; (2) covariate effects related to β 2 (t) and β 3 (t) are either removed (in the sense of ( 14)), or neglected; and (3) the covariates x i1 , x i2 , x i3 are either mutually independent, or not.Table 1 lays out the various scenarios considered; the group difference methodology is described in Section 3.1 and the methodology of removing covariate effects is described in Section 3.2.Coefficient estimation is done using the pffr() function in the refund package (Crainiceanu et al. 2012).
For each scenario, we construct 1000 datasets for each of three sample sizes: N = 10, N = 50, and N = 100 random curves, generated from where i (t) ∼ N(0, 1) is smoothed as in Section 4.1.In the case of independent covariates, x i2 and x i3 are derived independently from N(0, 1).In the case of dependent covariates, x i2 = x i1 + δ 1 and x i3 = x i2 + 2 * δ 2 , where δ 1 , δ 2 ∼ N(0, 1) and the correlations fall roughly in (0.3, 0.5).The nonzero coefficient function is shown in the middle panel of Figure 7. Results of 1000 simulations are shown in Table 1; full histograms of the p-values are provided in Figure 8 in the Supplemental Material.
We first look at the results when covariates are independent.In column 1 (effects removed), as N increases, the performances become better in that the sizes are closer to 0.05.This is expected as larger sample sizes better estimate the covariate effects and thus are more effective at 'removing' those effects in the permuted data.In column 2 (effects neglected), the sizes are all close to 0.05, which is expected as the independent covariates are not associated in any way with the group difference along the tract.
Next, we look at the results when covariates are dependent.In column 3 (effects removed), although the sizes decrease for larger N, the results are still quite anticonservative even for N = 100.Under this scenario, we also considered N = 500 and N = 1000 when betas are zero and nonzero.For N = 500, the sizes are 0.085 and 0.101 for betas zero and nonzero, respectively.For N = 1000, the sizes are 0.094 and 0.105 for betas zero and nonzero, respectively.The p-values are decreasing with increasing N, but at a slow rate.Q-Q plots of the p-values for N = 500, 1000 are provided in Figure 9 in the Supplemental Material.In column 4 (effects neglected), when betas are zero, the sizes are close to 0.05, as would be expected.In fact, this mirrors the case in which independent covariate effects are neglected.This is because when effects are neglected, it doesn't matter whether the covariates are independent or not.When betas are nonzero, however, the performance when neglecting covariates is affected dramatically by sample size.In column 5 (truth), the true coefficient effects are used instead of the estimated values and the sizes are all close to 0.05.
From this simulation, we can conclude that in the case of independent covariates, what we propose in Section 3.2 is appropriate for reasonably large N.For small or moderate N, however, removing effects is not helpful.In that case, it is recommended that effects not be removed.When there is strong dependence among the covariates, removing those effects provide tests that are generally anti-conservative.According to our simulation results, although the null distribution of the p-values seems to be converging to the uniform distribution as N increases, the convergence is slow.Thus for small and even moderate sample sizes, the p-values cannot be taken at face value.With this knowledge, however, it is possible to attempt to calibrate p-values in a manner such as the Monte Carlo approach proposed by Hall and York (2001) (this approach has also been taken by Henderson et al. 2008;Candelon and Metiu 2013, among others).Naturally, such a requirement constitutes a significant limitation of this testing procedure, yet to our knowledge, there are no existing published alternatives.On the other hand, if such effects are neglected, the result is a test with doubtful validity.

Comparison of circular permutation procedure to label permuting procedure
In this section, we conduct simulation studies to compare the performance of the circular permutation procedure to the label permuting procedure.Namely, under Model (16), we wish to test hypothesis (15) with p = 3.The methodology is as described in Section 3.3.We construct 1000 datasets for each of three sample sizes: N = 10, N = 50, and N = 100 random curves.The curves are generated from ( 16), where x i2 and x i3 are derived independently from N(0, 1), and i (t) is generated under low correlation or high correlation.In either case, i (t) is smoothed using cubic smoothing splines with smoothing parameter spar = 0.7 (corresponds to tuning parameter λ ≈ 0.00062), as implemented in the smooth.spline()function in the stats package (R Core Team 2013).β 0 (t), . . ., β 3 (t) are constant functions, as shown in the right panel of Figure 7.
With 1000 permutations, the sizes for testing the hypothesis of constant β 0 (t), . . ., β 3 (t) in ( 15) are shown in Table 2; full histograms of the p-values of the circular permutation procedure and the label permutation procedure are provided in Figures 10 and 11, respectively, in the Supplemental Material.It is apparent that the circular permutation procedure performs better than the label permutation procedure.The estimated sizes for the circular permutation procedure are relatively close to the nominal size but the label permutation procedure gives large sizes.In the previous sections, we see that the label permutation procedure performs well when it comes to testing for a group or interaction effect, but that is not the case here.The reason is that permuting all labels simultaneously is actually testing if any of the covariates have an effect on the curves, which is not specific to the homogeneity of the coefficient functions.

Application
We are interested in investigating the difference in binding patterns between groups and characterising any such effect.We first consider the simplest model, with just a diagnosis effect, then the full model, allowing for other covariates.Furthermore, we wish to assess the homogeneity of the binding patterns throughout each region.

Group effect, model without covariates
Considering Model (6) with only an intercept term and a group term, we begin with testing for group effect, i.e. the hypothesis in (7).The algorithm is laid out in Section 3.1, where group labels are permuted.The results, as shown in the top row of Figure 4, indicates that there is a group effect, and thus we conclude that the mean SERT binding pattern is different in at least some locations along the tract between groups (p = 0.001).
Next, allowing for a group 'main' effect, we test for an interaction effect, i.e. the hypothesis in (9).The algorithm is laid out in the same section, where the vertical difference between the two mean curves, η, is estimated then subtracted, followed by permuting group labels of the resulting data.The results, as shown in the bottom row of Figure 4, suggest that there is evidence for an interaction effect between the diagnostic group and location along the tract (p < 0.001).

Group effect, model with covariates
Next, we present results under the full model in (13) which includes two covariates in addition to the intercept and the group difference function.Specifically, x i1 = diagnostic group (0 for normal control, 1 for MDD patient), x i2 = age (in years) and x i3 = sex (0 for female, 1 for male).The estimated coefficient functions β 0 , β 1 , β 2 , β 3 corresponding to each of the predictors are shown in Figure 5.The effect of diagnostic group, comparing MDD patients to normal controls, is characterised by β 1 (t), the effect of increasing age by one year is characterised by β 2 (t), and the effect of sex, comparing males to females, is characterised by β 3 (t).Before testing the group effect, we first evaluate the effects of each covariate.We first consider a model in which age is the only predictor and then one in which sex is the only predictor, and test for the significance of these effects using the label permuting strategies discussed in Section 3.1.The results shown in the top row of Figure 12 in Supplemental Material, suggest that, at least in these simple models, neither the effects of age (p = 0.170) nor sex (p = 0.295) are strongly indicated.Furthermore, the correlation between the covariates are corr(x 1 , x 2 ) = 0.28, corr(x 1 , x 3 ) = 0.006, and corr(x 2 , x 3 ) = 0.0009 suggesting that the situation of our data resembles the scenario of dependent covariates in the simulation study in Section 4.2.Our simulation results suggest that the model with covariates and their effects removed may be appropriate for testing the group effect as outlined in Section 3.2.
We consider the model with covariates and remove their effects when performing the testing for group effect as outlined in Section 3.2.The permutation test first removes covariate effects of age and sex (resulting in ( 14)), then permutes the group labels using the algorithm described in Section 3.1.The results, as shown in the bottom row (left panel) of Figure 12 in Supplemental Material, suggest that the mean SERT binding pattern is different in at least some locations along the tract between patients with MDD and normal control subjects (p = 0.001).As mentioned earlier in Section 4.2, due to the anticonservative nature of this testing procedure, some calibration is needed to correct the p-values.Following the general approach laid out by Hall and York (2001), we reference our calculated p-value with the empirical null distribution from our Monte Carlo simulation study.This calibration results in a p-value of 0.003 for the group test.Lastly, we test for an interaction effect, i.e. does the group effect differ along the tract.The results shown in the bottom row (right panel) of Figure 12, suggest that there is evidence of an interaction effect between the diagnostic group and location along the tract (p < 0.001).Both conclusions agree with those under the simplest model in Section 5.1.

Homogeneity of effects
Finally, we test Model (13) for homogeneity of the log transformed subject curves, i.e. the hypothesis in (15) with p = 3.The algorithm is laid out in Section (3.3) and Figure 13 in the Supplemental Material shows the log transformed observed binding potential for a random subset of subjects.The results shown in Figure 6, indicate that binding of the subjects along the tract is non-uniform (p ≤ 0.001), adjusting for the diagnostic group, age and sex effects, i.e. at least one of the β j functions is non-constant.A natural followup question would be: which of the four coefficient functions is/are constant?Judging by Figure 5, it seems reasonable to test homogeneity on Model (6), since β 2 and β 3 are fairly flat across t.It is clear that β 0 (t) is non-constant, so it remains to test the constancy of β 1 (t), which is equivalent to the interaction test described in Section 3.1 and the results for this test was previously presented in Section 5.1.

Conclusions
In summary, the covariate effects (age and sex) are not apparent along the tract, thus a simple model with no covariates is likely to be sufficient for describing the data.By investigating the binding patterns continuously, a group effect is clearly present, signifying the mean SERT binding pattern between MDD patients and normal controls is different in at least some locations along the tract.Furthermore, the results of the test for interaction with distance along the tract suggest that such a group effect varies by tract location.Moreover, the heterogeneous binding is most evident in β 0 (t), which represents the mean SERT binding pattern of normal controls.Our findings are further supported by Bartlett et al. (2022).

Discussion
One challenge we faced was the slow convergence of the p-values to the uniform distribution as N increases for the scenario of dependent covariates and their effects removed.The Q-Q plots in Figure 9 show that the distribution is close to uniform only in the middle section.However, when true coefficient values are used, the sizes are close to 0.05.Due to this slow convergence, p-value calibration is necessary for any real data analysis.
There are several aspects of our procedure that would be interesting to consider further.Firstly, we take the penalised least squares approach with cubic B-spline basis to fit the Model (1) due to its computational efficiency and retention of within-function dependence.Alternatives to spline-type basis functions include kernel and local polynomial smoothers and wavelets.
Secondly, when fitting the function-on-scalar regression models ( 6), ( 16), in our implementation the tuning parameter is chosen using GCV separately for the original data and then again for each permutation.Both in our simulation studies and in our application to the imaging data, the selected tuning parameter values are quite homogeneous, so some computational time could be saved by applying the tuning parameter selected for the original data to all permuted datasets.It takes about 4.5 min on a MacBook Pro with 1.4 GHz CPU to run one test with 1000 permutations for each of the scenarios.
Thirdly, individual testing of constancy of each covariate can be done, by subtracting out other covariate effects and using permutation methods that are discussed in Section 3.2.Removing covariate effects, however, will depend on the correlation among the covariates and a reasonably large N would be needed to ensure valid inference.
Lastly, alternative to our proposed circular permutation procedure, we could consider other block bootstrapping techniques.The pseudo time series generated from MBB is not stationary, even if the original series is.To address this, Politis and Romano (1994) proposed a stationary bootstrap (SB) method that has random block lengths, where the lengths follow a geometric distribution and showed that the resulting pseudo time series is also stationary.Although earlier studies on block bootstrap have established asymptotic properties of the mean and covariance, later works attempted to address the issue of 'discontinuities' that arise at block boundaries, which could disrupt the stationarity assumption.For instance, Paparoditis and Politis (2001) proposed the tapered block bootstrap (TBB) which downweights the observations at the endpoints to reduce the influence of discontinuities.This provides a better convergence rate in the bias and mean squared error than previous studies.An extension of this method, extended tapered block bootstrap, was proposed by Shao (2010) that widens the scope of applicability of TBB.Others have considered data smoothing within resampling that can improve bootstrap approximations (Gregory et al. 2015).

Figure 1 .
Figure 1.Original measurements: a random subset of observed (un-smoothed) binding potential curves, coloured by diagnostic group.

Figure 2 .
Figure 2. Illustration of circular permutation procedure: (Top row) Observed curves of three different subjects.(Middle row) A random value τ ∈ [0, 1] is selected separately for each curve to divide it into two blocks.(Bottom row) The curves are pasted back together after rotating the blocks.

Figure 3 .
Figure 3. (Left) Coefficient function β 1 (t) that generates Model (6) for power studies in Section 4.1.(Right) Power curves for detecting an interaction effect for different sample sizes N and effect size (vertical distance d).

Figure 4 .
Figure 4. (Top row) Results of group difference test: (Left) Spaghetti plot of β1 (t) from 1000 permutations, β1 (t) fitted from original data is shown in red.(Right) Histogram of 1000 permuted statistics, observed statistic is shown in purple.(Bottom row) Results of interaction test.

Table 1 .
Simulation results: from examining the removal of covariate effects, values shown are sizes under the null hypothesis.

Table 2 .
Simulation results: from comparing the circular permutation procedure to the label permutation procedure, values shown are sizes under the null hypothesis.