Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling

ABSTRACT We introduce a score-type statistic to test for a non-zero regression coefficient when the relevant term involves a nuisance parameter present only under the alternative. Despite the non-regularity and complexity of the problem, and unlike previous approaches, the proposed test statistic does not require the nuisance parameter to be estimated. It is simple to implement, relying on conventional distributions such as the Normal or the t, and it is justified in the setting of probabilistic coherence. We focus on testing for the existence of a breakpoint in segmented regression, and illustrate the methodology with an analysis of DNA copy number aberrations and gene expression profiles from 97 breast cancer patients; moreover, simulations reveal that the proposed test is more powerful than competitors previously discussed in the literature.


Introduction
Segmented or piecewise regression is a useful regression tool in several areas, such as Ecology, Biology, and Epidemiology. [1][2][3][4] The model assumes the covariate acts on the response via two or more straight lines joined at a threshold or breakpoint value in the covariate range. More generally, segmented regression falls within the wider class of regression models in which the covariate effect is quantified by its regression parameter but also depends on an additional unknown parameter. Formally, if Y is the response variable related to the quantitative covariate x through the function ϕ(x, ψ), known up to a scalar parameter ψ ∈ [L, U], the resulting regression equation reads as

μ_i = β^T z_i + δ ϕ(x_i, ψ),   i = 1, . . . , n,   (1)

where μ_i = E[Y_i] and z_i is a vector of additional fixed covariates entering the model linearly through the parameter β. In the context of changepoint detection, several cases are covered by model (1), such as the discontinuous changepoint ϕ(x_i, ψ) = I(x_i > ψ), the linear segmented term ϕ(x_i, ψ) = (x_i − ψ)_+ and the quadratic segmented term ϕ(x_i, ψ) = (x_i − ψ)²_+; additional examples are harmonic regression with unknown frequency, ϕ(x_i, ψ) = cos(ψ x_i + ω) where ω is the phase, and the mixture of two treatment drugs, ϕ(x_1i, x_2i, ψ) = log{ψ x_1i + (1 − ψ) x_2i}.
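For concreteness, the changepoint terms listed above can be coded directly; this is a minimal sketch (the function names are ours, not the paper's):

```python
import numpy as np

def phi_jump(x, psi):
    """Discontinuous changepoint term: I(x > psi)."""
    return (x > psi).astype(float)

def phi_segmented(x, psi):
    """Linear segmented term: (x - psi)_+."""
    return np.maximum(x - psi, 0.0)

def phi_quadratic(x, psi):
    """Quadratic segmented term: (x - psi)_+^2."""
    return np.maximum(x - psi, 0.0) ** 2
```

Each function maps the covariate and the nuisance ψ to the nonlinear regressor ϕ(x, ψ) entering model (1) with coefficient δ.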
In this paper we are interested in testing H_0 : δ = 0 against H_1 : δ ≠ 0. The parameter ψ is present only under the alternative, since under the null hypothesis it vanishes, making the usual asymptotics inappropriate and the traditional tests unhelpful; for instance, empirical values of the Wald and likelihood ratio test statistics may be obtained, but these are useless as the reference null distributions are unknown even asymptotically. Several theoretical issues are involved, see for instance [5], and the most common approaches in the literature rely on the maximally selected statistics framework: typically several unrestricted models, that is, under H_1, are fitted at different values of the nuisance ψ and the largest statistic value is taken; however, the relevant null distribution is unknown, and 'ad hoc' strategies have to be undertaken.
The most popular and leading contributions are due to Davies, [6][7][8] who deals with such a problem using the theory of stochastic processes and provides an approximate upper bound for the p-value based on the expected number of upcrossings of a specified critical value. Davies' approach is quite influential and many authors have used it in different contexts; see for instance Hansen [9] for an econometric perspective with simulation-based solutions, Kosorok and Song [10] for theoretical settings in survival analysis, and Zheng and Chen [11] for a review of genetic problems. A somewhat different approach comes from Conniffe, [12] who presents a 'modified' score test where the unidentifiable nuisance parameter under the null is replaced by the corresponding estimate under the alternative. However, the resulting hybrid score statistic still has an unknown distribution and thus does not provide a practical solution to the problem. We defer further comments until Section 2.2.
In the specific case of testing for a breakpoint in linear segmented regression, early theoretical works date back to [13][14][15], among others. More recently, permutations of the residuals have been used to obtain the null distribution and to compute the p-value accordingly. [16] This permutational approach can clearly be extended to any ϕ(·, ψ), even if the computational burden is not negligible; moreover, it is not clear which residuals should be permuted for regression models more complex than (1).
However, all the aforementioned approaches can prove unsatisfactory from a practical standpoint, owing to a non-negligible computational burden and/or reliance on large sample approximations. In this paper we discuss a simple and very intuitive approach based on a straightforward adjustment to the score statistic. The proposed approach only needs quantities from the null fit and thus does not require estimation under the alternative, which can be awkward with noisy data and small samples. In this respect our approach is truly in the spirit of the original score test, [17] unlike [12], which does need the estimate of the nuisance under the alternative.
The paper is structured as follows. In Section 2 we illustrate the methodology of the proposed statistic, and Section 3 reports results from simulation experiments assessing the finite sample properties of our proposal in comparison with its competitors; additional simulation results are reported in the Supplementary Material. Section 4 deals with a real data analysis of gene expression profiles from 97 breast cancer patients, and finally Section 5 concludes with a discussion of possible extensions and generalizations.

Testing with an unidentified parameter under the null
To begin with, we assume a Gaussian distribution for the response with homoscedastic and independent errors, and let ℓ = ℓ(δ, θ) be the log-likelihood depending on the parameters (δ, θ^T)^T; δ is the parameter of interest and θ collects the nonlinear nuisance parameter ψ and the remaining parameters β of the regression equation. As usual, the score vector is ℓ̇ = (ℓ̇_δ, ℓ̇_θ^T)^T, with the component relevant to the parameter of interest δ being

ℓ̇_δ = σ^{−2} Σ_{i=1}^{n} ϕ(x_i, ψ)(y_i − μ_i).   (2)

The variance of ℓ̇ is given by the expected information matrix I, partitioned accordingly into blocks I_δδ, I_δθ, I_θδ, and I_θθ.
Under the true model it is clearly ℓ̇ ∼ N(0, I), therefore score-based inference appears feasible. Suppose first we are interested in testing H_0 : δ = δ_0 with δ_0 ≠ 0, and let (δ_0, θ̂_{δ_0}^T)^T denote the maximum likelihood estimates under H_0. Score inference on δ is based on the studentized statistic

s_{δ_0} = ℓ̇_δ(δ_0, θ̂_{δ_0}) / I_{δ_0|θ̂_{δ_0}}^{1/2},   (3)

with corresponding variance I_{δ_0|θ̂_{δ_0}}, namely the conditional information I_δδ − I_δθ I_θθ^{−1} I_θδ evaluated at δ = δ_0 and θ = θ̂_{δ_0}. Thus if we wish to test for a non-zero δ_0, the score statistic (3) can be used straightforwardly since asymptotically s_{δ_0} ∼ N(0, 1).
Hypothesis testing when δ_0 ≠ 0 does not pose any specific difficulty, but its practical usefulness is moderate; for instance, in segmented regression it would mean testing for a known slope difference. To carry out a score test even for the more typical and important hypothesis H_0 : δ = 0, we plug into Equation (2) the fitted values μ̂_{0i} from the null fit, but the key point is how to deal with the term ϕ(x_i, ψ) involving the nuisance parameter ψ, which is not identified under H_0. Our proposal consists in averaging it over [L, U]. This leads to replacing the term ϕ(x_i, ψ) with the average value

φ̄_i = K^{−1} Σ_{k=1}^{K} ϕ(x_i, ψ_k),   (4)

where ψ_1, . . . , ψ_K are K fixed values taken in [L, U]. We provide a justification for the φ̄_i s in the next Section 2.1. Clearly the φ̄_i s do not depend on ψ, and thus the score statistic can be computed even under H_0 : δ = 0 when ψ gets undefined: under the null, ℓ̇_δ becomes simply

ℓ̇_0 = σ^{−2} Σ_{i=1}^{n} φ̄_i (y_i − μ̂_{0i}).

The relevant variance is the conditional information evaluated at (0, θ̂_0). Notice that entries corresponding to ψ, such as I_ψδ and I_ψψ, are always zero as they depend on δ, which is zero under H_0. We write the residual vector under H_0 as e_0 = (I_n − A)y, A being the hat matrix of the null model, I_n the identity matrix, and y the observed response vector; by setting φ̄ = (φ̄_1, . . . , φ̄_n)^T, the vector of the means (4), it is possible to write ℓ̇_0 = σ^{−2} φ̄^T (I_n − A) y and Var(ℓ̇_0) = σ^{−2} φ̄^T (I_n − A) φ̄. Thus the score test statistic (3) for the null value δ_0 = 0 takes the form

s_0 = φ̄^T (I_n − A) y / {σ (φ̄^T (I_n − A) φ̄)^{1/2}},   (5)

which follows a standard Normal distribution under H_0 : δ = 0.
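A minimal numerical sketch of the statistic in Equation (5) for a straight-line null model follows; the function name `score_stat` and the default (null-fit) variance estimate are our own choices for illustration:

```python
import numpy as np

def score_stat(y, X0, phibar, sigma=None):
    """Score statistic of Eq. (5):
    s0 = phibar' (I - A) y / { sigma * sqrt(phibar' (I - A) phibar) },
    with A the hat matrix of the null design X0."""
    n = len(y)
    Q, _ = np.linalg.qr(X0)          # orthonormal basis of the null design
    M = np.eye(n) - Q @ Q.T          # residual maker I_n - A
    num = phibar @ M @ y
    if sigma is None:
        # simple stand-in: residual sd from the null fit (the paper actually
        # recommends the unrestricted estimate, see the remarks in Section 2)
        resid = M @ y
        sigma = np.sqrt(resid @ resid / (n - X0.shape[1]))
    return num / (sigma * np.sqrt(phibar @ M @ phibar))

# toy usage: straight-line null model, linear segmented term averaged as in Eq. (4)
x = np.linspace(0.0, 1.0, 30)
X0 = np.column_stack([np.ones_like(x), x])
grid = np.linspace(0.1, 0.9, 10)
phibar = np.mean([np.maximum(x - p, 0.0) for p in grid], axis=0)
rng = np.random.default_rng(0)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.05, x.size)   # generated with no breakpoint
s0 = score_stat(y, X0, phibar)
```

Note that, since (I_n − A) annihilates the null design, s_0 is invariant to adding any vector in the column space of X0 to the response.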

Justifying the proposed test statistic
In this subsection we provide a theoretical basis for the proposed test (5) by giving a justification for the average value (4). We recall that such a mean φ̄_i replaces ϕ(x_i, ψ) in the null case, namely when δ = 0 and ψ, and ϕ(x_i, ψ) accordingly, are undefined. To illustrate, let X | B be the conditional random quantity traditionally defined only under the restriction that the event B is true. In the setting of probabilistic reasoning under coherence based on the betting scheme of de Finetti, [18, p.122-123], and later in a more general framework, [19], X | B is defined in a more extended way, namely via X · I(B) + E(X | B) · I(B^c), where I(·) is the indicator function and B^c is the negation of B. Roughly speaking, X | B may be interpreted as the amount received in a bet on X conditional on B, if it is agreed to pay E(X | B) (getting E(X | B) back in the event B^c). The rationale behind this extended definition, including proofs based on probability theory and probability logic, is skipped here; the key point is that it is now possible to define a sort of conditional random quantity even when B is false: it is simply defined via its conditional expectation when B is true, that is, E(X | B).
To apply the aforementioned reasoning to the context of parameters present only under the alternative, let X = ϕ(·, ψ) and B be the event of a non-null regression coefficient, that is, B = {δ ≠ 0}. Thus in the event B^c = {δ = 0}, corresponding to the null hypothesis of interest, the 'traditionally undefined' ϕ(·, ψ) is replaced by E(ϕ(·, ψ) | {δ ≠ 0}), of which Equation (4) is an empirical version.
The proposed test clearly has a Bayesian flavour, as the expectation of ϕ(·, ψ) implies a probability distribution for ψ. However, we avoid calling the method Bayesian because there is no prior-to-posterior pathway and, moreover, no prior distribution for the remaining parameters is required.

Some remarks
In most applications σ is unknown and has to be replaced by a consistent estimate. Under H_0, both the variance estimate under H_0, σ̂_0, and the one under H_1, σ̂, are consistent, guaranteeing the correct test size. However, using σ̂_0 leads to a loss of power, and thus we suggest using σ̂ in Equation (5). Notice the resulting test statistic does not have an exact t distribution, as the numerator ℓ̇_0 and σ̂ are not independent; however, we would expect the statistic s_0, being linear in the y_i s, to behave reasonably close to a t even with relatively small samples; simulations discussed in Section 3 bear this out.
Test statistic (5) also extends to joint hypotheses involving two changepoint terms. To test for the breakpoint, let ϒ be the matrix with ith row (φ̄_1i, φ̄_2i)^T, that is, including the average values φ̄_1i = K^{−1} Σ_{k=1}^{K} I(x_i > ψ_k) and φ̄_2i = K^{−1} Σ_{k=1}^{K} (x_i − ψ_k)_+.

Comparison with the pseudo-score of Conniffe

Conniffe [12] also proposes a score-type statistic for hypothesis testing with a nuisance parameter unidentifiable under the null. He argues that if the regression coefficient δ equals zero, the nuisance parameter ψ might assume any arbitrary value in the admissible range [L, U]; thus he suggests replacing ψ with its unconstrained estimate ψ̂, leading to the statistic

ℓ̇_0^{(c)} = σ^{−2} Σ_{i=1}^{n} ϕ(x_i, ψ̂)(y_i − μ̂_{0i}),

a hybrid form taking null residuals with weights estimated under the alternative. As discussed by the author himself, using the point estimate can guarantee higher power; however, the most serious drawback concerns its sampling distribution. In fact, under the null hypothesis, when ψ actually does not exist, ψ̂ has an unknown distribution, and thus the distribution of ℓ̇_0^{(c)} is also unknown. This odd behaviour does not vanish asymptotically and prevents the hypothesis test of H_0 : δ = 0 from being carried out in practice; Conniffe [12] himself writes '. . . and so simulation is probably necessary to ascertain null distributions'. In fact, for the problem of testing for a breakpoint in segmented linear regression, ψ̂ turns out to have a bimodal distribution under the null, and this causes ℓ̇_0^{(c)} to have a bimodal null density as well. The bimodal and unknown null density makes the Conniffe approach impractical for testing for a breakpoint in segmented regression. Notice that similar behaviours are observed with the discontinuous changepoint ϕ(x_i, ψ) = I(x_i > ψ) and the linear-quadratic term ϕ(x_i, ψ) = (x_i − ψ)²_+.
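The erratic behaviour of ψ̂ under the null can be illustrated with a small self-contained simulation (our own sketch, not the authors' code): the breakpoint is estimated by a grid search on data generated with no breakpoint at all, and the resulting estimates scatter widely over the admissible range.

```python
import numpy as np

rng = np.random.default_rng(1)
n, x = 50, np.linspace(0.0, 1.0, 50)
grid = np.linspace(0.1, 0.9, 17)          # candidate breakpoint values

def psi_hat(y):
    """Unconstrained grid estimate of psi under the alternative:
    minimise the RSS of the segmented fit y ~ 1 + x + (x - psi)_+."""
    best, best_rss = None, np.inf
    for p in grid:
        X = np.column_stack([np.ones(n), x, np.maximum(x - p, 0.0)])
        rss = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
        if rss < best_rss:
            best, best_rss = p, rss
    return best

# data generated under the null: a straight line plus noise, no breakpoint
est = np.array([psi_hat(1.0 + 2.0 * x + rng.normal(0.0, 0.05, n))
                for _ in range(200)])
```

With no true breakpoint, `est` has no meaningful target and spreads across the whole grid, consistent with the unknown (and, as reported above, bimodal) null distribution of ψ̂.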

Simulation study
We assess the finite sample properties of the proposed approach via simulations. We limit our attention to segmented linear regression with a Gaussian response, where interest is in testing for the existence of a breakpoint; errors are independent N(0, σ). δ = 0 is assumed for assessing the size of the tests, and δ = −0.15 for the power study. Different scenarios are considered: three sample sizes n ∈ {20, 50, 100}, two noise levels σ ∈ {0.025, 0.050}, and ψ at the second and third quartiles, q_0.50 and q_0.75, of the two covariate distributions: fixed at equally spaced values, that is, x_i = i/n, and random x_i ∼ Be(1, 2), where Be(1, 2) means a Beta distribution with expected value 1/3. p-values for the Davies test come from formula (12) of [8] based on 10 equispaced values in the observed covariate range, while 1000 permutations of the null residuals have been employed for the permutation test. For the proposed score test, results come from Equation (5) with 10 equispaced values {ψ_k} used to compute Equation (4), σ replaced by its unrestricted estimate, and t_{n−4} as the reference null distribution. We anticipate that different numbers and locations of the evaluation points {ψ_1, . . . , ψ_K} lead to negligible differences in the results.
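One cell of the size study can be sketched as follows. This is our own condensed re-implementation under stated simplifications: a normal 5% cut-off instead of the t_{n−4} reference, and the fit of the null model augmented with φ̄ as a simple stand-in for the unrestricted variance estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n, B, sigma = 50, 1000, 0.05
x = (np.arange(n) + 1.0) / n                      # fixed design x_i = i/n
X0 = np.column_stack([np.ones(n), x])             # null model: straight line
Q, _ = np.linalg.qr(X0)
M = np.eye(n) - Q @ Q.T                           # residual maker of the null fit
grid = np.linspace(0.1, 0.9, 10)                  # K = 10 evaluation points
phibar = np.mean([np.maximum(x - p, 0.0) for p in grid], axis=0)   # Eq. (4)
X1 = np.column_stack([X0, phibar])                # stand-in "alternative" fit
Mphi = M @ phibar
denom = np.sqrt(phibar @ Mphi)

rej = 0
for _ in range(B):
    y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, n)     # H0 true: no breakpoint
    beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r1 = y - X1 @ beta1
    s_hat = np.sqrt(r1 @ r1 / (n - 3))                # unrestricted-type estimate
    s0 = (Mphi @ y) / (s_hat * denom)                 # Eq. (5)
    rej += abs(s0) > 1.96                             # approximate 5% cut-off
size = rej / B
```

The empirical rejection rate `size` should sit close to the 5% nominal level, mirroring the behaviour reported in Table 1.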
For the three tests, Table 1 shows the empirical sizes and Table 2 reports the empirical powers.
Under H_0, we may summarize our results as follows. The Davies test underestimates the probability of a type I error, and, to a minor extent, the permutation test also fails to control the type I error probability for small samples, returning empirical rates larger than the nominal ones. The proposed score test appears to guarantee the correct size even for n = 20. The permutation test fails at small samples because the residuals are only asymptotically exchangeable, [16] while the Davies test is slightly conservative as the computed p-values are actually upper bounds. Results for n = 100 are as expected and are not shown for brevity: the Davies test is still conservative, while the permutation and the proposed score tests exhibit the correct sizes.
Under H_1, as expected, the three tests have greater power when the error variance is low, the sample size increases, and the breakpoint is in the middle of the covariate range. Covariate patterns also make a difference: on the one hand, the three tests have higher power under the fixed design x_i = i/n when ψ = q_0.50; on the other hand, they share higher power when x_i ∼ Be and ψ = q_0.75, especially at large samples. Such a difference reflects the behaviour of the sampling distribution of δ̂, and to a minor extent that of ψ̂, which when ψ = q_0.75 is estimated more efficiently when x_i ∼ Be than when x_i = i/n. Overall, the proposed score test performs somewhat better than its competitors, with quite modest differences when σ = 0.025 and relatively marked ones when σ = 0.05. When comparing powers at n = 20, notice that the comparison is somewhat unfair to the others, as the permutation test overestimates the type I error probability; see Table 1. As anticipated, results are substantially unchanged when modifying the number K and the location of the evaluation points {ψ_k}_{k=1,...,K}.

Gene expression profiles in relation to DNA copy number aberrations
Nemes et al. [2] analyse 1161 chromosome fragments in primary tumours from 97 breast cancer patients to study gene expression levels, measured via messenger RNA (mRNA), in relation to DNA copy number aberrations (CNA). The authors use segmented regression to assess whether the variation of the gene expression pattern changes over the range of CNA; this implies testing for a slope change, that is, H_0 : δ = 0 with ϕ(x_i, ψ) = (x_i − ψ)_+. We apply the proposed score-based procedure along with the permutation and Davies tests to assess significant piecewise linear CNA-mRNA relationships with respect to simple linear fits. We use the same settings employed in the simulation studies: 1000 replicates for the permutation test and 10 equally spaced values in the observed covariate range to implement the Davies and the score tests. Out of 1161 relationships, we obtained 'significant' p-values (i.e. less than 0.05) in 93, 97, and 109 chromosome fragments using, respectively, the permutation, the Davies, and the proposed score test. All three tests returned significant p-values 53 times, but otherwise discordant results emerged: in 27 fragments a significant p-value was obtained by the proposed score test only, in 9 cases only the Davies test rejected the null hypothesis, while 11 times the permutation test was the only one to produce p-values < 0.05. Figure 2 reports three CNA-mRNA relationships with discordant findings, namely only one of the three test statistics providing a significant p-value. The plots also portray the fitted segmented lines obtained via the iterative algorithm discussed in [20].
In such contexts with differing results there is no way to know which test is going wrong, but it is of some interest to emphasize that in the nine chromosome fragments where only the Davies test provides significant p-values, the estimated breakpoint always lies on the boundary of the CNA range, leaving only very few observations in some intervals (always fewer than three; see for instance panel (a)). A very extreme breakpoint location, and the consequent covariate intervals with just a few points, make the piecewise linear relationship somewhat questionable. However, regardless of issues relevant to specific results, overall the proposed score test detects a piecewise linear relationship more often than its competitors. Correctly identifying patterns of relationships between DNA and gene expression can generate further hypotheses and important insights into the biology of tumours, for instance by identifying genes that promote tumour development and suggesting genes that may be appropriate as potential biomarkers.
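The screening step over many fragments can be sketched as follows. Data, effect sizes, and the normal reference distribution here are illustrative stand-ins of ours, not the study's actual values; the null-fit variance estimate is also used for brevity.

```python
import numpy as np
from math import erfc, sqrt

def score_p(x, y, K=10):
    """Two-sided p-value of the proposed score test for a breakpoint in y ~ x
    (normal reference and null-fit variance: simplifications of ours)."""
    n = len(y)
    X0 = np.column_stack([np.ones(n), x])
    Q, _ = np.linalg.qr(X0)
    M = np.eye(n) - Q @ Q.T
    grid = np.linspace(np.quantile(x, 0.1), np.quantile(x, 0.9), K)
    phibar = np.mean([np.maximum(x - p, 0.0) for p in grid], axis=0)   # Eq. (4)
    r0 = M @ y
    s0 = (phibar @ r0) / (np.sqrt(r0 @ r0 / (n - 2)) * np.sqrt(phibar @ M @ phibar))
    return erfc(abs(s0) / sqrt(2.0))      # 2 * (1 - Phi(|s0|))

rng = np.random.default_rng(3)
n_pat, n_frag = 97, 100                   # 97 patients, synthetic fragments
pvals = []
for j in range(n_frag):
    cna = rng.normal(0.0, 1.0, n_pat)
    delta = -0.5 if j < 30 else 0.0       # 30 fragments carry a true slope change at 0
    mrna = 0.2 * cna + delta * np.maximum(cna, 0.0) + rng.normal(0.0, 0.3, n_pat)
    pvals.append(score_p(cna, mrna))
n_sig = sum(p < 0.05 for p in pvals)      # count of nominally significant fragments
```

The count `n_sig` plays the role of the per-test tallies reported above (93, 97, and 109 out of 1161 in the real analysis).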

Conclusions
We have taken a different view of the problem of testing with a nuisance parameter present only under the alternative in regression models. The proposed test exploits the idea of the score statistic along with a relatively new view of the conditional random quantity in the setting of probabilistic coherence: as a result, the proposed statistic is free of any nuisance parameter and permits commonplace computation under the null hypothesis when the nuisance is undefined. Li [21] discusses the sensitivity of test statistics in the presence of a nuisance, namely how size and power are affected by estimation of the nuisance parameter. In this respect the proposed test could be considered 'first-rate' as it does not depend at all on estimation of the nuisance. The associate editor conjectured similarities between the proposed test and the theoretical work of Andrews and Ploberger. [22] This appears indeed to be the case. In the same context of a nuisance vanishing under the null hypothesis, Andrews and Ploberger derive optimal tests using a weighted average power criterion function. The final tests depend on the chosen weight function over values of the nuisance parameter and have a Bayesian justification, as such a 'weight function' can be viewed as a prior for the nuisance parameter. The derived statistics are of an 'average exponential' form and, in addition to the weight function (or prior) for the nuisance, they also depend on a constant regulating power against alternatives. [22, Equation (1.1)] However, while such theoretical work has been recognized by many authors in econometrics and economics, it seems that no 'practical' proposal or simplification has been discussed. For instance, Hansen [9] writes 'Andrews and Ploberger [22] explore optimal testing but do not discuss methods to obtain critical values in practice'.
Some simulations carried out under different scenarios for the problem of testing for a breakpoint have shown that the proposed test attains comparatively higher power when the signal-to-noise ratio is low, and therefore it is more likely than its competitors to detect statistical significance in practice. Application to generalized linear models with non-Gaussian responses is also elementary, as the relevant score expressions are easy to manage; the Supplementary Material includes some simulations for some popular GLMs.
Although we have focussed on segmented linear regression, extension to testing for a changepoint in discontinuous [23] or continuous linear-quadratic [24] relationships appears clear and simple; extension to the other examples already mentioned in the Introduction appears straightforward too, at least when the resulting average nonlinear function φ̄ is monotonic.
Finally, the proposed score approach can also be employed under some kinds of model misspecification by using the so-called robust information; this represents a noteworthy point to be investigated.