Detection of non-Gaussianity

We develop two tests sensitive to a wide range of departures from the composite goodness-of-fit hypothesis of normality. The tests are based on sums of squares of components that arise naturally in a decomposition of a Shapiro–Wilk-type statistic. Each component has diagnostic value on its own. The number of squared components entering each sum is determined via novel data-driven selection rules. The new solutions prove to be effective tools for detecting a broad spectrum of sources of non-Gaussianity. We also discuss two variants of the new tests adjusted to the verification of the simple goodness-of-fit hypothesis of normality. These variants also compare well with popular competitors.


Introduction
Detection of non-Gaussianity, though a classical problem, is still a subject of intensive research. This is motivated by wide applications and the continuing need to detect non-standard and complex deviations from Gaussianity. For discussion and some evidence, see, for example, Graham et al., [1] Jin et al., [2] Güner et al., [3] Romão et al., [4] Tarongi and Camps [5] and references therein.
One of the best existing procedures for verifying Gaussianity is the test of Shapiro and Wilk. [6] Since its application to large data sets was not straightforward, and the asymptotic theory related to the statistic is very difficult, many simplified variants of this test have been elaborated. For an extensive discussion see Brown and Hettmansperger [7] and LaRiccia. [8] We would like to supplement the list of solutions presented in these references by two more recent and important ones, by Chen and Shapiro [9] and del Barrio et al. [10] In the extensive study of Romão et al., [4] the three statistics just mentioned were evaluated as the best omnibus tests of normality.
Several variants and modifications of the Shapiro–Wilk test can, in principle, be classified into three groups corresponding to: optimization of plotting positions, [7,9,11] measuring a discrepancy between the empirical and the estimated normal quantile function, [10,12-14] and procedures attempting to summarize more carefully the appearance of nonlinearity in normal probability plots. [8,15-17] Obviously, these classes are not disjoint and other groupings are also possible. The goal of the present paper is to modify, combine, and refine two selected solutions from the first and third group, respectively. Some details are as follows. We consider the classical set-up. The observed variables X_1, ..., X_n are independent and identically distributed with finite second moment, common distribution function F, and quantile function Q = F^{-1}. In terms of the quantile function, the null hypothesis that the X_i's have a Gaussian distribution can be written as

H : Q(t) = μ + σΦ^{-1}(t), t ∈ (0, 1),

where Φ^{-1} is the quantile function of the standard normal distribution function Φ, while μ ∈ (−∞, +∞) and σ ∈ (0, +∞) are unknown.
Let X_{1:n} ≤ · · · ≤ X_{n:n} denote the order statistics of the sample X_1, ..., X_n and let Q_n be the sample quantile function, i.e. Q_n(t) = X_{i:n} for i − 1 < tn ≤ i, i = 1, ..., n. Additionally, set Q_n(0) = lim_{t↓0} Q_n(t) = X_{1:n}. The non-normality of F can be measured through the departure of Q_n from the linearity specified by H. As a measure of the discrepancy one can consider the Wasserstein distance between Q_n and the best-fitted function μ̂_n + σ̂_n Φ^{-1}, i.e. the quantity

W_n^2 = ∫_0^1 (Q_n(t) − μ̂_n − σ̂_n Φ^{-1}(t))^2 dt,

where μ̂_n and σ̂_n solve the minimization problem

min_{μ, σ} ∫_0^1 (Q_n(t) − μ − σΦ^{-1}(t))^2 dt.

For the relation of the minimization procedure to the problem of selecting the plotting positions in the construction of a normality plot see Brown and Hettmansperger. [7] On the other hand, by LaRiccia, [8] the residual function R_n(t) = Q_n(t) − μ̂_n − σ̂_n Φ^{-1}(t) can be expanded in the system {h_j}_{j≥0} of transformed orthonormal Hermite polynomials, h_j(t) = H_j(Φ^{-1}(t))/√(j!), with the equality understood in the L^2-sense. Therefore, the quadratic measure of fit of the residuals can be decomposed into a sum of squares of components which can be interpreted as empirical Fourier coefficients of the unknown Q in the system {h_j}_{j≥0}. Our approach to constructing a test of H is via an analysis of the components of W_n^2. In this respect, we follow the ideas contained in [7] as well as LaRiccia, [8] where tests based on the first two components of W_n^2 were considered. Their approach belongs to a stream of popular solutions based on empirical skewness and kurtosis, such as the D'Agostino-Pearson, Bowman-Shenton or Jarque-Bera tests, for example. Such solutions are effective in some situations but fail to detect many non-normal distributions. For some evidence, see Romão et al. [4] Therefore, in our approach the number of components of W_n^2 that are taken into account is not restricted to the first one or two, but is specified by a data-driven Akaike-type selection rule. We define and investigate an omnibus test of this type.
In this way, we supplement and refine the existing ideas by providing a final touch which makes the resulting solution much more flexible and sensitive to a larger spectrum of deviations from H. Moreover, motivated by the scope of the papers by Jin et al. [2] and Güner et al., [3] among others, we also propose a variant of the solution which is adjusted to the detection of symmetrical alternatives. Extensive simulations show that the empirical behaviour of the new solutions is appealing.
The rest of this article is organized as follows. Section 2 introduces notation, some heuristics, and the basic elements of our constructions. In Section 3, we state basic asymptotic properties of the new tests of H, saying that the test statistics have non-degenerate null distributions and that the tests are consistent against a large set of alternatives. We also present a useful result on the behaviour of the selection rule in large samples. Moreover, some precautions on the accuracy of finite sample results are provided. Section 4 introduces the first- and second-moment corrections which improve the finite sample behaviour of the components of W_n^2 under H and, in consequence, make them useful auxiliary diagnostic tools. Section 4 also contains a selection of simulated powers under small and moderate sample sizes. We include here six cases of symmetrical alternatives only; other outcomes are postponed to the supplementary material. The results are encouraging. Therefore, we also considered a less traditional setting and some specialized alternatives. Namely, following Jin et al., [2] we studied the large sample case and symmetrical alternatives being sparse mixtures and linear combinations of normally and power-tail distributed observations. In Section 4, we include two such cases for illustration. Section 5 contains some real data examples. In Section 6, we additionally discuss variants of our solutions adjusted to testing the simple null hypothesis H* : Q = Φ^{-1} and compare them to the higher criticism statistic popularized by Donoho and Jin. [18] The conclusions are that in both cases (testing H and H*), the new solutions work well, outperform simple constructions based on the sums of squares of the first few components, and compare well with the best existing solutions.

Preliminaries
Let ϕ denote the density of the standard normal distribution function Φ. Recall that the jth Hermite polynomial H_j is defined by

H_j(x) = (−1)^j [ϕ(x)]^{-1} d^j ϕ(x)/dx^j, j = 0, 1, 2, ....

Therefore, the related system of functions on (0, 1), given by

h_j(t) = H_j(Φ^{-1}(t))/√(j!), j = 0, 1, 2, ...,

is orthonormal and complete in L^2((0, 1), dt). For illustration, we plot in the supplementary material the functions h_1, ..., h_6. The formula (3) in [7] relates this system to the inverted Edgeworth expansion of F. Based on pp. 1670-1671 of the last mentioned paper and Section 5 of LaRiccia, [8] we list below useful results on the statistics μ̂_n, σ̂_n, and R_n(t), defined in Section 1.
These statistics are L-statistics, i.e. linear combinations of the order statistics, with coefficients expressed through ϕ_l = ϕ(Φ^{-1}(l/n)), l = 1, ..., n − 1; for l = 0 and l = n we set ϕ_0 = ϕ_n = 0, i.e. we define the quantities as the appropriate limits. For j = 0, 1, 2, ..., set δ̂_{j,n} = ∫_0^1 R_n(t) h_j(t) dt. Then, by the properties of the h_j's, δ̂_{0,n} = δ̂_{1,n} = 0, while for j ≥ 2, δ̂_{j,n} = ∫_0^1 Q_n(t) h_j(t) dt and δ̂_{j,n} = Σ_{i=1}^n w_{ji} X_{i:n} for suitable weights w_{ji}. The above shows that the estimates μ̂_n, σ̂_n and the Fourier coefficients δ̂_{j,n} of R_n, j ≥ 2, are both L-statistics and the successive Fourier coefficients of Q_n in the system {h_j}_{j≥0}. An important observation is that, for each j ≥ 1, it holds that Σ_{i=1}^n w_{ji} = 0, provided that, similarly as above, we set ϕ_0 h_l(0) = ϕ_n h_l(1) = 0, l ≥ 0. This observation implies, in particular, that δ̂_{j,n}, j ≥ 2, is invariant under shifts. The formula for σ̂_n is obviously a special case of Equation (6).
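The transformed Hermite system above is straightforward to handle numerically. The following Python sketch (an illustration, not the authors' code) builds the probabilists' Hermite polynomials by the standard three-term recurrence and checks numerically that the functions h_j are orthonormal, using the substitution t = Φ(z) which turns the integral over (0, 1) into an expectation with respect to ϕ.

```python
import math
import numpy as np

def hermite_prob(j, z):
    """Probabilists' Hermite polynomial He_j(z) via the recurrence
    He_0 = 1, He_1 = z, He_{j+1}(z) = z He_j(z) - j He_{j-1}(z)."""
    h_prev, h_cur = np.ones_like(z), np.asarray(z, dtype=float)
    if j == 0:
        return h_prev
    for k in range(1, j):
        h_prev, h_cur = h_cur, z * h_cur - k * h_prev
    return h_cur

# Orthonormality of h_j(t) = He_j(Phi^{-1}(t)) / sqrt(j!): substituting
# t = Phi(z) turns int_0^1 h_j(t) h_k(t) dt into E[He_j(Z) He_k(Z)] / sqrt(j! k!)
# for Z ~ N(0,1), approximated here by a Riemann sum on a z-grid.
z = np.linspace(-8.0, 8.0, 40001)
dz = z[1] - z[0]
phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def inner(j, k):
    hj = hermite_prob(j, z) / math.sqrt(math.factorial(j))
    hk = hermite_prob(k, z) / math.sqrt(math.factorial(k))
    return float(np.sum(hj * hk * phi) * dz)

# Prints (approximately) the 4x4 identity matrix.
print([[round(inner(j, k), 3) for k in range(1, 5)] for j in range(1, 5)])
```

The same recurrence is reused in the sketches below whenever values of h_j on a grid are needed.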

Some heuristics and basic components
If ∫_R x^2 dF(x) exists and is finite then the corresponding Q(t) belongs to L^2((0, 1), dt). Since the null hypothesis H asserts that Q(t) − μ − σΦ^{-1}(t) ≡ 0, having in mind the properties of {h_j}_{j≥0}, for some fixed k, consider an alternative of the form

Q(t) = μ + σΦ^{-1}(t) + Σ_{j=2}^{k+1} δ_j h_j(t).

In such circumstances, we can replace H by the relation Σ_{j=2}^{k+1} δ_j h_j(t) ≡ 0 or, equivalently, by the auxiliary hypothesis H(k) : δ_2 = · · · = δ_{k+1} = 0.
To propose a statistic for testing H(k), introduce the components

C_j = √(n(j + 1)) δ̂_{j,n}/S_n, j ≥ 2,

where δ̂_{j,n} is given by (5) and (6) and S_n is the sample standard deviation. Note that C_j is location and scale invariant. Moreover, under H, C_j is asymptotically N(0, 1), while C_i and C_j, i ≠ j, i, j ≥ 2, are asymptotically uncorrelated; see the supplementary material for a formal proof of these facts. By the above, it seems natural to reject H(k) when

N_k = Σ_{j=2}^{k+1} C_j^2

is too large. This heuristic can be further supported by a formal argument; see LaRiccia and Mason [19] for the general theory and LaRiccia [8] for its exemplification in the special case of testing normality under k = 2 and local alternatives of the form Σ_{j=2}^3 [d_j/√n] h_j(t), for some d_2 and d_3. The optimal weight w_0, introduced in [19], is then (√3 h_2, √4 h_3) and leads to the statistic N_k defined above with k = 2.
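Under the reconstruction above, the components and N_k can be computed directly. The following Python sketch (illustrative, not the authors' implementation) approximates the integral defining δ̂_{j,n} by a midpoint discretization; with this discretization, exact location invariance of the paper's weights holds only approximately, while scale invariance is exact.

```python
import math
import numpy as np
from statistics import NormalDist

def hermite_prob(j, z):
    """Probabilists' Hermite polynomial He_j via its three-term recurrence."""
    h_prev, h_cur = np.ones_like(z), np.asarray(z, dtype=float)
    if j == 0:
        return h_prev
    for k in range(1, j):
        h_prev, h_cur = h_cur, z * h_cur - k * h_prev
    return h_cur

def components(x, jmax=7):
    """Components C_j = sqrt(n(j+1)) * delta_{j,n} / S_n, with delta_{j,n} =
    int_0^1 Q_n(t) h_j(t) dt approximated on the midpoints t_i = (i-0.5)/n.
    A sketch: the paper's exact weights follow its Equations (5)-(6)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    t = (np.arange(1, n + 1) - 0.5) / n
    z = np.array([NormalDist().inv_cdf(ti) for ti in t])
    s = math.sqrt(float(np.mean((x - x.mean()) ** 2)))  # S_n
    C = {}
    for j in range(2, jmax + 1):
        hj = hermite_prob(j, z) / math.sqrt(math.factorial(j))
        delta = float(np.mean(x * hj))  # midpoint approximation of the integral
        C[j] = math.sqrt(n * (j + 1)) * delta / s
    return C

def N_k(C, k):
    """Sum of squares N_k = C_2^2 + ... + C_{k+1}^2."""
    return sum(C[j] ** 2 for j in range(2, k + 2))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
C = components(x)
print({j: round(v, 2) for j, v in C.items()}, round(N_k(C, 3), 2))
```

Under H the printed components should be of roughly unit magnitude, in line with their approximate N(0, 1) null behaviour.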
However, in practice, k is not known, while the decision on k is critical to the power behaviour of such a test. Therefore, in the next section, we propose and discuss a rule to decide about k.

Selection of k in N k and basic forms of the new test statistics
The statistic N_k resembles Neyman's smooth statistic for uniformity, based on k empirical Fourier coefficients of F in some orthonormal system on (0, 1). Ledwina [20] proposed to select k in Neyman's statistic via the Schwarz selection rule, while later on a simplified variant of the rule was introduced and investigated; see Inglot and Ledwina. [21] Simulations show that the Schwarz rule and its simplified variant are very useful for selecting the number of components when only a few of the first Fourier coefficients are relatively large and the number of observations is small or moderate. Otherwise, the Schwarz penalty is too heavy. Since, in the present paper, we aim at proposing solutions which also work in situations when a signal is not very strong and can have a complex nature, while the number of observations can be considerably large, we propose a completely new selection rule. It is defined via a refined variant of Akaike's penalty and the statistic R_n presented and investigated in [10].
To define R_n, set S_n^2 = n^{-1} Σ_{i=1}^n (X_i − μ̂_n)^2, cf. Equation (3). Then, by the definition of W_n, Equation (4) and some simple algebra,

R_n = n W_n^2 / S_n^2.

It should be mentioned at this point that in the above paper W_n^2/S_n^2 itself was denoted by R_n. Note that R_n is location and scale invariant. As mentioned earlier, R_n is a very good statistic for detecting non-normality in several situations. However, since the standardized component √n δ̂_{j,n}/S_n of W_n^2 has the asymptotic variance 1/(1 + j), R_n has some weights implicitly built in. The weights are relatively light and therefore, under H, R_n tends slowly to +∞.
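A direct grid-based computation of the statistic above can be sketched as follows (an illustration under our reading of the normalization R_n = nW_n^2/S_n^2; the paper's exact form is its Equations (4) and (7)). Since ∫_0^1 (Φ^{-1}(t))^2 dt = 1, the minimizing σ̂_n is, up to discretization error, ∫_0^1 Q_n(t)Φ^{-1}(t) dt.

```python
import math
import numpy as np
from statistics import NormalDist

def R_n(x, m=20000):
    """Wasserstein-type statistic: W_n^2 is the squared L2(0,1) distance
    between the sample quantile function Q_n and the best linear fit
    mu_n + sigma_n * Phi^{-1}, computed on a fine grid of m midpoints;
    the statistic returned is n W_n^2 / S_n^2."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    t = (np.arange(m) + 0.5) / m
    z = np.array([NormalDist().inv_cdf(ti) for ti in t])  # Phi^{-1}(t)
    Qn = x[np.minimum((t * n).astype(int), n - 1)]        # step quantile function
    mu = float(Qn.mean())                                 # approx. int Q_n dt
    sigma = float(np.mean(Qn * z))                        # approx. int Q_n Phi^{-1} dt
    W2 = float(np.mean((Qn - mu - sigma * z) ** 2))
    S2 = float(np.mean((x - x.mean()) ** 2))
    return n * W2 / S2

rng = np.random.default_rng(1)
print(round(R_n(rng.normal(size=100)), 3))       # moderate under H
print(round(R_n(rng.exponential(size=100)), 3))  # larger under a skewed alternative
```

The exponential sample yields a noticeably larger value, illustrating that R_n grows with the distance of Q_n from linearity in Φ^{-1}.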
The selection rules related to R_n and the corresponding test statistics that we propose below, and adjust to practical use in Section 4, are designed to satisfy the following aims: (i) reject the null hypothesis in most cases when R_n does and retain the good overall power of R_n; (ii) take more careful control of the components δ̂_{j,n} when R_n accepts H and possibly improve upon R_n in some situations; (iii) in the case when the alternatives are symmetrical, exploit only a parsimonious set of components of W_n^2.
To describe our solutions step by step, let us introduce a sequence {d(n)} of nondecreasing natural numbers, interpreted as the maximal number of components in N_k which we shall consider given the sample size n. In other words, we can imagine that we model the function Q − μ − σΦ^{-1} considering a nested family of models of the successive dimensions 1, 2, ..., d(n). Next, we define an auxiliary selection rule A_c which maximizes the penalized statistic N_k − ck over k = 1, ..., d(n). The choice c = 2 leads to Akaike's penalty. Such a penalty is relatively light, and a use of N_{A_2} would result in relatively large critical values. This is not very profitable for the power of the corresponding test. Therefore, we propose a more complex rule. Recall that small values of R_n indicate H, and denote by r_α the critical value of such a test at the significance level α. The new selection rule A depends, in particular, on R_n and the level α: the penalty equals c(n, α) on the event {R_n ≤ r_α} and 2 otherwise, where c(n, α) is the smallest value of c such that, under H, it holds that P(A_c = 1) ≥ 1 − α. Note that, in practice, c(n, α) is determined by simulations, based on artificial samples of size n from the N(0, 1) distribution, as N_k and R_n are location and scale invariant. So, in particular, the choice of c(n, α) is independent of the data at hand. Given n, r_α is also found by a Monte Carlo experiment. Our simulations show that the penalty c(n, α) and the analogous quantities defined below are much greater than 2. Therefore, the rule A is restrictive when the 'switch' {R_n ≤ r_α} indicates H and is liberal otherwise. This results in relatively small critical values of the corresponding tests and simultaneously allows for flexibility when the 'switch' indicates an alternative.
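The switching mechanism of the rule can be sketched in a few lines. This is a hedged reading of the definition above (the paper's exact form is its Equation (8)): the penalized statistic N_k − ck is maximized, with the heavy simulated penalty c(n, α) used when the switch R_n ≤ r_α points at H, and Akaike's c = 2 otherwise. All numerical values below are invented for illustration only.

```python
def select_k(N, R, r_alpha, c_large, d):
    """Schematic data-driven selection rule A: choose the heavy penalty
    c(n, alpha) when R_n <= r_alpha (the switch indicates H) and the light
    Akaike penalty c = 2 otherwise, then maximize N_k - c*k over k = 1..d.
    N is a dict mapping k -> N_k."""
    c = c_large if R <= r_alpha else 2.0
    return max(range(1, d + 1), key=lambda k: N[k] - c * k)

# Toy illustration: a large N_3 is worth its penalty only when the switch
# indicates an alternative (liberal penalty c = 2).
N = {1: 0.5, 2: 1.0, 3: 9.0, 4: 9.3}
print(select_k(N, R=3.0, r_alpha=2.0, c_large=8.0, d=4))  # liberal penalty: picks 3
print(select_k(N, R=1.0, r_alpha=2.0, c_large=8.0, d=4))  # heavy penalty: picks 1
```

This mirrors the behaviour described above: a restrictive choice when the switch indicates H, flexibility when it indicates an alternative.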
As mentioned in Section 1, in some applications it is natural to restrict attention to symmetrical distributions only. For such distributions Q − μ is antisymmetric about t = 1/2 and hence the even-indexed Fourier coefficients vanish. Therefore, it is then enough to concentrate on the odd functions h_3, h_5, ..., the related list of models, and the related quantities A^o_c, c^o(n, α), and A^o, defined analogously. Our first solution rejects H for large values of N_A, whereas the variant specialized to symmetrical cases rejects H for large values of N^o_{A^o}.

Theoretical properties of the new tests of H and preliminary comments
In this section, we assume that both d(n) and d^o(n) grow to infinity with n. Proposition 1 says that the asymptotic null distributions of the new test statistics are non-degenerate. Proposition 2 shows the flexibility of the new selection rules under the alternatives and demonstrates that the new solutions are able to detect a large spectrum of alternatives when the sample size grows to infinity. More precisely, recall that, by the argument of Section 2.2, a distribution function F with finite second moment is an alternative to the null N(μ, σ) whenever some Fourier coefficient δ_j, j ≥ 2, of Q in the system {h_j}_{j≥0} is non-zero.

and odd), and, for A and A^o defined above, it holds that
The asymptotic behaviour of N_A and N^o_{A^o} under H is a consequence of the asymptotic normality of the components C_j, j = 1, ..., D, their asymptotic independence, and Theorem 3 in [10]. Note, however, that the convergence in these asymptotic results is extremely slow. For some evidence, see Brown and Hettmansperger, [7] Krauczi, [22] and Tables S1 and S3 of the supplementary material. Therefore, in the next section, we introduce some finite sample corrections. Similar corrections are introduced in Section 6 in the case of testing the simple null hypothesis H*. The corrections substantially improve the finite sample behaviour of the components C_j; cf. Tables S2 and S5 of the supplementary material. Moreover, since the selection rules A and A^o have built-in penalties adjusted to the given α and n, reliance on the slow asymptotics is further reduced in this way as well. In Section 4.1 below, and in Section 5 of the supplementary material, we illustrate how the qualitative asymptotic results on A and A^o, presented in Proposition 2, manifest quantitatively in some finite sample situations. Table 2 of Brown and Hettmansperger [7] provides some evidence of how slowly, under H, the estimated mean and standard deviation of C_3 approach the limiting values when the sample size ranges from 5 to 100. The drawback does not vanish even for very large sample sizes; see Table S1 in the supplementary material for this article. Therefore, some finite sample corrections are welcome. For this purpose, given the sample size n, we generate nr independent realizations of N(0, 1) samples and calculate the resulting values C_{i,j,0}, i = 1, ..., nr, say, of C_j, j ≥ 2. The notation nr shall be used throughout for the number of Monte Carlo runs. However, note that the number nr is not the same in all places.

Finite sample corrections, test statistics and simulation results
We set e_{j,0} = (nr)^{-1} Σ_{i=1}^{nr} C_{i,j,0}, the Monte Carlo estimate of the null mean of C_j, and let v_{j,0} denote the analogous estimate of its standard deviation. Since, under H, the distribution of X_{1:n}, ..., X_{n:n} is symmetrical, EC_{2l} = 0 and one could skip the calculation of the related e_{2l,0}'s. However, in order not to complicate the notation and presentation, we estimate all the quantities. The corrected components (C_j − e_{j,0})/v_{j,0} are approximately N(0, 1) under H. Therefore, the significance of a single component can be easily noticed and verified.
Using the corrected variants of the C_j's, we define the corrected statistics N_k and N^o_k, respectively. Finally, define the selection rule A similarly as in (8). Its counterpart A^o is constructed analogously.
The resulting data-driven statistics N A and N o A o shall be investigated in our simulation study presented below and in the supplementary material.
In this section, we consider moderately large sample sizes ranging from n = 50 to n = 150 and large n equal to 244^2. We compare four tests, based on R_n, N_A, N^o_{A^o}, and QH. The statistic QH is the already mentioned solution by Chen and Shapiro. [9] Critical values of N^o_{A^o} are much greater than those of N_A; we discuss this question in the supplementary material. Note also that for large sample sizes, and in the case of testing the simple null hypothesis H* discussed in Section 6, the problem disappears.

Empirical powers under small and moderate sample sizes
We considered several alternatives, both symmetrical and asymmetrical. Below, in Figures 1 and 2, we present six cases of symmetrical alternatives, denoted A_1, ..., A_6. The first three cases, displayed in Figure 1, concern distributions considered by Krauczi. [22] She, in turn, followed Shapiro et al. [25] and Gan and Koehler. [26] In the course of our simulations, for each alternative we estimated the medians of the components C_j. Note that, given n and j, each median is simply calculated from the values of C_j obtained in the successive 10,000 Monte Carlo runs. We present medians, not averages, as we investigated and included in our study some long-tailed alternatives. A further immediate observation is that for many different alternatives the components C_j behave similarly. Therefore, in particular, the components are a basis for selecting genuinely different cases. In Figure 2, we present three relatively complex (in terms of the components) and interesting symmetrical cases. Results for three other symmetrical cases from Krauczi [22] as well as for six asymmetrical alternatives are shown in Figures S2-S4 in the supplementary material. All these figures have the same structure. We summarize the data by presenting the estimated medians of the C_j's and below we display empirical powers for n = 50, ..., 150. In this study, α = 0.05.
The results show that, in general, the empirical behaviour of QH, R_n, and N_A is similar. In the cases A_3, A_5, and A_6, N_A is slightly better than R_n. N^o_{A^o} provides a considerable gain in power in the case of symmetrical alternatives that are peaked (A_3) or multi-modal (A_5, A_6). The alternative A_5 represents normal heterogeneous mixtures, while A_6 describes a sinusoidal deviation from H. The importance of detecting mixtures is extensively discussed in [27], for example, while several sinusoidal signal models are considered in the literature on microwave radiometry; see Tarongi and Camps [5] for an overview. Moreover, in some applications the symmetry is naturally built into the underlying model. For some illustration, see De Roo et al. [28] and the models discussed in [3]. Table 3 collects some evidence in the case of the alternatives A_1, A_5, and A_6. Observe that K_2 is close to K_1, as the penalty in A^o is small when {R_n > r_α}. The quantity K_3 exhibits the gain in empirical power due to a more careful inspection of the components in the cases when the test based on the standardized Wasserstein distance R_n accepts H. Table 3 shows that if only a few of the first odd components are relatively large, then the selection rule provides only a little extra impact on the power of N^o_{A^o}, as in such cases almost all the information was already caught by R_n. However, if components of higher orders play the main role, the rule A^o recovers to some extent the information which was partially lost by R_n due to the implicit weighting built into it. Obviously, the magnitude of the components essentially influences the counts K_3. Therefore, if the signal is weaker, then the ability of A^o to repair some weaknesses of R_n shall be smaller. The above illustrates the way in which we fulfilled the postulates stated in Section 2.3.
We also studied the empirical powers of tests based on the first two components. Since the results were as expected, we do not present them here.
We close by discussing a useful relation between the tests based on R_n and QH. Namely, note that the tests are very similar in structure. Indeed, by Equations (4) and (7), it holds that 0 ≤ σ̂_n ≤ S_n and R_n rejects the null hypothesis when σ̂_n/S_n is too small. The test based on QH also rejects H for small values of a ratio of this kind, with coefficients of the successive order statistics expressed through the quantities ψ_1, ..., ψ_{n−1} and a_n = ψ_{n−1}. Therefore, possible differences in the behaviour of R_n and QH are consequences of the underlying coefficients of the successive order statistics. A question of this kind, in the case of a comparison of QH with some celebrated competitors, was already discussed in [9] and further studied in [29,30]. Following these important contributions, we consider the normalized coefficients and call them the weights. Bai and Chen [29, p.487] concluded that: comparing the weights ... one finds that tests with large weights for extreme order statistics are more powerful to detect short tailed alternatives and tests with small weights for the extreme order statistics are more powerful to detect heavy-tailed alternatives. Therefore, to gain some insight into the present situation, we calculated the ratios r_i(a, b) = a_i/b_i of the weights. For some comparison with earlier results discussed in [30], in Table 4 we present the ratios for n = 10, 20, 30 and i = n/2 + 1, ..., n. Moreover, for n = 50, 150, and n = 244^2, i.e. for the selection of sample sizes considered above and in Section 4.2, we display 14 ratios, for i = n/2 + 1, ..., n/2 + 7 and for i = n − 6, ..., n only. In all the considered cases we observed that, given n, r_i(a, b) grows slowly with i for i large enough, with r_{n/2+1}(a, b) < 1. On the other hand, the distributions in A_1, A_5, and A_6 have kurtosis less than 3. For these alternatives, QH outperforms R_n. In the case of the long-tailed alternatives A_2 and A_4 the situation is reversed. For A_3 the kurtosis is close to three, while the empirical powers of QH and R_n are almost identical.
Therefore, the above, along with the evidence in Table 4, further supports the usefulness of the quoted guideline of Bai and Chen. [29]

Empirical powers when detecting symmetrical mixtures and sums, large n
In this section, we impose a set-up similar to that described in [2]. In particular, now n = 244^2 and the alternatives are the sparse heterogeneous mixture M(·, ·), defined above, and a linear combination of an N(0, 1) variable and a Pareto one, denoted here by P(·, ·). More precisely, P(λ, a), λ > 0, a > 0, stands for the distribution of Z + λW, where Z is N(0, 1), W has the symmetrical Pareto distribution with tail index a, and Z and W are independent. The density of W has the form p(w) = (a/2)|w|^{−1−a} × 1{|w| > 1}, where 1(·) is the indicator function; cf. Grabchak and Samorodnitsky. [31] In this section, we consider two significance levels: .05 and .01. As before, d(n) = 20 and d^o(n) = 10. In contrast to Jin et al., [2] here the null hypothesis is a composite one. Figure 3 shows that the test based on the statistic N^o_{A^o}, specialized to detect symmetrical alternatives to H, has stable empirical power that competes with that of the omnibus tests considered in our study. However, in contrast to some cases shown in Section 4.1, the gain in power of this new test is rather moderate. The reason is that, in the cases studied in this section, one of the two 'early' Fourier coefficients on the underlying list, i.e. δ̂_{3,n} or δ̂_{5,n}, is dominating. Moreover, the magnitude of δ̂_{5,n} is rather moderate. In such circumstances, a substantial improvement over the test based on R_n is hard to achieve. We shall see that the situation changes in the case of these and similar alternatives when testing the simple null hypothesis, which was considered in [2] and is discussed in Section 6.
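Sampling from P(λ, a) is straightforward, since |W| can be drawn by inversion of its tail function. The sketch below (illustrative; that the combination is Z + λW is our reading of the definition above) uses |W| = U^{-1/a} for U uniform on (0, 1), which matches P(|W| > w) = w^{-a}, w ≥ 1.

```python
import numpy as np

def sample_P(lam, a, size, rng):
    """Draw from P(lambda, a): the law of Z + lambda * W, where Z ~ N(0,1),
    W has the symmetric Pareto density (a/2)|w|^{-1-a} 1{|w| > 1}, and Z
    and W are independent. |W| is sampled by inversion: |W| = U^{-1/a}."""
    z = rng.normal(size=size)
    u = rng.uniform(size=size)
    w = u ** (-1.0 / a) * rng.choice([-1.0, 1.0], size=size)
    return z + lam * w

rng = np.random.default_rng(3)
x = sample_P(0.5, 3.0, 10000, rng)
print(round(float(x.mean()), 2), round(float(x.std()), 2))
```

For a = 3 the variance of P(λ, a) is finite (it equals 1 + λ^2 a/(a − 2)), but the fourth moment is not, which is one reason medians rather than averages are reported in the power study above.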

Real data examples
First, we considered n = 20 measurements of the percentage of silt from successive plots at depth 3, presented in [32, p.121]. It is of interest to test these data for normality.
We summarize the data in Figure 4 via a normal Q-Q plot and the empirical components C_2, ..., C_{21}. The Q-Q plot provides a warning that the data distribution and the theoretical normal one do not match very well and suggests that the empirical distribution has shorter tails. It also exhibits some symmetry in the data distribution. The components supplement these findings, also suggesting symmetry of the empirical distribution. Moreover, the magnitude of some of the components provides quite strong evidence of non-normality.
Figure 4. Normal Q-Q plot and components C_j, j = 2, ..., 21, for the silt data.
Next, we considered the tephra data described and analysed by Bowman and Azzalini. [33, pp.27, 38-40] As there, we applied a logistic transformation to the original data. The n = 59 outcomes are summarized in Figure 5. The normal Q-Q plot indicates some skewness. The observed magnitude of some of the components C_j, j even, supports the existence of such a feature of the data. The tests R_n, QH, SW, AD, and CvM, when applied to these data, lead to p-values .12, .14, .13, .03, and .02, respectively. So CvM and AD indicate significant non-normality at relatively small levels. In contrast, the first three tests lead to non-significant p-values. Similar outcomes result from the usage of N_A and N^o_{A^o}, cf. Table 6, though N_A appears to be slightly more sensitive.

Some remarks on testing simple null hypothesis
In recent years, testing a simple null hypothesis has also received considerable attention; see Donoho and Jin [18] and subsequent papers. In this section, we introduce and discuss some counterparts of N_A, N^o_{A^o}, and R_n adjusted to H*, and compare them to the higher criticism statistic introduced in the aforementioned paper.
Natural counterparts of W_n and R_n have the form

{W*_n}^2 = ∫_0^1 (Q_n(t) − Φ^{-1}(t))^2 dt and R*_n = n{W*_n}^2.
Let δ̂*_{0,n}, ..., δ̂*_{k,n} minimize, over δ_0, ..., δ_k, δ_l ∈ R, l = 0, ..., k, the expression

∫_0^1 (Q_n(t) − Φ^{-1}(t) − Σ_{l=0}^k δ_l h_l(t))^2 dt.

Hence, due to the properties of the h_j's and Equations (3), (4), (6), δ̂*_{0,n} = μ̂_n, δ̂*_{1,n} = σ̂_n − 1, δ̂*_{j,n} = δ̂_{j,n}, j ≥ 2, and {W*_n}^2 = {δ̂*_{0,n}}^2 + {δ̂*_{1,n}}^2 + W_n^2. (11) Set C*_j = √(n(j + 1)) δ̂*_{j,n} and denote the finite sample corrections of the mean and the standard deviation of C*_j, under H*, by e*_{j,0} and v*_{j,0}, respectively. Now define the corrected components and the statistics N*_k and N^o*_k, as well as the rules A* and A^o* and the related data-driven statistics N*_{A*} and N^o*_{A^o*}, analogously as before. For completeness, we give some formulas, the related corrections and some critical values in Section 4 of the supplementary material. Following Jin et al., [2] we consider the higher criticism statistic HC of the form (12), expressed through the empirical distribution function F_n, and the two-sided test based on the empirical kurtosis, denoted K*. The formula (12) can be recognized as the well-known Jaeschke [34] statistic. Tables 7 and 8 contain some simulated powers under n = 75 (symmetrical and asymmetrical cases) and n = 244^2 (symmetrical alternatives). The alternatives mostly come from the set considered and discussed in Section 4. Three additional ones are defined as follows: t(5) is the Student distribution with 5 degrees of freedom; LC(ε, m) has the density εϕ(x − m) + (1 − ε)ϕ(x), ε ∈ (0, 1), x ∈ R, where ϕ is the standard normal density; TruncN(a, b), a ∈ R, b ∈ R, a < b, stands for the standard normal distribution truncated at a and b.
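For concreteness, the following sketch implements one common form of the higher criticism statistic (our reading; the paper's exact variant is its formula (12), written through F_n): sorted two-sided N(0, 1) p-values are compared with their uniform benchmarks, with the maximum taken over the smaller half of the p-values, as is customary.

```python
import math
import numpy as np
from statistics import NormalDist

def higher_criticism(x):
    """One common form of the higher criticism statistic of Donoho and Jin:
    with sorted two-sided p-values p_(1) <= ... <= p_(n) computed under
    N(0,1), return max over 1 <= i <= n/2 of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    nd = NormalDist()
    p = np.sort(np.array([2.0 * (1.0 - nd.cdf(abs(v))) for v in x]))
    n = len(p)
    i = np.arange(1, n // 2 + 1)
    pi = p[: n // 2]
    return float(np.max(math.sqrt(n) * (i / n - pi) / np.sqrt(pi * (1.0 - pi))))

rng = np.random.default_rng(5)
x_null = rng.normal(size=1000)
x_alt = x_null.copy()
x_alt[:100] += 3.0  # a sparse strong signal: 10% of observations shifted
print(round(higher_criticism(x_null), 2))
print(round(higher_criticism(x_alt), 2))
```

A sparse collection of strongly shifted observations drives the statistic far above its null level, the regime in which higher criticism is known to work well; the instability discussed below concerns alternatives whose evidence lies in the middle of the distribution.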
The results show that the new data-driven tests perform well in the study. Moreover, in contrast to the case of H, when testing H* under n = 244^2, N*_{A*} outperforms R*_n considerably. This is mainly caused by two facts. Due to (11), the critical values of the test based on R*_n are much greater than those based on R_n, while, in the case of many alternatives, the first two empirical Fourier coefficients δ̂*_{0,n} and δ̂*_{1,n} are small and their impact on the empirical power does not compensate for the increase in critical values. For the data-driven solutions N*_{A*} and N^o*_{A^o*}, the critical values under n = 75 are much smaller, while under n = 244^2 they are similar to or even slightly less than the respective values under H, and the selection rules provide a considerable amount of additional information. We present some selected simulation results and some further discussion on this point in Section 5 of the supplementary material.
The test based on K* is one of the leading solutions in the cases when the component C*_3 is the largest one. However, in situations when C*_3 is close to 0, the kurtosis test K* completely breaks down. For the values of C*_j related to the alternatives in Table 8, see Figure S5 in the supplementary material.
The empirical power of HC is unstable and, therefore, the test is not a highly reliable omnibus goodness-of-fit procedure. Jin et al. [2, p.2482] mention that 'when the actual evidence lies in the middle of the distribution', then HC will be very weak. This corresponds with the earlier theoretical developments of Eicker, [35, p.117] who infers sensitivity of (12) in moderate tails only, and Révész, [36] who also suggests sensitivity of (12) to some changes in the tails only. The above findings are good motivation to propose new solutions behaving stably under a large spectrum of alternatives. It seems that the constructions proposed in this paper meet such a goal to a satisfactory extent. Similar solutions can be constructed for testing fit to the N(0, σ) and N(μ, 1), μ, σ unknown, distributions.

Conclusion
We have proposed some data-driven tests to detect departures from Gaussianity. The main focus is on the composite null hypothesis. There are many kinds of non-Gaussianity and many kinds of tests to detect them. Several tests focus on specific deviations and are not sensitive to others. Our goal was to propose solutions which are sensitive to commonly studied deviations from Gaussianity and which, simultaneously, are able to detect some complex alternatives. Our constructions have a simple interpretation in terms of normalized Fourier coefficients of the sample quantile function. These normalized coefficients are called components and approximately follow the standard normal distribution under the null hypothesis. The number of components taken into consideration depends on the data at hand and is defined via an Akaike-type selection rule. The resulting solutions prove to behave stably and efficiently under a large spectrum of alternatives and compare well with the best existing competitors. A similar approach can in principle be applied to some other location/scale families. This, however, requires some further careful investigation. Note that De Wet [37] provides an insightful discussion of tests for location and scale families based on a weighted Wasserstein distance. The paper by Csörgő [38] further extends these considerations.

Supplemental data
Supplemental data for this article can be accessed at http://dx.doi.org/10.1080/00949655.2014.983110.