Bias reduction by transformed flat-top Fourier series estimator of density on compact support

The problem of nonparametric estimation of a univariate density with rth continuous derivative on compact support is addressed. If the density function has compact support and is non-zero at either boundary, the regular kernel estimator is severely biased at that boundary. Although several correction methods have been proposed over the last decades to reduce the bias at the boundary to O(h^2), this paper initiates a way to further improve the bias to the higher order O(h^r) in the interior of the support of the density, while retaining the O(h^2) order of bias at the boundary. We first review the flat-top kernel estimator and the flat-top series estimator, and then propose the Transformed Flat-top Series estimator. The theoretical analysis is supplemented with simulation results as well as real data applications.


Introduction
Suppose the observed sample X_1, ..., X_n is independent and identically distributed (i.i.d.) with a univariate smooth density f_X(x) with compact support; without loss of generality, let f_X(x) have support [0, 1] and an rth continuous derivative (r ≥ 2). The independence assumption is just for ease of presentation. All results in the paper at hand hold true if the sample X_1, ..., X_n is only assumed to be strictly stationary obeying some weak dependence assumptions; see Paparoditis and Politis (2016a) and the references therein.
The standard kernel density estimator is
$\hat f_X(x) = \frac{1}{nh}\sum_{i=1}^{n} K\Big(\frac{x - X_i}{h}\Big),$   (1)
where K(·) is a kernel function that usually is symmetric and satisfies $\int K(x)\,dx = 1$; here h > 0 denotes the bandwidth, which is assumed to tend to zero as n → ∞. It is well known that such density kernel estimators suffer from boundary effects. Numerous methods have been developed in the literature to reduce the boundary bias problem. Schuster (1985), Silverman (1986), and Cline and Hart (1991) considered the reflection method, which reduces the boundary bias to O(h^2) when the derivative of the density at the boundary is zero; otherwise, the boundary effect is only slightly improved, with bias O(h). The boundary kernel method, which uses a nonsymmetric kernel function at boundary points, was developed by Gasser and Muller (1979), Gasser, Muller, and Mammitzsch (1985), Jones (1993), Muller (1991), and Zhang and Karunamuni (2000). The local linear method is more general and fast to implement without many assumptions on the density; it was developed by Cheng, Fan, and Marron (1997), Cheng (1997), and Zhang and Karunamuni (1998). The pseudo-data method, proposed by Cowling and Hall (1996), generates pseudo-data by linear interpolation of order statistics and estimates the density function by using the original data together with the pseudo-data.
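To make the boundary effect concrete, here is a minimal numerical sketch (our illustration, not part of the original paper): a standard Gaussian-kernel estimator applied to Uniform(0, 1) data, whose true density equals 1 on [0, 1], estimates roughly half the true value at the left boundary because half the kernel mass sticks out of the support.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x, data, h):
    """Standard kernel estimator: f_hat(x) = (1/(n h)) * sum_i K((x - X_i)/h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - data[None, :]) / h
    return gauss_kernel(u).mean(axis=1) / h

# Uniform(0, 1) sample: the true density equals 1 on [0, 1].
data = rng.uniform(size=100_000)
est = kde([0.0, 0.5], data, h=0.05)
# est[0] is close to 0.5, i.e. half the true value at the boundary,
# while est[1] is close to the true value 1.0 in the interior.
```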
The transformation method, which is employed in the paper at hand, was first proposed by Wand, Marron, and Ruppert (1991), and then further developed by Wen and Wu (2015). Hall and Park (2002) presented the 'empirical translation correction' method. The beta kernel estimator that uses the beta density as kernel function was proposed by Chen (1999). Jones and Henderson (2007) discussed a Gaussian copula estimator. Zhang, Karunamuni, and Jones (1999) combined the pseudo-data, transformation and reflection methods. Marron and Ruppert (1994) combined transformation and reflection methods.
In this paper, we propose to combine the above-mentioned transformation method and the flat-top kernel series method of Politis (2001), which eventually improves the bias in the interior of the support of the density to O(h^r), while retaining bias of order O(h^2) at the boundary. To give some background on the flat-top kernels, Politis and Romano (1999) and Politis and Romano (1993) proposed flat-top kernel probability density estimators in the univariate and multivariate cases, respectively. Later, Politis (2001) proposed the flat-top series estimator of a density with compact support, while Politis (2003) introduced an adaptive bandwidth choice method for spectral density and probability density functions estimated with flat-top kernels. More details on the flat-top kernel estimator and the flat-top series estimator are provided in Section 2.
The remainder of the paper is organised as follows. Section 2 gives some background on the transformation method, the flat-top kernel and flat-top series estimators, along with some preliminary results. Section 3 presents the Transformed Rectangular Flat-top Series estimator. Section 4 presents the Transformed Infinitely Differentiable Flat-top Series estimator and shows the higher-order bias in the interior region. The implementation and selection of parameters are discussed in Sections 5 and 6. Section 7 conducts a simulation study and Section 8 gives the application to three financial returns time series. Concluding comments are given in Section 9; all technical proofs are in the Appendix.

Review of the transformation method
Before introducing the spirit of the transformation method, we first briefly review why the regular kernel density estimator misbehaves for a density function on compact support. Equation (1) and a Taylor expansion of f_X(x) with respect to x near the left boundary give, to leading order,
$E\hat f_X(x) = \mu_0(C)\, f_X(0) + O(h);$
for the proof, see Appendix A of Marron and Ruppert (1994). In the above, x = Ch for some C ∈ [0, 1], $\mu_k(C) = \int_{-1}^{C} u^k K(u)\,du$, and f^{(j)}_X is the jth derivative of f_X. Since μ_0(0) = 1/2, μ_0(1) = 1 and μ_0(C) < 1 for C ∈ [0, 1), the kernel estimator at zero has bias that does not vanish asymptotically.
To correct this problem, Schuster (1985) proposed the reflection method, which is implemented as
$\hat f_X(x) = \frac{1}{nh}\sum_{i=1}^{n}\Big[K\Big(\frac{x - X_i}{h}\Big) + K\Big(\frac{x + X_i}{h}\Big)\Big].$
Then, under the same assumptions, the bias above has order h, which tends to zero as n → ∞. Furthermore, if f^{(1)}_X(0) = 0, the bias will have order h^2. Marron and Ruppert (1994) proposed to construct a new random variable Y = g(X) with probability density f_Y whose first derivative satisfies f^{(1)}_Y(0) = 0. Then, they estimated f_Y near the left boundary via a kernel with the reflection method, obtaining a boundary bias of order O(h^2), which can then be carried back to the estimation of f_X. To elaborate, denote the transform function by g(·), a fixed one-to-one and monotonically increasing function; suppose the random variable X follows the density f_X(x) with support [0, 1], let Y = g(X), and denote by f_Y(y) the density function of Y. Then,
$f_X(x) = f_Y(g(x))\, g^{(1)}(x),$   (2)
and the transformed estimator employed by Marron and Ruppert (1994) was
$\hat f_X(x) = \hat f_Y(g(x))\, g^{(1)}(x),$
with f̂_Y computed from the transformed sample Y_i = g(X_i). Assume g(x) is smooth, and that sup_{x∈[0,1]} |g^{(1)}(x)| ≤ M for some positive value M; then the O(h^2) boundary bias of f̂_Y carries over to f̂_X, and a similar statement holds at the right boundary. The methodology to be proposed in our paper also uses the transformation idea Y = g(X), with g(·) satisfying some specific conditions, but uses the flat-top series estimator of Politis (2001) instead of the reflection method; more details will be given in Section 3.
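A minimal sketch of the reflection idea (our illustration, not the authors' code): reflect the sample about both endpoints before smoothing, which restores consistency at a boundary where f_X is non-zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x, data, h):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return gauss_kernel((x[:, None] - data[None, :]) / h).mean(axis=1) / h

def reflection_kde(x, data, h):
    """Schuster (1985) reflection estimator on [0, 1]: augment the sample
    with its reflections about 0 and 1, i.e. use -X_i and 2 - X_i as well.
    The factor 3 undoes the averaging over the tripled sample size."""
    augmented = np.concatenate([data, -data, 2.0 - data])
    return 3.0 * kde(x, augmented, h)

data = rng.uniform(size=100_000)                     # true density is 1 on [0, 1]
plain = kde([0.0], data, h=0.05)[0]                  # about 0.5: boundary bias
reflected = reflection_kde([0.0], data, h=0.05)[0]   # about 1.0: consistent
```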

Review of flat-top kernel density estimator
This section reverts to considering the problem where the random variable X has a density f_X(x) on the whole of R. For this problem, higher-order kernel density estimators and infinite-order kernel estimators were designed to improve the order of the bias. Recall that a kernel function K(·) is of order q if K(·) has finite moments up to qth order, and moments of order 1 up to q − 1 equal to zero, i.e. $\int x^q K(x)\,dx \neq 0$ but $\int x^k K(x)\,dx = 0$ for all 1 ≤ k ≤ q − 1. Now suppose a density function f_X(x) has r continuous and bounded derivatives; then it is well known that
$E\hat f_X(x) - f_X(x) = c_{f,K}(x)\, h^{k} + o(h^{k}),$
where k = min(q, r) and c_{f,K}(x) is a bounded function depending on K(·), f_X and its derivatives. Standard kernel functions are nonnegative and symmetric around zero, and consequently the bias is of order O(h^2) provided r ≥ 2. To get a better bias order of O(h^k) with k > 2, one must use a kernel of higher order q > 2; this idea dates back to Bartlett (1963) and Parzen (1962). More references on high-order kernels include Cacoullos (1966); Gasser et al. (1985); Silverman (1986); Devroye (1987); Nadaraya (1989); Granovsky and Müller (1991); Marron and Wand (1992); Jones and Foster (1993); Marron (1994); Jones (1995). The question arises: what order kernel should one use to take full advantage of the possible smoothness of f_X and obtain maximum bias reduction? One needs to choose order q ≥ r, but the degree of smoothness r is unknown. To achieve q ≥ r for any potential value of r, it is natural to choose q = ∞. Indeed, the aforementioned flat-top kernels have infinite order and achieve a bias of order O(h^r) no matter how large r is. Furthermore, Politis (2003) proposed a simple bandwidth selection method for flat-top kernel estimators that is easier than the bandwidth selection procedures associated with finite-order kernels. In the following definition, we denote the flat-top kernel function by Λ_c(·) instead of K(·).
Definition 2.1: Let c be a positive value; the kernel Λ_c is said to be a member of the general family of univariate flat-top kernels of infinite order if
$\Lambda_c(x) = \frac{1}{2\pi}\int \omega_c(s)\, e^{-isx}\, ds,$
where the Fourier transform ω_c(s) satisfies the following properties: (i) ω_c(s) = 1 for |s| ≤ c; (ii) $\int |\omega_c(s)|^2\, ds < \infty$; (iii) ω_c(−s) = ω_c(s) for all s. The above three properties guarantee, respectively, the infinite order, a finite variance of the resulting estimator, and that Λ_c(x) is real valued. In this paper, c is a positive value ≤ 1, and we will assume ω_c(s) has compact support, i.e.
$\omega_c(s) = \begin{cases} 1, & |s| \le c,\\ \eta_\omega(s), & c < |s| < 1,\\ 0, & |s| \ge 1,\end{cases}$   (5)
where the function η_ω(s) determines the shape of ω_c(s) outside the flat region. As with a regular kernel estimator, with a bandwidth h, we define the rescaled kernel $\Lambda_{c,h}(x) = h^{-1}\Lambda_c(x/h)$. Denote by $\phi_X(s) = \int e^{isx} f_X(x)\, dx$ the characteristic function, and let $\hat\phi_X(s) = (1/n)\sum_{k=1}^{n} e^{isX_k}$ be the sample characteristic function; then the flat-top kernel estimator of f_X(x) is
$\hat f_X(x) = \frac{1}{2\pi}\int \omega_c(hs)\, \hat\phi_X(s)\, e^{-isx}\, ds.$   (6)
Theorem 2.2 (Politis and Romano (1999)): Assume there is an r > 0 such that
$\int |\phi_X(s)|\, |s|^r\, ds < \infty.$   (7)
Assume that n → ∞, and let h ∼ An^{−1/(2r+1)} for some constant A > 0; it follows that
$\mathrm{MSE}(\hat f_X(x)) = O(n^{-2r/(2r+1)}),$
where MSE is short for Mean Squared Error.
Note that condition (7) implies that f X (x) has r bounded and continuous derivatives.

Remark 2.3:
Under the assumption that f_X is an infinitely differentiable function, although it is less practical, one can show that Theorem 2.2 still holds for any non-negative integer r. Thus, if f_X is infinitely differentiable, with an appropriate selection of the bandwidth h, the Flat-top Kernel Estimator will achieve a very fast convergence rate. Selection of the bandwidth h will be discussed in Section 6.2.
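As an illustration (a sketch under our own choices, not code from the paper), the flat-top kernel estimate can be computed directly in the frequency domain by tapering the empirical characteristic function as in (6); here a simple trapezoidal taper stands in for ω_c.

```python
import numpy as np

rng = np.random.default_rng(2)

def omega_trapezoid(t, c=0.5):
    """Flat-top taper: 1 on |t| <= c, linear decay to 0 at |t| = 1, 0 beyond."""
    return np.clip((1.0 - np.abs(t)) / (1.0 - c), 0.0, 1.0)

def flattop_kde(x, data, h, c=0.5, n_s=1001):
    """f_hat(x) = (1/2pi) * int omega_c(h s) * phi_hat(s) * exp(-i s x) ds,
    computed by a Riemann sum on |s| <= 1/h (the taper vanishes outside)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    s = np.linspace(-1.0 / h, 1.0 / h, n_s)
    ds = s[1] - s[0]
    ecf = np.exp(1j * s[:, None] * data[None, :]).mean(axis=1)  # phi_hat(s)
    integrand = omega_trapezoid(h * s)[:, None] * ecf[:, None] \
        * np.exp(-1j * s[:, None] * x[None, :])
    return integrand.sum(axis=0).real * ds / (2.0 * np.pi)

data = rng.standard_normal(2_000)
est = flattop_kde([0.0], data, h=0.2)[0]
# close to the N(0, 1) density at 0, i.e. 1/sqrt(2*pi)
```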

Flat-top series estimator of density on compact support
We now return to the problem where f_X(x) is a very smooth density function on [0, 1]. Denote by f̄_X(x) the periodic extension of f_X(x) on the whole real line; apparently f̄_X(x) has period 1. Denote the characteristic function $\phi_X(s) = \int_0^1 e^{isx} f_X(x)\, dx$, and the sample characteristic function $\hat\phi_X(s) = (1/n)\sum_{k=1}^{n} e^{isX_k}$. Now expand f̄_X(x) in a Fourier series:
$\bar f_X(x) = \sum_{s\in\mathbb{Z}} \phi_X(2\pi s)\, e^{-i2\pi sx}.$
Analogously to (6), but replacing the Fourier transform by the Fourier series, is the definition of the flat-top series estimator of Politis (2001), i.e. let
$\hat f_X(x) = \sum_{s\in\mathbb{Z}} \lambda_c(s)\, e^{-i2\pi sx}\, \hat\phi_X(2\pi s), \quad \text{where } \lambda_c(s) = \omega_c(hs).$
Apparently $\sum_{s\in\mathbb{Z}} \lambda_c(s)\, e^{-i2\pi sx}\, \hat\phi_X(2\pi s)$ is also a periodic function with period 1 on the whole real line, but we only need the part on the interval [0, 1]. First, recall the following theorem. Notice that condition (9) implies that f̄_X has r bounded and continuous derivatives; in particular, it implies that f_X has r bounded and continuous derivatives that satisfy 'wrap-around' continuity, i.e. $f^{(j)}_X(0) = f^{(j)}_X(1)$ for j = 0, 1, ..., r.
Remark 2.5: Similar to Remark 2.3, under the assumption that f̄_X is an infinitely differentiable function, one can show that Theorem 2.4 still holds for any non-negative integer r.
In the next theorem, we relax condition (9). Theorem 2.6: Let f_X(x) be a differentiable density function defined on [0, 1], and denote by f^{(1)}_X its first derivative. From the proof of Theorem 2.6, given in the appendix, we can also obtain the following corollary.

Corollary 2.7: Let f_X(x) be a twice differentiable density function defined on [0, 1].
Lastly, we present the following new theorem under a weaker condition compared to (9) in Theorem 2.4. Theorem 2.8: Let f_X(x) denote the density function on [0, 1] and f̄_X(x) be the periodic extension of f_X(x), and suppose f̄_X(x) has an (r − 1)th bounded and continuous derivative (for some r ≥ 1). The theorem's proof is provided in the appendix for a special case of ω_c(s), i.e. ω_c(s) with c = 1 and η_ω = 0, which we also call the rectangular flat-top function; the general case is similar albeit more involved technically.
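A numerical sketch of the series estimator (our illustration; a trapezoidal taper again stands in for ω_c): the sample Fourier coefficients φ̂_X(2πs) are tapered by λ_c(s) = ω_c(hs) and summed.

```python
import numpy as np

rng = np.random.default_rng(3)

def omega_trapezoid(t, c=0.5):
    return np.clip((1.0 - np.abs(t)) / (1.0 - c), 0.0, 1.0)

def flattop_series(x, data, h, c=0.5):
    """Flat-top series estimator on [0, 1]:
    f_hat(x) = sum_s omega_c(h s) * phi_hat(2 pi s) * exp(-i 2 pi s x),
    where the sum runs over |s| <= 1/h (the taper vanishes beyond)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    S = int(np.floor(1.0 / h))
    s = np.arange(-S, S + 1)
    coef = np.exp(2j * np.pi * s[:, None] * data[None, :]).mean(axis=1)
    taper = omega_trapezoid(h * s)
    series = taper[:, None] * coef[:, None] \
        * np.exp(-2j * np.pi * s[:, None] * x[None, :])
    return series.sum(axis=0).real

# Beta(2, 2) has density 6x(1-x) on [0, 1] with f(0) = f(1) = 0, so its
# periodic extension is continuous and the series estimator behaves well.
data = rng.beta(2.0, 2.0, size=5_000)
est = flattop_series([0.5], data, h=0.1)[0]   # true value: 6*0.5*0.5 = 1.5
```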

Transformed rectangular flat-top series estimator
Although Theorems 2.4 and 2.8 guarantee good properties of the flat-top series estimator, they also require very strict conditions. In practical use, the periodic extension f̄_X of the density f_X rarely has up to rth bounded and continuous derivatives. The smoothness of f_X(x) on (0, 1) is not an issue; the issue is with the boundary values. For example, there is no reason to assume that f_X(0) = f_X(1), and the same is true for the derivatives of f_X. In other words, it is not realistic to assume that the periodic extension f̄_X is smooth or even continuous, in which case the flat-top series estimator fails. However, the transformation method can be used to bring us to a realm where the boundary continuity conditions are satisfied, and hence the flat-top series estimator can be successfully used.
Consequently, our transform function g(·) has the particular target to render
$f_Y(0) = f_Y(1).$
If only this continuity condition holds, however, the bias of the flat-top series estimator has order at most O(h) universally. Thus we need to modify our 'target' to be
$f_Y(0) = f_Y(1) \quad \text{and} \quad f^{(1)}_Y(0) = f^{(1)}_Y(1)$   (13)
in order to achieve a bias of order O(h^2).
Remark 3.1: 'Target' (13) is only designed for the rectangular flat-top series estimator. In the next section, on the infinitely differentiable flat-top estimator, we will just need the basic target
$f_Y(0) = f_Y(1).$   (14)
In what follows, we will pick g(·) from some parametric family of functions. Let Θ ⊂ R^k and G = {g_θ(·) : θ ∈ Θ} be a parametric family of transformations where each g_θ(x) is defined for x ∈ [0, 1]. We will need the following condition throughout.

Condition 3.2:
Each g_θ(x) ∈ G is three times continuously differentiable with respect to x and is a map from [0, 1] to [0, 1] with a strictly positive derivative; derivatives at 0 and 1 are one-sided. In addition, there exists a constant K > 0 such that
$\sup_{x\in[0,1]} \big|g^{(j)}_{\theta_1}(x) - g^{(j)}_{\theta_2}(x)\big| \le K\, \|\theta_1 - \theta_2\| \quad \text{for } j = 0, 1, 2, 3,$
where g^{(j)}_θ(x) is the jth derivative of g_θ with respect to x, and ‖·‖ denotes the Euclidean norm on R^k. One example where Condition 3.2 is satisfied is when g_θ is a polynomial with θ being the vector of coefficients.
Equation (2) still applies, but since g is no longer fixed, we need better notation. Here and for the remainder of the paper, let f_Y(·; g) denote the density function of Y = g(X), let f̂_Y(·; g) denote its flat-top series estimator, let φ_Y(·; g) denote its characteristic function, and let φ̂_Y(·; g) denote the estimated characteristic function. We then formally define the Transformed Flat-top Series estimator as
$\hat f_X(x) = \hat f_Y(g(x); g)\, g^{(1)}(x)$   (15)
for some g ∈ G. Now assume Y_{θ_0} = g_{θ_0}(X) satisfies (13); we shall be concerned with the case where θ_n is an estimator of θ_0 based on the sample {X_i}_{i=1}^n. Let Y_{θ_n} = g_{θ_n}(X); the actual transformed flat-top series estimator, using a data-based estimator for g, then is
$\hat f_X(x) = \hat f_{Y_{\theta_n}}(g_{\theta_n}(x); g_{\theta_n})\, g^{(1)}_{\theta_n}(x).$   (16)
Using the same trick as in Marron and Ruppert (1994), we define the expectation of the flat-top series estimator of f_{Y_{θ_n}} given g_{θ_n} as
$E\hat f_{Y_{\theta_n}}(y; g_{\theta_n}) = \sum_{s\in\mathbb{Z}} \lambda_c(s)\, e^{-i2\pi sy}\, \phi_{Y_{\theta_n}}(2\pi s; g_{\theta_n}),$   (18)
which is a sort of modified partial sum of the Fourier series of the periodic extension f̄_{Y_{θ_n}}(y; g_{θ_n}).

Remark 3.3:
The above definition side-steps the randomness inherent in g_{θ_n} to develop tractable expressions for bias and variance. The reason that we can safely ignore the randomness inherent in g_{θ_n} is that g_{θ_n} is estimated at a fast rate. In fact, in Section 5 we will use the simple boundary correction method of Jones (1993) to estimate g_{θ_n}, whose mean squared error has order $O_p((nh)^{-1} + h^2)$, smaller (and therefore negligible) compared to $\epsilon_n = 1/\log(1/h)$, which is the rate of convergence given in the following novel result.

Theorem 3.4:
For the problem where f_X is r times continuously differentiable (r ≥ 2) on [0, 1] (and therefore also bounded) but f_X(0) ≠ f_X(1), suppose θ_0 ∈ Θ, and that Condition 3.2 holds. Define G_n = {g_{θ_n} : ‖θ_n − θ_0‖ ≤ ε_n and θ_n ∈ Θ}. Let h = h(n) → 0 as n → ∞ for some function h(·), with n large enough so that h < 1, and let $\epsilon_n = 1/\log(1/h)$. Then the conclusion holds uniformly over g_{θ_n} ∈ G_n. Note that if ‖θ_n − θ_0‖ = o_p(ε_n), then P(g_{θ_n} ∈ G_n) → 1, and the above result allows for g_{θ_n} being random. As mentioned in Remark 3.3, the randomness of g_{θ_n} is therefore negligible. In other words, with Equation (15) the variance of the proposed estimator can be controlled (see the proof of Theorem 3.14). The bias of the proposed estimator will be studied in the rest of this section. As with the variance, we start from the difference between Ef̂_{Y_{θ_n}}(y; g_{θ_n}) and f_{Y_{θ_n}}(y; g_{θ_n}). Ef̂_{Y_{θ_n}}(y; g_{θ_n}) was defined in Equation (18); by Definition 2.1 and the definition of ω_c(s) in Equation (5), we can treat Ef̂_{Y_{θ_n}}(y; g_{θ_n}) as a partial sum of a Fourier series with coefficients tapered by λ_c(s) at the different points s.
Remark 3.6: If θ_n → θ_0 at rate q(n), where q(n) = n^{−γ} for some γ > 0, then Y_{θ_n} does not exactly satisfy (13). By Condition 3.2, it is obvious that
$\big|f_{Y_{\theta_n}}(1; g_{\theta_n}) - f_{Y_{\theta_n}}(0; g_{\theta_n})\big| = O(q(n)).$   (21)
Similarly, if θ_n → θ_0 in probability at rate q(n), then
$\big|f_{Y_{\theta_n}}(1; g_{\theta_n}) - f_{Y_{\theta_n}}(0; g_{\theta_n})\big| = O_p(q(n)).$   (22)
Equations (21) and (22) imply that Y_{θ_n} fails to satisfy (13). Furthermore, this means the periodic extension f̄_{Y_{θ_n}}(y; g_{θ_n}) is not continuous at y = ..., −2, −1, 0, 1, 2, ....
As discussed earlier, Ef̂_{Y_{θ_n}}(y; g_{θ_n}) is a partial sum of the Fourier series of f̄_{Y_{θ_n}}(y; g_{θ_n}), which is essentially piecewise continuously differentiable. It is well known that the partial sum of a Fourier series oscillates near a discontinuity point; this peculiar behaviour is called the Gibbs phenomenon. Fortunately, the oscillation between the true function and the partial sum of its Fourier series is bounded by the size of the jump at the discontinuity points (more details can be found in Lemma 3.10). In our case, this means that the oscillation between Ef̂_{Y_{θ_n}}(y; g_{θ_n}) and f_{Y_{θ_n}}(y; g_{θ_n}) at the boundaries 0 and 1 is bounded by the jump size, so the estimator always has bias at the boundaries of order q(n), which is identical to the convergence rate of θ_n → θ_0. In this section, the Transformation-based Rectangular Flat-top Series Estimator retains the usual bias of order h^2 in the interior of the support. This bias can be improved to a higher order by using the infinitely differentiable flat-top series estimator introduced in the next section. Now recall definition (5) of ω_c(·); when c = 1, ω_c(·) is the rectangular flat-top function, which will be denoted simply by ω̄(·). Recalling Equation (18) and λ_c(s) = ω_c(hs), it follows that the expectation reduces to an ordinary truncated Fourier sum. We now define
$S_{N_h}\bar f_{Y_{\theta_n}}(y; g_{\theta_n}) = \sum_{|s|\le N_h} \phi_{Y_{\theta_n}}(2\pi s; g_{\theta_n})\, e^{-i2\pi sy}, \qquad N_h = \lfloor 1/h \rfloor,$
i.e. $S_{N_h}\bar f_{Y_{\theta_n}}(y; g_{\theta_n})$ denotes the partial sum of the Fourier series of f̄_{Y_{θ_n}}(y; g_{θ_n}), the periodic extension of f_{Y_{θ_n}}(y; g_{θ_n}). The following lemmas establish the bias at the boundary points.
Lemma 3.7: Let φ_n(x) denote the nth partial sum of the Fourier series of φ(x). For any x ∉ Z, i.e. x away from the discontinuities of φ(x), φ_n(x) converges to φ(x). Notice that φ(·), φ_0(·) and φ_n(·) are notations applied only to Lemma 3.7.
If [a, b] is an interval away from the discontinuity, then we also have $\sup_{x\in[a,b]} |\phi_n(x) - \phi(x)| \to 0$. Remark 3.9: If f(x) is discontinuous at a point a, then we can define a continuous function f*(x) by subtracting a suitably scaled and shifted copy of φ(x), so that f*(x) is continuous; this is very helpful for proving the next lemma. The reason we construct f*(x) via φ(x) instead of via a step function is that φ_n(x) has a better form and is more convenient for the proof.
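The Gibbs phenomenon invoked above is easy to reproduce numerically (our illustration, using the sawtooth f(x) = x on [0, 1), whose periodic extension jumps by 1 at the integers): the overshoot near the jump does not shrink as more terms are added, while convergence away from the jump is fine.

```python
import numpy as np

def sawtooth_partial_sum(x, N):
    """Nth partial Fourier sum of f(x) = x on [0, 1), extended with period 1:
    f(x) ~ 1/2 - sum_{s=1}^{N} sin(2 pi s x) / (pi s)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    s = np.arange(1, N + 1)
    return 0.5 - (np.sin(2.0 * np.pi * np.outer(x, s)) / (np.pi * s)).sum(axis=1)

# Away from the jump the partial sum converges ...
interior_err = abs(sawtooth_partial_sum([0.3], 200)[0] - 0.3)

# ... but near the jump the deviation stays bounded away from zero
# (proportional to the jump size), no matter how large N is.
grid = np.linspace(0.0005, 0.05, 400)
overshoot_50 = np.max(np.abs(sawtooth_partial_sum(grid, 50) - grid))
overshoot_200 = np.max(np.abs(sawtooth_partial_sum(grid, 200) - grid))
```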

Lemma 3.13: Let f be a real-valued function on [0, 1].
Notice that all of the above lemmas and theorems concern the rectangular flat-top series estimator of the transformed random variable Y_{θ_n}. Recalling (16), the following theorem takes us back to the random variable X. Theorem 3.14: Assume Condition 3.2 and h = An^{−1/5}. If θ_n = θ_0 + O_p(h), then for any 0 < a ≤ b < 1, the Transformed Rectangular Flat-top Series estimator converges to the true density function on [a, b]. Although Theorem 3.14 only requires θ_n = θ_0 + O_p(h), the bias reaches O(h^2) only on a closed interval inside [0, 1]. Looking back at Theorem 3.12, the bias order at the boundary is still O(h). To get a bias of order O(h^2) at the boundary, we need θ_n = θ_0 + O_p(h^2).

Transformed infinitely differentiable flat-top estimator
The transformation-based infinitely differentiable flat-top estimator introduced here is our ultimate estimator; as mentioned in Section 1, it is the one that improves the order of bias in the interior of the support of the density to the higher order h^r. Recall (5) and λ_c(s) = ω_c(hs), now with c < 1 and some non-trivial function η_ω. Equation (18) shows that the mean of the flat-top series estimator is a modified partial sum of a Fourier series. As discussed in the previous section, convergence of the nth partial sum of a Fourier series at a discontinuity point suffers from oscillation, also known as the Gibbs phenomenon. This issue not only negatively affects convergence at the discontinuity but also slows the global convergence rate.
To remove this obstacle, we employ the infinitely differentiable taper of McMurry and Politis (2004), taking
$\eta_\omega(s) = \exp\Big(\frac{-b\,\exp\big(-b/(|s| - c)^2\big)}{(|s| - 1)^2}\Big)$   (25)
for some b > 0 and c < |s| < 1; here, c determines the width of the region where the flat-top function equals 1, and b allows us to alter the shape of η_ω. The selection of these two parameters will be discussed in the next section. The function η_ω connects the region where ω(s) is 1 and the region where ω(s) is 0 in such a manner that ω(s) is infinitely differentiable for all s, even at |s| = c and |s| = 1. Recall again that the expectation of a flat-top series estimator is a modified partial sum of a Fourier series, and see the following lemma.
where C is some constant, d(y) = min_{k=−1,0,1} |y − ξ + k|, and H(f) is a quantity involving the suprema of f and its derivatives. Notice that the condition 'y is away from ξ' is quantified precisely by d(y). If y = ξ, the convergence in (26) fails; if y ≠ ξ, the constant in the rate of convergence is proportional to 1/d(y). In addition, Lemma 4.1 shows that the convergence rate of this modified partial sum of the Fourier series is O(n^{−r}) when y is away from the discontinuity ξ.
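A sketch of an infinitely differentiable flat-top taper of this kind; the exact expression below is our reading of the functional form in McMurry and Politis (2004), so treat it as illustrative.

```python
import numpy as np

def omega_smooth(s, b=0.25, c=0.05):
    """Infinitely differentiable flat-top taper: 1 on |s| <= c, a smooth
    eta_omega bridge on c < |s| < 1, and 0 for |s| >= 1.  Both b and c
    control the shape of the bridge region."""
    a = np.abs(np.atleast_1d(np.asarray(s, dtype=float)))
    out = np.zeros_like(a)
    out[a <= c] = 1.0
    mid = (a > c) & (a < 1.0)
    am = a[mid]
    out[mid] = np.exp(-b * np.exp(-b / (am - c) ** 2) / (am - 1.0) ** 2)
    return out

vals = omega_smooth([0.0, 0.05, 0.5, 0.999, 1.0, 2.0])
# flat region: vals[0] = vals[1] = 1; bridge: 0 < vals[2] < 1;
# smooth decay: vals[3] is essentially 0; beyond: vals[4] = vals[5] = 0
```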

Remark 4.2:
By imitating the proof of this theorem in Gottlieb and Shu (1997), one can easily show the same result if f does not have a discontinuity.

Remark 4.3:
One interesting property of such a modified partial sum of a Fourier series, by the definition of H(f), is that its convergence rate O(n^{−r}) is not influenced by the continuity of f^{(l)}(·) at ξ. It implies that when practitioners are looking for a transformation function, the transformed random variable does not have to satisfy (13). More specifically, suppose ξ = 0 and consider f̄_{Y_{θ_n}}(·; g_{θ_n}); the convergence rate O(N_h^{−r}) of the bias remains, even when f̄_{Y_{θ_n}}(·; g_{θ_n}) and f̄^{(1)}_{Y_{θ_n}}(·; g_{θ_n}) are discontinuous at 0 (i.e. even when |f^{(i)}_{Y_{θ_n}}(1; g_{θ_n}) − f^{(i)}_{Y_{θ_n}}(0; g_{θ_n})| ≠ 0 for i = 0, 1). Thus the bias at points away from the discontinuity always has order O(N_h^{−r}) = O(h^r). But for the points around the discontinuity, it is easy to show that Lemma 3.10 and Theorem 3.12 still apply for the infinitely differentiable flat-top function, and therefore we still suggest employing an estimator θ_n = θ_0 + O_p(h^2) so that the bias at the boundary has order O_p(h^2).
By Remark 4.3, the target designed for the transformation-based infinitely differentiable flat-top estimator is relaxed to the basic target f_Y(0) = f_Y(1).

Remark 4.4:
The definition of H(f) implies that Lemma 4.1 does not apply to a density function that is infinite at the boundary, or whose derivative of any order is infinite at the boundary.
The following theorem shows that the transformation-based infinitely differentiable flat-top estimator improves the bias in the interior while retaining bias of order h^2 at the boundary. The proof is omitted because it follows the same logic as the proof of Theorem 3.14, which employed Corollary 3.5 and Lemma 3.13. The idea of Theorem 4.5 is likewise to employ Corollary 3.5 and Lemma 3.13 at the boundaries, and Corollary 3.5 and Lemma 4.1 in the interior.

Theorem 4.5: Assume Condition 3.2 holds, f_X(x) is the density function on [0, 1], and f_X(0) ≠ f_X(1). Moreover, f_X(x) is an rth continuously differentiable and bounded function on [0, 1]. If θ_n = θ_0 + O_p(h^2), then for any x ∈ (0, 1), the Transformed Infinitely Differentiable Flat-top Series estimator attains bias of order O(h^r); in addition, if x = 0 or 1, the bias is of order O(h^2).

Remark 4.6: Notice that both the rectangular flat-top estimator and the infinitely differentiable flat-top estimator might produce negative density values; this is a common finite-sample issue for all higher-order kernels that goes away asymptotically. In practical use, we suggest resetting all negative values to zero and subsequently re-normalising the density estimator so that it has area one; see also Politis and Romano (1993). In Sections 7 and 8, we use this correction method for both simulated and real data.

Polynomial transformation
Here and in what follows, we are more interested in the maximum possible reduction in the bias, and therefore focus on the Transformed Infinitely Differentiable Flat-top Series estimator defined in the previous section. In this section, we propose to find candidates for the transformation function within the family of polynomial functions. By Lemma 4.1, recall that the transformation target is just f_Y(0) = f_Y(1). Let Y_{θ_0} = g_{θ_0}(X) be the desired transformed random variable. As before, we constrain g_{θ_0} to be a one-to-one function from [0, 1] to [0, 1], so that X = g^{−1}_{θ_0}(Y). Then, by the theorem on the derivative of an inverse function,
$f_{Y_{\theta_0}}(y) = \frac{f_X\big(g^{-1}_{\theta_0}(y)\big)}{g^{(1)}_{\theta_0}\big(g^{-1}_{\theta_0}(y)\big)},$   (28)
which implies
$f_{Y_{\theta_0}}(0) = \frac{f_X(0)}{g^{(1)}_{\theta_0}(0)}, \qquad f_{Y_{\theta_0}}(1) = \frac{f_X(1)}{g^{(1)}_{\theta_0}(1)}.$
A polynomial function and its coefficients will be selected by solving all of the following equations:
$g_{\theta_0}(0) = 0, \qquad g_{\theta_0}(1) = 1, \qquad \frac{f_X(0)}{g^{(1)}_{\theta_0}(0)} = \frac{f_X(1)}{g^{(1)}_{\theta_0}(1)}.$   (29)
For example, if we consider a quadratic polynomial g_{θ_0}(x) = ax^2 + bx, then g_{θ_0}(0) = 0 is automatically satisfied. We need a + b = 1 for g_{θ_0}(1) = 1 to hold. In addition, since f_X(x) is always non-negative, the last equation implies
$f_X(0)\,(2a + b) = f_X(1)\, b,$   (30)
and if both f_X(0) and f_X(1) are positive, a simple calculation gives
$a = \frac{f_X(1) - f_X(0)}{f_X(1) + f_X(0)}, \qquad b = \frac{2\, f_X(0)}{f_X(1) + f_X(0)}.$   (31)
By the properties of convergence in probability, it is easy to see that if $\hat f_X(0) = f_X(0) + O_p(h^2)$ and $\hat f_X(1) = f_X(1) + O_p(h^2)$, and letting g_{θ_n}(x) = a_n x^2 + b_n x, where
$a_n = \frac{\hat f_X(1) - \hat f_X(0)}{\hat f_X(1) + \hat f_X(0)}, \qquad b_n = \frac{2\, \hat f_X(0)}{\hat f_X(1) + \hat f_X(0)},$
then g_{θ_n}(x) = g_{θ_0}(x) + O_p(h^2) for any fixed x.
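The quadratic case can be sketched in a few lines (our illustration; the boundary-matching property can be checked numerically): given boundary estimates f̂_X(0) and f̂_X(1), the coefficients follow in closed form.

```python
def quadratic_transform(f0, f1):
    """Coefficients of g(x) = a x^2 + b x solving g(0) = 0, g(1) = 1 and
    f_X(0)/g'(0) = f_X(1)/g'(1); requires f0 > 0 and f1 > 0."""
    a = (f1 - f0) / (f1 + f0)
    b = 1.0 - a
    return a, b

# Example with (hypothetical) boundary values f_X(0) = 2.0 and f_X(1) = 0.5:
a, b = quadratic_transform(2.0, 0.5)
g = lambda x: a * x ** 2 + b * x
g1 = lambda x: 2.0 * a * x + b        # g'(x), strictly positive on [0, 1]
# the transformed density matches at the two boundaries:
# f_X(0)/g'(0) == f_X(1)/g'(1) == (f_X(0) + f_X(1))/2 = 1.25
```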
Remark 5.1: In this paper, the simple boundary correction method of Jones (1993), available in the R package evmix, is applied to estimate f_X(0) and f_X(1). This method is essentially equivalent to kernel-weighted local linear fitting at the boundary; it is thus fast to implement and able to provide very accurate estimates at the boundaries.

Discussion of scenarios
Notice that Equation (30) can be solved only if f_X(0) and f_X(1) are both positive. On the other hand, if either of them is zero, then the solution (31) is invalid, and thus the quadratic polynomial transform introduced in the previous section can also be invalid. We summarise all the scenarios below:
(1) If both of them are zero, a first-order polynomial can be applied as the transformation function; solving the equation array (29) then always results in the identity function as the transformation function.
(2) If both of them are non-zero, a quadratic function can be applied as the transformation function, as described in Section 5.1.
(3) If only one of f_X(0) and f_X(1) is zero, a higher-order polynomial needs to be applied as the transformation function. More details are as follows.
Therefore, if we can ascertain whether the density function has zero value at the boundaries, then we can select the order of the polynomial for the transformation. For the sake of simplicity, in the simulation and real data analysis in this paper we judge whether f_X(0) and f_X(1) are zero by visually inspecting the histogram and using the estimates of f_X(0) and f_X(1) obtained by the method described in Remark 5.1.

Remark 5.2:
In real practice, we propose to use the nonparametric hypothesis test of Paparoditis and Politis (2016b), which tests whether a density has zero value at a point of interest.

Remark 5.3:
For scenario (3) discussed above, without loss of generality, assume f_X(1) > f_X(0) = 0. Let Z = X^k, where k > 1 is an integer, and first start with k = 2. Then we judge whether f_Z(0) = 0. If the conclusion is f_Z(0) = 0, we increase k by 1, and continue until we reach a point where f_Z(0) ≠ 0; denote this k by k_0. The transformed quadratic polynomial flat-top series estimator can now be applied to the dataset {Z_i = X_i^{k_0}, i = 1, ..., n}; since the map x ↦ x^{k_0} is one-to-one and increasing on [0, 1], the true density function is recovered via $f_X(x) = f_Z(x^{k_0})\, k_0\, x^{k_0 - 1}$.

Remark 5.4: For the higher-order polynomial transformation of Remark 5.3, recall (28); if k = k_0, then
$f_Z(z) = \frac{1}{k_0}\, z^{(1 - k_0)/k_0}\, f_X\big(z^{1/k_0}\big).$
One problem might arise when f_X(z^{1/k_0}) tends to zero more slowly than z^{(1−k_0)/k_0} diverges as z → 0, i.e. when $z^{(1-k_0)/k_0} f_X(z^{1/k_0}) \to \infty$, which implies f_Z(z) → ∞ as z → 0. In our simulations, the estimator still performs well even in this situation, but this performance is not theoretically supported, and a small order of bias is not guaranteed.

Selection of shape parameters b and c
The function η_ω given in (25) was first introduced by McMurry and Politis (2004), who suggested using a relatively small value for b, since a large b makes η_ω more rectangular-looking, which is undesirable; see Figure 1. Similarly, we also find that a large value for c causes the same problem. In applications, the empirical choices b = 1/4 and c = 0.05 are recommended.

Selection of bandwidth h
The problem of bandwidth selection over the whole compact support needs further study. The first question arises from Theorem 4.5: the bias orders at the boundary and in the interior differ when r > 2. If a global bandwidth h_1 = A_1 n^{−1/(2r+1)} is selected, where A_1 is a constant, then in the interior we have
$\mathrm{MSE}\big(\hat f_X(x)\big) = O_p\big(n^{-2r/(2r+1)}\big).$
With the same bandwidth h_1, but at the boundaries, the O(h_1^2) bias dominates and the MSE is of larger order. Compared to the boundary-corrected kernel estimators introduced in Section 1, which achieve a mean squared error (MSE) of order O_p(n^{−4/5}), the MSE of the Transformed Flat-top Series estimator in the interior reaches a better order, but a worse order at the two boundaries. If we select two different bandwidths, one for the interior region and one for the boundaries, then a discontinuity issue arises at the border of these two regions.
Remark 6.1: Another possibility is defining a local bandwidth h = h(t) that interpolates between the boundary and interior choices, where γ_h(t) is a very smooth function we select to make h(t) continuous; further study on the validity of this idea is needed. For simplicity, in all the simulations and real data analysis in this paper, we employ the bandwidth h_interior that is optimal for the interior. Note that selection of the bandwidth h_interior is nontrivial; Politis (2003) showed that the inverse of the bandwidth of a flat-top series estimator acts as a threshold in the frequency domain. If the periodic extension f̄_X(x) is very smooth, the characteristic function φ_X(s) will decay to zero very fast, and will be negligible when |s| > 1/h. The bandwidth should thus be chosen so that the flat-top taper leaves the characteristic function at low frequencies unchanged (due to the flat-top region) while damping out the characteristic function at high frequencies. With this in mind, as in Politis (2003), we propose the following adaptive bandwidth choice rule of thumb.
Note that the adaptive bandwidth choice method might not always be the most appropriate for the Transformed Flat-top Series estimator when the density function is not smooth enough. Moreover, the periodic extension of the estimated transformed density f_{Y_{θ_n}}(y; g_{θ_n}) is not exactly continuous, and thus its characteristic function might not always decay fast. As an alternative, we may estimate the bandwidth by least-squares cross-validation, i.e. choose the bandwidth that minimises
$\int_0^1 \big(\hat f_{Y_{\theta_n}}(y) - f_{Y_{\theta_n}}(y)\big)^2 dy = \int_0^1 \hat f_{Y_{\theta_n}}^2(y)\, dy - 2\int_0^1 \hat f_{Y_{\theta_n}}(y)\, f_{Y_{\theta_n}}(y)\, dy + \int_0^1 f_{Y_{\theta_n}}^2(y)\, dy.$
The last term on the right-hand side does not depend on h; the first term is easy to compute because the integral is over the closed interval [0, 1]; we will use the leave-one-out method to estimate the second term, i.e.
$\int_0^1 \hat f_{Y_{\theta_n}}(y)\, f_{Y_{\theta_n}}(y)\, dy \approx \frac{1}{n}\sum_{i=1}^{n} \hat f^{\,-i}_{Y_{\theta_n}}(y_i),$
where $\hat f^{\,-i}_{Y_{\theta_n}}$ is the flat-top series estimator computed by omitting observation y_i from the sample.
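The cross-validation criterion can be sketched as follows (our illustration, with a trapezoidal taper standing in for ω_c): Parseval's identity gives the squared-integral term directly from the tapered coefficients, and a leave-one-out update of the sample characteristic function gives the second term cheaply.

```python
import numpy as np

rng = np.random.default_rng(4)

def omega_trapezoid(t, c=0.5):
    return np.clip((1.0 - np.abs(t)) / (1.0 - c), 0.0, 1.0)

def cv_score(data, h, c=0.5):
    """CV(h) = int_0^1 fhat^2 - (2/n) * sum_i fhat_{-i}(Y_i) for the
    flat-top series estimator with taper lambda_c(s) = omega_c(h s)."""
    n = len(data)
    S = int(np.floor(1.0 / h))
    s = np.arange(-S, S + 1)
    phase = np.exp(2j * np.pi * s[:, None] * data[None, :])  # e^{i 2 pi s Y_k}
    phi = phase.mean(axis=1)                                 # phi_hat(2 pi s)
    lam = omega_trapezoid(h * s, c)
    int_f2 = np.sum(np.abs(lam * phi) ** 2)                  # Parseval on [0, 1]
    phi_loo = (n * phi[:, None] - phase) / (n - 1)           # leave-one-out cf
    f_loo = (lam[:, None] * phi_loo * np.conj(phase)).sum(axis=0).real
    return int_f2 - 2.0 * f_loo.mean()

data = rng.beta(2.0, 2.0, size=500)
scores = {h: cv_score(data, h) for h in (0.05, 0.1, 0.2)}
best_h = min(scores, key=scores.get)   # bandwidth minimising the criterion
```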

Simulation
In this section, we simulate data from six different selected densities, see Table 1.
Beta(·, ·) is the Beta density, Beta_{[a,b]}(·, ·) denotes the corresponding Beta density rescaled to the interval [a, b]; 'truncated' means the original density function is truncated to the interval [0, 1]. Figure 2 shows the shapes of the six probability density functions as well as examples of a few corresponding Transformed Flat-top Series estimators. As mentioned in Section 5.2, for Densities 1 and 6 we use the identity function as the transformation function. For Densities 2 and 3, we use the quadratic transformation function introduced in Section 5.1. For Densities 4 and 5, we use the two higher-order polynomial transformation functions discussed in Remark 5.3 (Density 4 uses a fourth-order polynomial and Density 5 a sixth-order polynomial).
We investigate the finite sample performance of the proposed Transformed Flat-top Series estimator via Monte Carlo and compare it to other popular estimators for densities with bounded support. In addition, we apply re-normalisation to eliminate any negative values appearing in these density estimators, as brought up earlier in Remark 4.6. The list of estimators is as follows: (1) f̂_Beta denotes the first beta kernel estimator of Chen (1999); the bandwidth is estimated by the rule-of-thumb bandwidth of Jones and Henderson (2007).
(2) f̂_copula denotes the Gaussian copula based estimator of Jones and Henderson (2007); the bandwidth is estimated by the rule-of-thumb bandwidth of Jones and Henderson (2007).
(3) f̂_TKE denotes the transformed kernel estimator (TKE) of Wand et al. (1991); the bandwidth is estimated by the plug-in bandwidth of Wen and Wu (2015).
(4) f̂_MTKE denotes the modified transformed kernel estimator (MTKE) of Wen and Wu (2015); the bandwidth is estimated by the plug-in bandwidth of Wen and Wu (2015). For each density function, we conduct simulations with sample size n = 100, replicated 1000 times. Since TKE and MTKE sometimes explode at 0 and 1, we evaluate the average of mean integrated squared errors (AOMISE) of the estimators on an equally spaced grid on [0.001, 0.999] with increment 0.001. Figure 2 shows the first five estimators. The simulation results are presented in Table 2, where all values have been multiplied by 1000 for readability. For each density, the minimum AOMISE is highlighted in bold; estimated standard errors of the listed AOMISE values are given in parentheses. The simulation results are consistent with the theoretical analysis. Among all the estimators, the flat-top estimators provide the best AOMISE for four out of six densities when n = 100, namely Densities 1, 2, 3 and 6 (see the bold numbers in Table 2). All four of these densities satisfy f_X(0) = f_X(1), in some cases with f_X(0) = f_X(1) > 0. For these four densities, the adaptive bandwidth is powerful, as in the previous literature on flat-top based estimators. However, Density 2 only has up to a second continuous derivative, which limits the order of bias that the adaptive bandwidth choice method can achieve; see also Politis (2003). Although cross-validation performs in a stable manner, it might not yield h = A_1 n^{−1/5} and is much slower to implement.
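The AOMISE evaluation described above can be sketched as follows; the fitted estimators and the true density are passed in as callables, which is an implementation choice of this sketch rather than something specified in the paper.

```python
import numpy as np

def aomise(estimates, true_density, grid=None):
    """Average of integrated squared errors over Monte Carlo replications.

    `estimates` is a list of callables, one fitted density per
    replication.  Evaluation is on an equally spaced grid on
    [0.001, 0.999] with increment 0.001, avoiding the boundary points
    0 and 1 where TKE/MTKE may explode.  Returns (mean ISE, its
    estimated standard error).
    """
    if grid is None:
        grid = np.arange(0.001, 0.9995, 0.001)  # 0.001, 0.002, ..., 0.999
    truth = true_density(grid)
    span = grid[-1] - grid[0]
    # Riemann approximation of each integrated squared error
    ises = np.array([np.mean((fhat(grid) - truth) ** 2) * span
                     for fhat in estimates])
    return float(np.mean(ises)), float(np.std(ises) / np.sqrt(len(ises)))
```

Multiplying the returned values by 1000 reproduces the scaling used in Table 2.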
For Densities 4 and 5, the beta kernel estimator provides the best AOMISE, while the Transformed Flat-top Series estimator with adaptive bandwidth choice gives the second best AOMISE and works reasonably well. The selection of high-order polynomial transformation functions introduced in Section 5.2 deserves more future research.

Real data examples
We look into two different styles of data: one appears to have a density with no boundary effect, and the other a density with a very significant boundary effect. We evaluate Transformed Flat-top Series estimators on these data by comparing them to competitor estimators.

Data without boundary effect: financial returns
In this section, we focus on three representative datasets of daily relative returns taken from a foreign exchange rate, a stock price, and a stock index; a description of our main datasets is as follows. Letting P_t denote the price of an asset at time t, the relative return at time t is defined as X_t = (P_t − P_{t−1})/P_{t−1}. Note that X_t ≥ −1 by construction since the price P_t cannot be negative. In principle, X_t is a weakly dependent and strictly stationary time series; hence its density estimation has the same large-sample variance and bias as in the i.i.d. case, see Adams and Nobel (1998) and Hallin and Tran (1996), as well as chapter 10 in Politis (2015). The theorems introduced in this paper are based on variance and bias calculations, therefore the Transformed Flat-top Series estimator can be applied to financial returns. In addition, X_t is not bounded above; however, when X_t takes on extreme values, it is typically due to crashes, in which case the extreme value is towards negative values; see, for example, the extreme value of about −20% associated with the crash of 1987 in the middle of Datasets 2 and 3. Since extreme values are in the negative direction (although it is the bounded direction), it is safe to assume that the positive direction satisfies the symmetric bound, i.e. that X_t ≤ 1. Therefore, in what follows we assume that X_t has density f_X(x) over the compact support [−1, 1]. We also judge, for all three datasets, that f_X(−1) = f_X(1) = 0 by visual inspection of the histograms. Plots and further details on the three datasets can be found in Politis (2015). We apply the same kernel estimators and bandwidth selection methods as in Section 7. Since the sample sizes of all three datasets exceed 1000, all methods are expected to have small mean integrated squared errors. If we were to plot the estimated densities, it would be difficult to visually tell the differences among these methods.
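The relative returns defined above are straightforward to compute from a price series:

```python
import numpy as np

def relative_returns(prices):
    """Daily relative returns X_t = (P_t - P_{t-1}) / P_{t-1}.

    Since prices are positive, every return is bounded below by -1.
    """
    p = np.asarray(prices, dtype=float)
    return np.diff(p) / p[:-1]
```

For example, `relative_returns([100.0, 110.0, 99.0])` yields a +10% return followed by a −10% return.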
Thus we propose a K-fold cross validation method to compare methods effectively; the idea is as follows: (1) Randomly cut the original dataset into K non-overlapping subsets of roughly equal size; call each subset a partition.
(2) Use the data from partition number i (say) as 'training data' to estimate the density function, and use the remaining K−1 partitions as testing data; more precisely, use the histogram of the data from all remaining K−1 partitions as the 'true' density function that gives the benchmark against which the density estimated from partition i is compared. (3) Estimate the MISE (mean integrated squared error) using the estimated density function from the training data and the histogram as the true density. (4) Repeat Steps 2-3 for all possible i to obtain K MISE values; calculate the AOMISE (average of mean integrated squared error) by taking the average of the K values.
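The steps above can be sketched as follows; `fit_density` is a hypothetical stand-in for any of the compared estimators, returning a callable fitted on the training partition.

```python
import numpy as np

def kfold_aomise(data, k, fit_density, bins, seed=None):
    """Compare density estimators by K-fold cross validation.

    For each fold i: fit the density on partition i ('training data'),
    build a histogram of the other K-1 partitions as the benchmark
    'truth', and record the MISE evaluated at the histogram bin
    centres.  Returns the average of the K MISE values (AOMISE).
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    folds = np.array_split(idx, k)          # step (1): K partitions
    mises = []
    for i in range(k):
        train = data[folds[i]]
        rest = data[np.concatenate([folds[j] for j in range(k) if j != i])]
        counts, edges = np.histogram(rest, bins=bins, density=True)
        centres = 0.5 * (edges[:-1] + edges[1:])
        fhat = fit_density(train)           # step (2): fit on partition i
        # step (3): MISE against the histogram 'truth'
        mises.append(np.mean((fhat(centres) - counts) ** 2) * (edges[-1] - edges[0]))
    return float(np.mean(mises))            # step (4): AOMISE
```

Note the histogram never has a bin centred exactly at a boundary, which is why boundary explosions of TKE/MTKE are invisible to this comparison, as discussed below in the text.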
For the above-mentioned histogram, we take the number of bins M equal to M = R N^{1/3}/(3.49 s), where R is the fixed range of the data (for all three datasets here, R = 2), s is the sample standard deviation, and N is the sample size of the data to be used; this corresponds to the optimal bin width 3.49 s N^{−1/3} of Scott (1979). Then, our reference histogram, close to the truth, is built from nine partitions of the data. For the Transformed Flat-top Series estimator, we apply the identity function as the transformation function. As in Section 7, two bandwidth selection methods are applied: adaptive and cross-validation. An example of Transformed Flat-top Series estimators from 1 partition of Dataset 1 versus the histogram of 9 partitions of Dataset 1 is plotted in Figure 3. Notice the density estimator has been renormalised to eliminate negative values, as described in Remark 4.6. Figure 4 shows the same example but fitted by Beta, Gaussian copula, TKE and MTKE. Since these four methods were designed to handle data with unit support, we normalise all three datasets so that they fall in the unit interval and then transform their density estimators back to the original support. All these methods work well with the exception of the Beta kernel estimator, which seriously underestimates the magnitude of the peak. Table 3 gives the results of the 10-fold cross validation. As already apparent from visual inspection, the Beta estimator is the worst. The flat-top estimator f̂_abc is the best for Dataset 1 (foreign exchange) and close to the best for Dataset 3 (IBM stock price); the flat-top estimator that uses the cross-validated bandwidth is also competitive. Note that the TKE and MTKE estimators both perform well in Table 3. Nevertheless, as mentioned earlier, TKE and MTKE explode and can even produce infinite values at the two boundaries of the support, even though the density at the boundaries is apparently 0. This explosion is not reflected in the AOMISE-vs-histogram comparison because the histogram never has a bin centred at a boundary.
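The bin-count rule above, derived from the optimal bin width of Scott (1979), is simple to compute:

```python
def scott_bins(n, s, data_range):
    """Number of histogram bins from Scott's (1979) rule.

    The normal-reference optimal bin width is w = 3.49 * s * n**(-1/3),
    so the bin count over a fixed range R is M = R / w, rounded to the
    nearest integer (at least 1).
    """
    width = 3.49 * s * n ** (-1.0 / 3.0)
    return max(1, int(round(data_range / width)))
```

For instance, with N = 1000 observations, s = 0.2 and the range R = 2 used for the returns datasets, the rule gives 29 bins.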

Data with boundary effect: ratio of white student enrolment
In this section, we evaluate the Transformed Flat-top Series estimator by applying it to a dataset recording the ratio of white student enrolment at 56 public schools in Nassau County, New York, for the 1992-1993 school year; the ratios are consequently restricted to the unit interval. The dataset was used as an example illustrating the boundary problem by Simonoff (1996), and to compare different methods by Geenens (2014) and Wen and Wu (2015). In addition, it is interesting to assess whether public schools were segregated by race in the 90s by estimating the density of white student enrolment ratios. We apply the same kernel estimators and bandwidth selection as in the simulation section and plot them in Figures 5 and 6. The histogram of the data is also presented in the same plots. The dataset has mean 0.778, standard deviation 0.249 and median 0.867; 80% of the data are greater than 0.707. Based on the histogram, the density function is apparently non-zero at both 0 and 1, thus a quadratic polynomial is estimated as the transformation function. Notice that the results presented here are after the re-normalisation described in Remark 4.6. Figure 6 shows the fits by Beta, Gaussian copula, TKE and MTKE. Both Beta and Gaussian copula fail to fit and underestimate the shape of the peak. TKE and MTKE both perform well, especially in the area where the Transformed Flat-top Series estimator is 0 due to re-normalisation, but they do explode and produce infinite values at the two ends of the unit support.

Conclusion
A Transformed Flat-top Series estimator is proposed for density functions on compact support. The method builds on boundary correction methods, the transformation method and the flat-top series estimator. We establish its theoretical properties and show its higher order of bias in the interior of the support. We use adaptive bandwidth choice and cross-validation for bandwidth selection, present their performance in simulations, and compare them to four other popular estimators for densities with compact support via simulations and empirical examples. In summary, the Transformed Infinite Differentiable Flat-top Series estimator has the best AOMISE among the selected competing methods for densities with compact support. For densities with zero at only one boundary, however, it did not reach the same high performance as for densities with equal (or positive) values at the two boundaries. The development of the proposed estimator for densities with a pole or infinite derivative, and its extension to multivariate density problems, will be pursued in our future work.

Disclosure statement
No potential conflict of interest was reported by the author(s).