Boundary-adaptive kernel density estimation: the case of (near) uniform density

We consider nonparametric kernel estimation of density functions in the bounded-support setting with known support $[a,b]$, using a boundary-adaptive kernel function and data-driven bandwidth selection, where $a$ and $b$ are finite and known prior to estimation. We observe, theoretically and in finite-sample settings, that when the bounds are known a priori this kernel approach is capable of outperforming even correctly specified parametric models in the case of the uniform distribution. We demonstrate that this result has implications for modelling a range of densities other than the uniform case. Furthermore, when the bounds $[a,b]$ are unknown and the empirical support (i.e. $[\min(x_i),\max(x_i)]$) is used in their place, similar behaviour surfaces.


Introduction
Nonparametric kernel estimators are quite popular in applied settings, though the dominant approach involves the use of what we shall refer to as infinite support kernel functions such as the Gaussian kernel. However, it is known that this dominant approach suffers from so-called boundary bias in certain settings. These undesired boundary effects occur when the curve to be estimated has a discontinuity at the boundary, so the usual bias expansion that depends on smoothness assumptions is no longer valid. The upshot is that the bias of $\hat f(x)$ may be larger near the boundary than in the interior. Technically, for a point $x$ lying within $h$ units of the boundary ($h$ is the bandwidth), the boundary bias of $\hat f(x)$ is $O(1)$ (see p. 31 of Li and Racine 2007) rather than $O(h^2)$ as at interior points; hence, $\hat f(x)$ may be inconsistent for $f(x)$ near the boundary. Unfortunately, the practitioner may be unaware that bounds are present, which can seriously degrade the reliability of the resulting estimate (i.e. the practitioner may be unaware that $X$ is constrained to lie in $[a,b]$ with $f(a)>0$ and/or $f(b)>0$ and thereby ignores their presence).
Sometimes bounds are naturally known a priori (e.g. variables reported as percentages and proportions are known to lie in $[0,100]$ and $[0,1]$), and in such cases practitioners can turn to so-called boundary kernel functions, which seek to overcome these limitations by exploiting the presence of the bounds. In such instances, the bias at the boundary can be reduced to the same order as that in the interior.
Many readers familiar with kernel-based methods are no doubt aware that the bandwidth, a regularisation parameter that plays a key role in kernel estimation, usually vanishes asymptotically in order for the estimator to be consistent (i.e. $h \to 0$ as $n \to \infty$, where $n$ denotes the sample size). However, this need not always be the case, and a surprising but welcome result surfaces when the data distribution is uniform (or close to uniform): the bandwidth need not vanish asymptotically, and instead $h \to \infty$ as $n \to \infty$ leads to a more accurate density estimate than allowing $h$ to decrease as $n$ increases. The intuition is that, when $X$ is uniformly distributed, a slightly modified kernel density estimator is unbiased for any $h$, and a larger $h$ leads to smaller estimation variance and hence smaller mean squared error (MSE), precisely because the estimator is unbiased for any $h$.
Similar results have appeared in conditional settings, such as when estimating a conditional density or conditional mean in the presence of an irrelevant predictor that is incorrectly included in the model. For example, Hall, Racine, and Li (2004) and Hall and Racine (2015) demonstrated a provocative feature of kernel-based conditional density and regression estimation, respectively: the ability of certain data-driven methods of bandwidth selection to behave in surprising but welcome ways in the presence of irrelevant predictors whose relevance or irrelevance is not known a priori and which are therefore unnecessarily included in the set of model predictors (in the current setting this forms the counterpart to support bounds being present but whose presence is not known a priori). Specifically, for any infinite support kernel function (such as the popular Gaussian kernel), this result holds when the bandwidth for an irrelevant predictor becomes sufficiently large. The intuition underlying this result is straightforward: the irrelevant predictor induces no bias, so any bandwidth can be used without asymptotic consequence from the bias perspective. However, as the bandwidth for the irrelevant predictor increases, the overall variability of the resulting estimate falls, so from an MSE perspective $h \to \infty$ as $n \to \infty$ for the irrelevant predictor, which is welcome as this may automatically remove the irrelevant predictor from the model without the need for pre-testing. We shall see below that such surprising but welcome behaviour is not limited to conditional settings and can be exploited in unconditional settings provided a simple modification of the kernel function is adopted.
Boundary bias with known bounds has been addressed in the kernel density literature, and it arises mostly when there is a discontinuity of the density function at a support boundary. To restore estimation consistency when $x$ lies in the boundary region, there are two types of boundary correction kernels: (i) a simple modification, such as multiplying the density estimator by a normalisation constant, that reduces boundary bias from $O(1)$ to $O(h)$; and (ii) a more sophisticated modification which further reduces boundary bias to $O(h^2)$, the same order of bias as at an interior point (see Jones 1993; Jones and Foster 1996). A number of procedures have been proposed to achieve (ii), the most well-known being data-reflection, data-transformation and the use of kernel carpentry. Data-reflection, as its name implies, involves duplicating data symmetrically (i.e. reflecting) around the boundary, running standard bandwidth selection and kernel estimation, then adjusting the resulting estimate to ensure it is proper (i.e. non-negative and integrating to one) on its support (Schuster 1985; Silverman 1986; Cline and Hart 1991). Data-transformation involves some mathematical transform of the data that, when rescaled, has the desired effect (Wand, Marron, and Ruppert 1991; Marron and Ruppert 1994). Kernel carpentry, on the other hand, uses boundary kernel functions that adapt to the presence of a boundary, thereby mitigating its impact. Many boundary kernel functions have been proposed, including Beta kernels (Chen 1999; Bouezmarni and Rolin 2003; Zhang and Karunamuni 2010; Igarashi 2016), Gamma kernels (Chen 2000; Malec and Schienle 2014), and inverse Gaussian and reciprocal inverse Gaussian kernels (Scaillet 2004; Igarashi and Kakizawa 2014), by way of illustration. To some degree, all of these methods (i.e. reflection, transformation, carpentry) can reduce the bias that would otherwise be present near a boundary to that which holds in the interior of the support, where it is free from boundary effects (in effect, lying $h$ or greater distance from the boundary).
Existing boundary correction density estimators mainly deal with non-uniform density functions. A particular feature of the non-uniform density function is its non-vanishing second derivative, which is often imposed during theoretical analysis, e.g. for bias calculations. In this paper we focus on data that is (or is close to) uniformly distributed over a bounded interval $[a,b]$, where $a < b$ are known finite constants. We propose a simple boundary correction kernel with the attractive property that, when data is uniformly distributed, the resulting kernel density estimator is unbiased for any $n \ge 1$ and $h > 0$ and for all points in the data support, be it a boundary point or an interior point. Therefore, our method complements the existing literature on boundary correction density estimators in applications. We suggest using the least-squares cross-validation (LS-CV) method to select the bandwidth. When data is uniformly distributed, we study the large sample statistical properties of this data-driven method of bandwidth selection, showing that there is a positive probability that the chosen bandwidth diverges to $\infty$ as $n \to \infty$. Simulations show that our proposed estimator performs well when data is (or is close to) uniformly distributed. In applications, the data support may not always be known. We then suggest using $\min_{1\le i\le n} x_i$ and $\max_{1\le i\le n} x_i$ to estimate $a$ and $b$, respectively. We investigate, via simulations, a surprising but welcome property: when the distribution is (close to) uniform, $h \to \infty$ as $n \to \infty$ leads to a more accurate density estimate than allowing $h$ to decrease as $n$ increases. This property arises from using a simple and flexible boundary kernel function combined with LS-CV bandwidth selection. We demonstrate, theoretically and in finite sample settings, that the proposed method is capable of outperforming even correctly specified parametric models in certain settings. In particular, we show that the closer the underlying density is to a uniform distribution on $[a,b]$, the closer the resulting estimator is to the unknown true density, both at the boundary and in the interior. We demonstrate that this result has implications when modelling a range of densities, not just the uniform. The resulting support-agnostic estimator may be preferred to the traditional infinite support estimator or to existing methods that rely on a priori known support bounds (i.e. reflection, transformation, or carpentry). Rather than placing the burden of estimator choice on the shoulders of the practitioner, the resulting estimator is support-adaptive while at the same time delivering consistent estimates in the presence or absence of discontinuities at the support boundary.
Our finding is in contrast to Stone (1984), who uses a regular kernel function with bounded support and no boundary correction, and shows that the LS-CV selected $h \to 0$ as $n \to \infty$ whether the true density is uniform or not. The finite sample implication of our result is that, if the underlying density is close to uniform, using a boundary correction kernel and LS-CV is likely to lead to finite sample efficiency gains over using a regular kernel function without boundary correction. This arises because a larger $h$ should be used the closer the underlying density $f(x)$ is to a uniform density function.
The remainder of this paper proceeds as follows: Section 2 offers a brief review of kernel density estimation and addresses the case when the underlying density is uniform on $[a,b]$; we then propose a simple boundary-correction kernel density estimator. Section 3 studies the large sample statistical properties of the LS-CV selected bandwidth when the data is uniformly distributed; this section also presents two empirical examples to demonstrate the usefulness of our proposed method in applied settings. Section 4 examines the finite sample performance of our proposed estimator via simulations. Detailed proofs are relegated to Web Appendix A, while detailed tabular summaries of the Monte Carlo simulations are relegated to Web Appendix B.

Boundary correction
The Rosenblatt-Parzen kernel density estimator (Rosenblatt 1956; Parzen 1962) is the most popular smooth nonparametric density estimator in use today. In this paper, we consider iid data $\{x_i\}_{i=1}^{n}$. The classical Rosenblatt-Parzen kernel density estimator is defined by
$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right).$$
It is typically assumed that $h \to 0$ and $nh \to \infty$ as $n \to \infty$, and that the kernel function $K(z)$ is symmetric, bounded and non-negative. Boundary effects occur when the density to be estimated has a discontinuity at an endpoint, and hence the usual bias expansion that depends on smoothness assumptions is no longer valid. The upshot is that the bias of $\hat f(x)$ is larger near the boundary. In particular, for $x$ lying within $h$ units of the boundary, the bias of $\hat f(x)$ is $O(1)$ (see p. 31 of Li and Racine 2007) rather than $O(h^2)$. Furthermore, $\hat f(x)$ may be inconsistent for $f(x)$ near the boundary.
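To make the boundary spillage discussed below concrete, the following sketch (our own numpy-based Python illustration, not the paper's R implementation; all names are ours) implements the classical estimator with a Gaussian kernel on uniform data and measures how much probability mass leaks outside $[0,1]$:

```python
import numpy as np

def kde(x, data, h):
    """Classical Rosenblatt-Parzen estimator with a Gaussian kernel."""
    z = (x[None, :] - data[:, None]) / h          # shape (n, len(x))
    k = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return k.mean(axis=0) / h

def trapezoid(y, x):
    """Simple trapezoid rule (avoids version-specific numpy helpers)."""
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(x)))

rng = np.random.default_rng(42)
data = rng.uniform(0.0, 1.0, 500)
grid = np.linspace(-0.5, 1.5, 801)
fhat = kde(grid, data, h=0.1)

# The estimate integrates to ~1 over the whole line, but to less than 1
# over the support itself: mass has spilled outside [0, 1].
total = trapezoid(fhat, grid)
mask = (grid >= 0.0) & (grid <= 1.0)
inside = trapezoid(fhat[mask], grid[mask])
```

With this bandwidth a noticeable fraction of the mass ends up outside the support, which is precisely the impropriety $\int_a^b \hat f(x)\,dx < 1$ described in the text.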
Even when using boundary kernels, however, it is often believed and assumed (or required) that $h \to 0$ as $n \to \infty$ for consistency to hold. In this paper we address whether a surprising but welcome result might hold in this setting. It turns out that a provocative result analogous to that observed in Hall et al. (2004) and Hall and Racine (2015) holds when the underlying density is uniform. This case is of interest for a number of reasons, but one practical reason is that the uniform data generating process (DGP) is a worst case scenario of sorts for kernel density estimation. That is, using a standard kernel density estimator when the data is uniform leads to unwanted artefacts and higher bias than when the DGP has exponential tails (e.g. the Gaussian). This can be appreciated by considering the behaviour of the R function density(), which we use to generate Figure 1 using the default plug-in bandwidth selector and Gaussian kernel function.
Figure 1 reveals some artefacts of interest, namely a divergence of the estimate from the DGP that increases as one approaches the boundaries $a = 0$ and $b = 1$ (this is only one sample, mind you, so this is not a measure of true bias, though a simple Monte Carlo simulation would reveal this to be the case). What occurs, essentially, is that by using a conventional kernel function that ignores the support bounds, probability mass spills outside the support bounds, as is clearly revealed in Figure 1. As a consequence, the estimate is improper on its support (i.e. $\int_a^b \hat f(x)\,dx < 1$). Suppose that we have univariate iid data $\{x_i\}_{i=1}^{n}$ with bounded support and probability density function (PDF) $f(x)$. Without loss of generality, we assume that the data support is $[a,b]$, where $-\infty < a < b < \infty$ are finite known constants. A boundary-corrected kernel density estimator is given by
$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} \frac{K\left(\frac{x - x_i}{h}\right)}{G\left(\frac{x-a}{h}\right) - G\left(\frac{x-b}{h}\right)},$$
where $G(t) = \int_{-\infty}^{t} K(s)\,ds$ is the cumulative distribution function (CDF) associated with the PDF $K(\cdot)$, so that the denominator is the kernel mass the point $x$ places on $[a,b]$. This is closely related to the idea of re-weighting to restore missing probability mass, which was used by Diggle (1985) in the context of estimating the local intensity of a point process (he refers to this as 'end-correction'; see Diggle 1985, eq. (1.1)). It is also discussed in Härdle (1990, p. 131) and Jones (1993). The difference is that here we perform the adjustment at the level of the kernel function rather than in an ex-post fashion after the density estimate has been constructed, although both approaches have the same effect. Figure 2 presents this kernel function (divided by $h$) for $X \in [0,1]$ when it is constructed using 5 values of its mode, denoted by $x$ (i.e. the 5 kernel functions $K(z)$ plotted correspond to $x = (0, 1/4, 1/2, 3/4, 1)$ for $X \in [0,1]$, where $z = (X - x)/h$).
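A minimal sketch of this boundary-corrected estimator follows (our own Python illustration under the formula above; the paper's reference implementation is the R function npuniden.boundary() in the np package). As a numerically convenient check we use the large-$h$ limit, where the estimate collapses to the uniform density $1/(b-a)$ at every point of the support, boundary or interior:

```python
import numpy as np
from math import erf, sqrt, pi

def G(t):
    """Standard normal CDF, i.e. the CDF associated with the Gaussian kernel."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def bkde(x, data, h, a, b):
    """Boundary-corrected estimator: the kernel mass each evaluation
    point places on [a, b] is renormalised to one."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[None, :] - data[:, None]) / h
    k = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    raw = k.mean(axis=0) / h
    denom = np.array([G((xi - a) / h) - G((xi - b) / h) for xi in x])
    return raw / denom

rng = np.random.default_rng(1)
data = rng.uniform(0.0, 1.0, 100)
# As h grows, the estimate tends to 1/(b - a) = 1 everywhere on [0, 1],
# including at the boundary points themselves.
big_h = bkde(np.array([0.0, 0.25, 0.5, 1.0]), data, h=1e6, a=0.0, b=1.0)
```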
Figure 3 presents the (true) uniform density and the corrected estimates for the same data underlying Figure 1, based on an ad hoc bandwidth of $h = 0.75$. It can be seen that the boundary correction results in a much more suitable estimate in this case relative to the uncorrected estimate presented in Figure 1. Furthermore, unlike the conventional estimate presented for the same data in Figure 1, the proposed estimate presented in Figure 3 is proper on its support (i.e. $\int_a^b \hat f(x)\,dx = 1$). But note that, here, we are using an ad hoc bandwidth, not a data-driven one.

Main results
In this section, we focus on the case in which data is uniformly distributed, and we study the large sample statistical properties of the least-squares cross-validation (LS-CV) selected bandwidth. Two empirical examples demonstrate the usefulness of our proposed method in applied settings.

Main results
We consider standard LS-CV for bandwidth selection. However, unlike the conventional treatment that delivers a bandwidth converging in probability to the optimal bandwidth at rate $O(n^{-1/5})$, we demonstrate that if the underlying density is in fact uniform on $[a,b]$, there is a positive probability that the selected bandwidth takes arbitrarily large values. This corresponds to a scenario where, in empirical applications, $n$ is always a fixed positive integer. When one searches for a value of $h$, $h$ is not tied to $n$; we can consider as large an $h$ as our computer allows. For example, we may have a data set with range in $[0,1]$, and $h$ can be $10^6$ or even larger, though, practically, there is no noticeable difference between using $h = 100$ and $h = \infty$ if the data support is $[0,1]$. Furthermore, it will be seen that when we deviate slightly from the uniform density case, the kernel estimator can still exhibit good finite sample properties.
Before we present our main theoretical results, we first make some assumptions (Assumptions 3.1-3.3). It is easy to check that the standard Gaussian kernel satisfies Assumption 3.2. Assumption 3.3 allows $h \to 0$ at a rate slower than $n^{-1/2}$; it also allows $h \to \infty$ with $n$ fixed, or $h \to \infty$ as $n \to \infty$; in this latter case, the rate at which $h \to \infty$ can be faster or slower than $n$. Lemmas A.3, A.4, A.6, A.7 and Proposition 3.1 all require Assumption 3.3 to hold; therefore, $h = h_n$ is a sequence of positive real numbers in these Lemmas and in Proposition 3.1. However, Lemmas 3.2 and A.1 hold when $h \to \infty$ for any value of $n$ ($n$ can be fixed); hence, $h$ is unrelated to $n$ in Lemmas 3.2 and A.1. Lemmas A.1-A.8 are provided in Web Appendix A.
We next show that, when $x_i$ is uniformly distributed on $[a,b]$, the estimator $\hat f(x)$ is unbiased for any $h > 0$: recalling that $K(\cdot)$ and $G(\cdot)$ are the PDF and CDF, respectively, so that $\int_a^b \frac{1}{h} K\left(\frac{x-t}{h}\right) dt = G\left(\frac{x-a}{h}\right) - G\left(\frac{x-b}{h}\right)$, a direct calculation gives $E[\hat f(x)] = 1/(b-a)$ for all $x \in [a,b]$. Given the unbiasedness property of $\hat f(x)$ when $X$ is uniform, it is reasonable to conjecture that some existing data-driven methods of bandwidth selection may be able to select a large value of $h$, at least with positive probability, even when $n$ is not large (a non-asymptotic result). Simulation results confirm this conjecture. The intuition behind this result is fairly straightforward and goes as follows. Many data-driven methods of bandwidth selection are predicated on the minimisation of an $L_2$ norm criterion such as integrated mean square error. As well, the estimator's variance depends inversely on the bandwidth $h$. So, when minimising such criteria for a DGP such as the uniform, if there is no bias for any $h$, then the criterion may end up minimising estimator variance, which is accomplished by selecting large values of $h$.
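The exact unbiasedness claim is easy to probe numerically. The following Monte Carlo sketch (our own illustration, with arbitrary choices of $n$, $h$ and the replication count) averages the boundary-corrected estimate at a boundary point and an interior point over repeated uniform samples; both averages should sit near the true density $f(x) = 1$:

```python
import numpy as np
from math import erf, sqrt, pi

def G(t):
    # standard normal CDF
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def bkde_at(x, data, h, a, b):
    """Boundary-corrected density estimate at a single point x."""
    k = np.exp(-0.5 * ((x - data) / h) ** 2) / sqrt(2.0 * pi)
    return (k.mean() / h) / (G((x - a) / h) - G((x - b) / h))

rng = np.random.default_rng(0)
reps, n, h = 400, 100, 0.3
at_boundary = np.empty(reps)
at_interior = np.empty(reps)
for r in range(reps):
    data = rng.uniform(0.0, 1.0, n)
    at_boundary[r] = bkde_at(0.0, data, h, 0.0, 1.0)
    at_interior[r] = bkde_at(0.5, data, h, 0.0, 1.0)

# Both sample means should be close to f(x) = 1, up to Monte Carlo error,
# reflecting exact unbiasedness at boundary and interior points alike.
mean_boundary = at_boundary.mean()
mean_interior = at_interior.mean()
```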
Next, we evaluate the pointwise estimation MSE when $h \to \infty$. The proof of Lemma 3.2 is given in Web Appendix A. Lemma 3.2 suggests that one can obtain a $\sqrt{n}$-consistent estimator of $f(x)$ when $x_i$ is uniformly distributed. However, it is an open question whether one can design a data-driven method of selecting $h$ such that the probability that $h \to \infty$ tends to one (as $n \to \infty$) when $x_i$ is uniformly distributed. To the best of our knowledge, all existing data-driven methods can pick a large value of $h$ with positive probability, but that probability is strictly less than one even as $n \to \infty$.
We now consider a data-driven method for selecting the bandwidth. LS-CV is a fully automatic and data-driven method of bandwidth selection (Rudemo 1982; Bowman 1984), based on the principle of selecting a bandwidth that minimises the integrated square error (ISE). Specifically, we select $\hat h$ as the minimiser of (see p. 15 of Li and Racine 2007)
$$CV(h) = \int \hat f(x)^2\,dx - \frac{2}{n}\sum_{i=1}^{n} \hat f_{-i}(x_i),$$
where $\hat f_{-i}(x_i)$ is the leave-one-out estimator of $f(x_i)$. Proposition 3.1 below presents the statistical properties of the LS-CV selected $\hat h$.
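A direct, if naive, implementation of the LS-CV criterion for the boundary-corrected estimator can be sketched as follows (our own Python illustration: the integral term is approximated by the trapezoid rule on a grid, and the bandwidth is chosen from a small ad hoc grid rather than by a proper optimiser):

```python
import numpy as np
from math import erf, sqrt, pi

def G(t):
    # standard normal CDF
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def bkde(x, data, h, a, b):
    """Boundary-corrected Gaussian-kernel density estimator on [a, b]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[None, :] - data[:, None]) / h
    k = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    denom = np.array([G((xi - a) / h) - G((xi - b) / h) for xi in x])
    return (k.mean(axis=0) / h) / denom

def cv(h, data, a, b, ngrid=201):
    """LS-CV objective: int fhat^2 dx (trapezoid rule on a grid)
    minus (2/n) times the sum of leave-one-out estimates at the data."""
    grid = np.linspace(a, b, ngrid)
    f2 = bkde(grid, data, h, a, b) ** 2
    integral = float(np.sum((f2[1:] + f2[:-1]) * 0.5 * np.diff(grid)))
    loo = [bkde(data[i], np.delete(data, i), h, a, b)[0]
           for i in range(len(data))]
    return integral - 2.0 * np.mean(loo)

rng = np.random.default_rng(7)
data = rng.uniform(0.0, 1.0, 100)
hs = [0.05, 0.1, 0.25, 0.5, 1.0, 5.0, 25.0]
scores = [cv(h, data, 0.0, 1.0) for h in hs]
h_cv = hs[int(np.argmin(scores))]
```

For uniform data one typically observes the criterion favouring the larger bandwidths in the grid, in line with the theory developed in this section, though for any single sample this is not guaranteed.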
Proposition 3.1: Under Assumptions 3.1-3.3, we have $CV(h) = \frac{1}{(b-a)^2} A(h) + Z_n(h) + (\text{smaller order terms})$, where $A(h)$ has a unique minimum at $h = \infty$, i.e. $A(\infty) < A(h)$ for all finite $h$. The proof of Proposition 3.1 is given in Web Appendix A. Let $\bar h$ denote the value of $h$ that minimises $\frac{1}{(b-a)^2} A(h) + Z_n(h)$. Because $Z_n(h)$ is a zero mean random variable that is highly non-linear in $h$, it is difficult to characterise the limiting distribution of $\bar h$; however, we can offer some intuitive reasoning about its statistical behaviour: (i) because $A(h)$ is minimised at $h = \infty$, there is a tendency for $\bar h$ to assume a large value; in other words, we expect that there is a positive probability that $\bar h$ takes very large values. For example, when $[a,b] = [0,1]$, it makes little to no noticeable difference to the estimated nonparametric density whether $h \ge 100$ (say, $h = 100$) or $h = \infty$. Moreover, this may occur even for small values of $n$ (a non-asymptotic behaviour of $\bar h$); (ii) $\bar h$ will not diverge to $\infty$ with probability approaching one as $n \to \infty$, because the zero mean, finite variance term $Z_n(h)$ prevents this. Therefore, while there is a positive probability that $\bar h$ takes very large values, there is also a positive probability that $\bar h$ takes values in a finite interval $[c,d]$ for some finite positive constants $0 < c < d < \infty$; (iii) there is no probability mass at $0$, that is, $P(\bar h \in [0,\delta)) \to 0$ as $\delta \downarrow 0$. Simulation results reported in Section 4 strongly support this theoretical analysis. It seems extremely challenging, if not impossible, to derive a specific (known) distribution theory for $\bar h$ in this instance. Finally, the large sample behaviours of $\hat h$ and $\bar h$ are expected to be similar (as $n \to \infty$), given that $\bar h$ minimises a leading term of $CV(h)$.
As a referee correctly pointed out, in a classical setting the optimal bandwidth for kernel density estimation can also be derived assuming that the density $f$ is twice differentiable (with second derivative not identically zero). Using a second order kernel, the optimal global bandwidth takes the well-known form
$$h_{opt} = \left[\frac{R(K)}{\mu_2(K)^2\, R(f'')}\right]^{1/5} n^{-1/5},$$
where $R(g) = \int g(x)^2\,dx$ and $\mu_2(K) = \int z^2 K(z)\,dz$, with an analogous local (pointwise) version. Obviously, for a uniform density the first and second derivatives are equal to zero everywhere. Note that if $R(f'')$ is close to zero then the denominator in $h_{opt}$ is close to zero, leading to a larger value of the bandwidth. It is thus not surprising (intuitively) that a method that estimates the MISE, as cross-validation does, leads to a very large bandwidth in the uniform density case, given that the density is constant on its entire domain. This is just another explanation/intuition, similar to the one provided for the LS-CV method discussed above.
Choosing the kernel function K(•) as a standard normal density function, Figure 4 plots the function A(h) versus h for h ∈ (0, 0.5] (values for h > 0.5 are essentially constant taking on the value 3.55).

Applications
We consider two illustrative applications involving lifetime distributions, where the random variable is time-to-failure, and a Gaussian kernel is used throughout. The first example (Example 1) involves time-to-failure (hours) for $n = 18$ observations on complete failure of electrical components, with the maximum being 420 h, hence the cutoff at 420 (Wang, Sha, Gu, and Xu 2014). The second (Example 2) involves time-to-failure (hours) for $n = 50$ observations on complete failure of nano ceramic capacitors tested under accelerated temperature and voltage stress conditions (in order to generate more failure data within a short period of time), with a maximum recorded value of 1770, hence the cutoff at 1770 (Kalaiselvan and Rao 2016). Results are presented in Figure 5, where the plots on the left are the lifetime density estimates with empirical support $[\min_x, \max_x]$ versus traditional infinite support $[-\infty, \infty]$, both using cross-validated bandwidth selection (here we define $\min_x = \min_{1\le i\le n} x_i$ and $\max_x = \max_{1\le i\le n} x_i$). The plots on the right present the associated hazard function estimates. Alongside the density estimates we also present density histograms, and LS-CV bandwidth values appear as subtitles below each horizontal axis.
First, in Figure 5 one can see that the conventional kernel estimator underestimates $f(x)$ significantly, such that the estimated density integrates to a number less than one. Next, we observe that cross-validated bandwidth selection chooses a moderately large value of the bandwidth for Example 1 and an extremely large bandwidth for Example 2, the former leading to a close-to-uniform estimate and the latter to a uniform estimate. Figure 5 reveals that, for Example 1, we appear to have a close-to-uniform case using the empirical support $[\min_x, \max_x]$ for the support bounds, while for Example 2 we appear to have a uniform case, again using the empirical support. A Kolmogorov-Smirnov goodness-of-fit test conducted under the null of uniformity on the empirical support yielded P-values of 0.55 and 0.85 for Example 1 and Example 2, respectively, consistent with the null that the data is drawn from a uniform distribution in both examples. The implications for the associated hazard functions are markedly different depending on whether one uses the empirical support $[\min_x, \max_x]$ or the traditional infinite support $[-\infty, \infty]$. Furthermore, the smooth empirical support estimates are in broad agreement with the non-smooth histogram density estimator, while the traditional infinite support estimates display boundary effects that are at odds with the data.
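For completeness, the Kolmogorov-Smirnov statistic under a uniform null on the empirical support can be computed directly. The sketch below is our own simplified version; it ignores the effect that estimating the support from the data has on the null distribution (and hence on P-values), so it is illustrative only:

```python
import numpy as np

def ks_uniform(x):
    """One-sample Kolmogorov-Smirnov statistic against the uniform
    distribution on the empirical support [min(x), max(x)].
    Note: using the empirical support pins the first and last order
    statistics to 0 and 1 exactly, a simplification."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    u = (x - x[0]) / (x[-1] - x[0])       # probability transform
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)
    d_minus = np.max(u - (i - 1) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(3)
d_unif = ks_uniform(rng.uniform(0.0, 1.0, 200))   # small statistic expected
d_beta = ks_uniform(rng.beta(4.0, 4.0, 200))      # clearly non-uniform data
# A deterministic sanity check: evenly spaced points are nearly uniform.
d_even = ks_uniform(np.linspace(0.0, 1.0, 101))
```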

Simulation study
In what follows we compare the MSE performance of the boundary-adaptive kernel estimator outlined above, which uses LS-CV bandwidth selection, with its unbounded support counterpart (also estimated using LS-CV bandwidth selection, denoted by $K_{[-\infty,\infty]}$ in the tables below); a Gaussian kernel is used throughout. We consider both the case where bounds are known (denoted by $K_{[0,1]}$ in the tables below) and unknown but based on the empirical support (denoted by $K_{[\min_x,\max_x]}$ in the tables below). In order to provide a parametric benchmark, we also consider a correctly specified parametric model with shape parameters $s_1$ and $s_2$ obtained via MLE (denoted by Beta($s_1, s_2$) in the tables below).
We compute MSE via
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat f(x_i) - f(x_i)\right)^2,$$
where $n$ is the sample size, $\hat f(x_i)$ is the estimator evaluated at the $i$th sample realisation, and $f(x_i)$ is the true density evaluated at the $i$th sample realisation, which, here, is known since we simulate the data from a known DGP. As well, RMSE, which appears in some figures, denotes the root-MSE and is computed via $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$. The probability density function of the beta distribution, for $0 \le x \le 1$ and shape parameters $s_1, s_2 > 0$, is a power function of the variable $x$ and of its reflection $(1-x)$, defined as
$$f(x; s_1, s_2) = \frac{x^{s_1-1}(1-x)^{s_2-1}}{B(s_1, s_2)}, \qquad B(s_1, s_2) = \frac{\Gamma(s_1)\Gamma(s_2)}{\Gamma(s_1 + s_2)},$$
where $\Gamma(\cdot)$ is the Gamma function and $B(s_1, s_2)$ is a normalisation constant ensuring that the total probability equals 1. Of particular interest in what follows is the case $s_1 = s_2 = 1$, which delivers the uniform distribution, though we consider a range of values for $s_1$ and $s_2$. Figure 6 presents the DGPs used in the simulations that follow.
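These quantities are straightforward to compute; the sketch below (our own, with illustrative names) implements the sample-point MSE and the beta density using only the standard library Gamma function, and confirms that Beta(1, 1) reduces to the uniform density:

```python
import numpy as np
from math import gamma

def beta_pdf(x, s1, s2):
    """Beta density on [0, 1]; B(s1, s2) = Gamma(s1)Gamma(s2)/Gamma(s1+s2)."""
    B = gamma(s1) * gamma(s2) / gamma(s1 + s2)
    return x ** (s1 - 1.0) * (1.0 - x) ** (s2 - 1.0) / B

def mse(fhat_vals, f_vals):
    """Sample-point MSE between an estimate and the true density."""
    d = np.asarray(fhat_vals) - np.asarray(f_vals)
    return float(np.mean(d ** 2))

x = np.linspace(0.01, 0.99, 99)
# Beta(1, 1) is the uniform density on [0, 1]: identically equal to 1.
u = beta_pdf(x, 1.0, 1.0)
# A flat "estimate" fhat(x) = 1 therefore achieves RMSE of exactly zero
# against the Beta(1, 1) truth.
rmse_flat = np.sqrt(mse(np.ones_like(x), u))
```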

The uniform case (Beta(1, 1))
In this section we consider the finite sample behaviour of the LS-CV bandwidths and estimator MSE when the data is simulated from the uniform distribution. Table 1 presents a summary of quantiles, for a range of sample sizes, of the LS-CV bandwidths for the Beta(1, 1) (uniform[0, 1]) case for the three kernel estimators: (i) the boundary kernel ($K_{[0,1]}$), (ii) the empirical support boundary kernel ($K_{[\min_x,\max_x]}$) and (iii) the unbounded kernel ($K_{[-\infty,\infty]}$). Table 2 presents a summary of quantiles of the MSEs (×10,000) for the boundary kernel cases ($K_{[0,1]}$ and $K_{[\min_x,\max_x]}$) versus the correctly specified parametric model (Beta($s_1, s_2$)) with parameters estimated via MLE. Table 1 illustrates that, for the boundary kernel $K_{[0,1]}$, the bandwidths behave in a manner consistent with our theoretical analysis (i.e. yielding a large median bandwidth), delivering an estimator with a median MSE of virtually zero when bounds are known (see Table 2), unlike its infinite support counterpart. Table 2 reveals that LS-CV bandwidth selection in conjunction with an appropriate boundary kernel can outperform a correctly specified parametric model (the Beta($s_1, s_2$) distribution estimated via MLE). Note that column 2 of Table 2 reports MSE (×10,000) when the bounds are known ($K_{[0,1]}$). In this case, when the bandwidth gets large the kernel estimator converges rapidly to the true underlying density, i.e. $\hat f(x) \to f(x) = 1/(b-a) = 1$. In fact, the median bandwidth is very large in this case, as Table 1 reveals, which is why the MSE (×10,000) quantiles contain many zeros below the 80th quantile in Table 2 (i.e. where bandwidths are quite large above their 20th quantile in Table 1). Even when bounds are not known (i.e. column 3), these results indicate that the empirical support case has strong appeal for practitioners in instances where bounds may be suspected but unknown.
Table 1 also shows that, when using an infinite support kernel function, the LS-CV bandwidth $\hat h$ shrinks to 0 as $n \to \infty$ even when the true density is uniform, which confirms the theoretical analysis of Stone (1984) and which some readers may find surprising. Figure 7 plots the median RMSEs over the $M = 1000$ Monte Carlo replications (vertical axis) versus the sample size (horizontal axis).
To best appreciate the results in Figure 7, recall that we are comparing median RMSE (defined above) on the vertical axis versus sample size $n$ on the horizontal axis. If the median RMSE curve for one method lies everywhere above that for another, then its median RMSE performance is uniformly worse (i.e. the method associated with the lower curve would be preferred on median RMSE grounds). Summarising the results in Figure 7, perhaps surprisingly it appears that we can outperform a correctly specified parametric model estimated using MLE by using a bounded but otherwise unrestricted kernel density estimator when the underlying density lies at or near the uniform (e.g. top row of plots). Furthermore, the further we move from the Beta(1, 1) DGP (i.e. as both $s_1$ and $s_2$ increase beyond the value 1), the closer the boundary-adaptive kernel density estimator gets to the unbounded case. As well, the bottom row of plots in Figure 7 reveals that once the bounded DGP achieves tails that are 'flat' and essentially zero (e.g. Beta(3, 3)), the boundary-adaptive and unbounded kernel estimators coincide, as we might expect, and in this case the correctly specified parametric model dominates, as expected.
In addition, we note that for infinite support DGPs such as the Gaussian, the reader can readily confirm that the difference in MSE performance between the empirical support ($K_{[\min_x,\max_x]}$) and infinite support ($K_{[-\infty,\infty]}$) estimators is negligible in all but the smallest of sample sizes. For instance, simulations reveal that the (median) relative efficiency of the empirical support versus infinite support estimators for sample sizes $n = (100, 200, 400, 800, 1600, 3200)$ is $(1.06, 0.98, 0.98, 0.99, 1.00, 1.00)$ (numbers > 1 indicate superior performance of the infinite support estimator). For data simulated from a $\chi^2_\nu$ distribution with $\nu = 1$ degree of freedom, the (median) relative efficiency of the empirical support versus infinite support estimators for the same sample sizes is $(0.91, 0.93, 0.95, 0.96, 0.97, 0.98)$. If bounds are unknown, it appears that use of the empirical support estimator can lead to small efficiency losses in small sample settings, such as $n = 100$, when in fact the infinite support assumption is warranted (e.g. Gaussian). However, if bounds are unknown, the efficiency gains outweigh such losses, as the left-bounded $\chi^2_1$ simulation reveals. The take home message from this simulation experiment is that it provides clear and actionable support for using the proposed approach in practical settings in which support bounds are present but their values are unknown to the practitioner a priori. Also, as would be expected, in cases where the bounds are known a priori one obviously ought to use this information (see, e.g. the table columns labelled $K_{[0,1]}$, which correspond to the known bounds $[a,b] = [0,1]$ used for the purpose of the simulation experiment).
The final point to be made is that, when dealing with data for which support bounds may be present but are unknown a priori, the proposed empirical support approach may be preferred for revealing features actually present in the data over the estimate generated by the standard density estimator in R (density()). Given that we provide a simple R function for practitioners (see the R function npuniden.boundary(), which can be found on CRAN in the np package (Hayfield and Racine 2008)), we encourage practitioners to confirm that this is indeed the case.

Summary
In this paper we present a modified Rosenblatt-Parzen density estimator for data with bounded support $[a,b]$, where $a < b$ are known finite constants. We propose a simple boundary-adaptive density estimator and suggest using the least-squares cross-validation method to select the bandwidth. We demonstrate, theoretically and via simulations and applications, that the proposed method is capable of delivering well-behaved estimates, especially when data follows (or is close to) a uniform distribution. One limitation of this paper is that we only provide theoretical analyses when the support $[a,b]$ is known. When the boundary points are unknown, we recommend using the extreme values $[\min_{1\le i\le n} x_i, \max_{1\le i\le n} x_i]$ to estimate $[a,b]$; we conjecture that our theoretical results can be extended to the case with unknown boundary points. Simulation results show that this approach delivers a density estimate that is close to the case when the data support is known. Our proposed simple and tractable method could serve as a useful alternative to the traditional estimator that assumes support bounds are infinite, or to existing estimators that presume a priori known support bounds. A simple R function, npuniden.boundary(), is available for practitioners in the np package (Hayfield and Racine 2008).

Figure 2. Boundary kernel function for a range of bandwidths.

Figure 3. Boundary correction via kernel carpentry with a uniform DGP, ad hoc bandwidth h = 0.75, random sample of size n = 100.

Figure 5. Lifetime distribution density and hazard functions for empirical support and traditional infinite support kernel functions. Bandwidths appear below the horizontal axes.