The Annals of Statistics
2010, Vol. 38, No. 1, 352–387
DOI: 10.1214/09-AOS721
© Institute of Mathematical Statistics, 2010
arXiv:1001.1829v1 [math.ST] 12 Jan 2010

MAXIMUM SMOOTHED LIKELIHOOD ESTIMATION AND SMOOTHED MAXIMUM LIKELIHOOD ESTIMATION IN THE CURRENT STATUS MODEL

By Piet Groeneboom, Geurt Jongbloed and Birgit I. Witte

Delft University of Technology, Delft University of Technology and Delft University of Technology

We consider the problem of estimating the distribution function, the density and the hazard rate of the (unobservable) event time in the current status model. A well studied and natural nonparametric estimator for the distribution function in this model is the nonparametric maximum likelihood estimator (MLE). We study two alternative methods for the estimation of the distribution function, assuming some smoothness of the event time distribution. The first estimator is based on a maximum smoothed likelihood approach. The second method is based on smoothing the (discrete) MLE of the distribution function. These estimators can be used to estimate the density and hazard rate of the event time distribution based on the plug-in principle.
Received October 2008; revised March 2009.
AMS 2000 subject classifications. Primary 62G05, 62N01; secondary 62G20.
Key words and phrases. Current status data, maximum smoothed likelihood, smoothed maximum likelihood, distribution estimation, density estimation, hazard rate estimation, asymptotic distribution.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2010, Vol. 38, No. 1, 352–387. This reprint differs from the original in pagination and typographic detail.

1. Introduction. In survival analysis, one is interested in the distribution of the time it takes before a certain event (failure, onset of a disease) takes place. Depending on exactly what information is obtained on the time $X$ and the precise assumptions imposed on its distribution function $F_0$, many estimators for $F_0$ have been defined and studied in the literature.

When a sample of $X_i$'s is directly and completely observed, one can estimate $F_0$ under various assumptions. In the parametric approach, one assumes $F_0$ to belong to a parametric class of distributions, e.g., the exponential or Weibull distributions. Then estimating $F_0$ boils down to estimating a finite-dimensional parameter and a variety of classical point estimation procedures can be used to do this. If one wishes to estimate $F_0$ fully nonparametrically, so without assuming any properties of $F_0$ other than the basic properties of distribution functions, the empirical distribution function $F_n$ of $X_1,\ldots,X_n$ is a natural candidate to use. If the distribution function is known to have a continuous derivative $f_0$ w.r.t. Lebesgue measure, one can use kernel estimators [see, e.g., Silverman (1986)] or wavelet methods [see, e.g., Donoho and Johnstone (1995)] for estimating $f_0$. Finally, in case $F_0$ is known to satisfy a certain shape constraint such as concavity or convex-concavity on $[0,\infty)$, a shape-constrained estimator for $F_0$ can be used. Problems of this type were considered in, e.g., Bickel and Fan (1996), Groeneboom, Jongbloed and Wellner (2002) and Dümbgen and Rufibach (2009).
However, in many cases the variable $X$ is not observed completely, due to some sort of censoring. Parametric inference in such situations is often not really different from that based on exactly observed $X_i$'s. The parametric model for $X$ basically transforms to a parametric model for the observable data and the usual methods for parametric point estimation can be used to estimate $F_0$. For various types of censoring, nonparametric estimators have also been proposed. In the context of right-censoring, the Kaplan–Meier estimator [see Kaplan and Meier (1958)] is the (nonparametric) maximum likelihood estimator of $F_0$. It maximizes the likelihood of the observed data over all distribution functions, without any additional constraints. Density estimators also exist in this setting, see, e.g., Marron and Padgett (1987). Huang and Zhang (1994) consider the MLE for estimating $F_0$ and its density in this setting under the assumption that $F_0$ is concave on $[0,\infty)$.
The type of censoring we focus on in this paper is interval censoring, case I. The model for this type of observations is also known as the current status model. In this model, a censoring variable $T$, independent of $X$, is observed as well as a variable $\Delta = 1_{\{X \le T\}}$, indicating whether the (unobservable) $X$ lies to the left or to the right of the observed $T$. For this model, the (nonparametric) maximum likelihood estimator is studied in Groeneboom and Wellner (1992). This estimator is discrete and is therefore not suitable for estimating the density $f_0$, the hazard rate $\lambda_0 = f_0/(1-F_0)$ or the transmission potential studied in Keiding (1991), which depends on the hazard rate $\lambda_0$. An estimator that can be used to estimate these quantities is the maximum likelihood estimator studied by Dümbgen, Freitag-Wolf and Jongbloed (2006) under the constraint that $F_0$ is concave or convex-concave.
In this paper, we study two likelihood based estimators for $F_0$ (and its density $f_0$ and hazard rate $\lambda_0$) based on interval censored data from $F_0$ under the assumption that $F_0$ is continuously differentiable. The first estimator we study is a so-called maximum smoothed likelihood estimator (MSLE) as studied by Eggermont and LaRiccia (2001) in the context of monotone and unimodal density estimation. It is a general likelihood-based M-estimator that will turn out to be smooth automatically. The second estimator we consider, the smoothed maximum likelihood estimator (SMLE), is obtained by convolving the (discrete) MLE of Groeneboom and Wellner (1992) with a smoothing kernel. These different methods result in different but related estimators. Analyzing the pointwise asymptotics shows that only the biases of these estimators differ while the variances are equal. We cannot say that one estimator is uniformly superior to the other. In a somewhat analogous way, Mammen (1991) studies the differences between the efficiencies of smoothing of isotonic estimates and isotonizing smooth estimates. This also does not produce a clear “winner.”
The outline of this paper is as follows. In Section 2, we introduce the current status model and review some results needed in the sequel. The MSLE $\hat F_n^{\mathrm{MS}}$ for $F_0$ based on current status data is introduced and characterized in Section 3. Moreover, asymptotic results are derived for $\hat F_n^{\mathrm{MS}}$ as well as its density $\hat f_n^{\mathrm{MS}}$ and hazard rate $\hat\lambda_n^{\mathrm{MS}}$, showing that the rate of convergence of $\hat F_n^{\mathrm{MS}}$ is faster than the rate of convergence of the MLE. In Section 4, the SMLE for $F_0$, $f_0$ and $\lambda_0$ are introduced and their asymptotic properties derived. The resulting asymptotic distributions are very similar to the asymptotic distributions of the MSLE. In Section 5, we briefly address the problem of bandwidth selection in practice. We also apply these methods to a data set on hepatitis A from Keiding (1991). Technical proofs and lemmas can be found in the Appendix.
2. The current status model. Consider an i.i.d. sequence $X_1, X_2, \ldots$ with distribution $F_0$ on $[0,\infty)$ and, independent of this, an i.i.d. sequence $T_1, T_2, \ldots$ from a distribution $G$ with Lebesgue density $g$ on $[0,\infty)$. Based on these sequences, define $Z_i = (T_i, 1_{\{X_i \le T_i\}}) =: (T_i, \Delta_i)$. Then $Z_1, Z_2, \ldots$ are i.i.d. and have density $f_Z$ with respect to the product of Lebesgue and counting measure on $[0,\infty) \times \{0,1\}$:

\[
f_Z(t,\delta) = g(t)\{\delta F_0(t) + (1-\delta)(1-F_0(t))\} = \delta g_1(t) + (1-\delta) g_0(t). \tag{2.1}
\]

One usually says that the $X_i$'s take their values in the hidden space $[0,\infty)$ and the $Z_i$ take their values in the observation space $[0,\infty) \times \{0,1\}$.
Let $P_n$ be the empirical distribution of $Z_1,\ldots,Z_n$. Writing down the log likelihood as a function of $F$ and dividing by $n$, we get

\[
l(F) = \int \{\delta \log F(t) + (1-\delta)\log(1-F(t))\}\,dP_n(t,\delta). \tag{2.2}
\]

Here, we ignore a term in the log likelihood that does not depend on the distribution function $F$.
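To make the sampling scheme and the criterion (2.2) concrete, here is a minimal Python sketch. It is not part of the original paper: the function names `simulate_current_status` and `log_likelihood` are hypothetical, and the exponential event and inspection times in the usage lines are illustrative assumptions only.

```python
import math
import random


def simulate_current_status(n, draw_x, draw_t, seed=0):
    """Return n pairs (T_i, Delta_i) with Delta_i = 1{X_i <= T_i}.

    draw_x and draw_t are hypothetical callables that take a random.Random
    instance and return one draw of X and of T, respectively.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = draw_x(rng)  # hidden event time, never returned
        t = draw_t(rng)  # observed inspection time
        data.append((t, 1 if x <= t else 0))
    return data


def _safe_log(p):
    # log(0) is treated as -infinity instead of raising an error
    return math.log(p) if p > 0.0 else -math.inf


def log_likelihood(F, data):
    """Normalized log likelihood (2.2) of a candidate distribution function F."""
    total = 0.0
    for t, delta in data:
        p = F(t)
        # only the term whose indicator equals 1 contributes
        total += _safe_log(p) if delta == 1 else _safe_log(1.0 - p)
    return total / len(data)


# Illustrative usage: X ~ Exp(1), T ~ Exp(rate 1/2); both choices are assumptions.
sample = simulate_current_status(
    200, lambda r: r.expovariate(1.0), lambda r: r.expovariate(0.5)
)
value_at_truth = log_likelihood(lambda t: 1.0 - math.exp(-t), sample)
```

Only the pairs $(T_i, \Delta_i)$ leave the simulator, which mirrors the fact that the $X_i$ live in the hidden space.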
In Groeneboom and Wellner (1992), it is shown that the (nonparametric) maximum likelihood estimator (MLE) is well defined as maximizer of (2.2) over all distribution functions and that it can be characterized as the left derivative of the greatest convex minorant of a cumulative sum diagram. To be precise, the observed time points $T_i$ are ordered in increasing order, yielding $T_{(1)} < T_{(2)} < \cdots < T_{(n)}$, and the $\Delta$ associated with $T_{(i)}$ is denoted by $\Delta_{(i)}$. Then the cumulative sum diagram consisting of the points

\[
P_0 = (0,0), \qquad P_i = \Biggl(\frac{i}{n},\ \frac{1}{n}\sum_{j=1}^{i} \Delta_{(j)}\Biggr)
\]

is constructed. Having determined the greatest convex minorant of this diagram, $\hat F_n(T_{(i)})$ is given by the left derivative of this minorant, evaluated at the point $P_i$. At other points it is defined by right continuity. Denoting by $G_n$ the empirical distribution function of the $T_i$'s and by $G_{n,1}$ the empirical subdistribution function of the $T_i$'s with $\Delta_i = 1$, observe that for $0 \le i \le n$, $P_i = (G_n(T_{(i)}), G_{n,1}(T_{(i)}))$. Also note that $\hat F_n$ is a step function of which the set of jump points $\{\tau_1,\ldots,\tau_m\}$ is a subset of the set $\{T_i : 1 \le i \le n\}$.
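The greatest convex minorant construction can be carried out with the pool-adjacent-violators algorithm, because the left derivatives of the minorant at the points $P_i$ coincide with the isotonic regression of $\Delta_{(1)},\ldots,\Delta_{(n)}$. The Python sketch below illustrates that standard equivalence; the function name `current_status_mle` is hypothetical, and distinct observation times are assumed.

```python
def current_status_mle(data):
    """Return the ordered times T_(i) and the MLE values F_hat(T_(i)).

    data is a list of (t, delta) pairs; the times are assumed distinct.
    """
    ordered = sorted(data)  # order the observations by inspection time
    times = [t for t, _ in ordered]
    blocks = []  # each block holds [sum of deltas, number of observations]
    for _, delta in ordered:
        blocks.append([float(delta), 1])
        # pool while the previous block mean exceeds the current block mean
        # (s1/c1 > s2/c2 is checked as s1*c2 > s2*c1 to avoid divisions)
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    values = []
    for s, c in blocks:
        values.extend([s / c] * c)  # constant slope on each pooled block
    return times, values


# Small hand-checkable example with indicators 1, 0, 1, 1, 0.
times, values = current_status_mle([(1, 1), (2, 0), (3, 1), (4, 1), (5, 0)])
```

In the example the first two indicators are pooled to slope 1/2 and the last three to slope 2/3, so the estimate jumps only at a subset of the observed times, as stated above.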
Groeneboom and Wellner (1992) show that this MLE is a consistent estimator of $F_0$, and prove that under some local smoothness assumptions, for $t>0$ fixed, $n^{1/3}(\hat F_n(t) - F_0(t))$ has the so-called Chernoff distribution as limiting distribution. If $F_0$ and $G$ are assumed to satisfy conditions (F.1) and (G.1) below, Groeneboom and Wellner (1992) also prove (see their Lemma 5.9 and page 120)

\[
\|F_0 - \hat F_n\|_\infty = \mathcal{O}_p(n^{-1/3}\log n), \tag{2.3}
\]
\[
\max_{1 \le i \le m} |\tau_{i+1} - \tau_i| = \mathcal{O}_p(n^{-1/3}\log n). \tag{2.4}
\]

(F.1) $F_0$ has bounded support $\mathcal{S}_0 = [0, M_0]$ and is strictly increasing on $\mathcal{S}_0$ with density $f_0$, strictly staying away from zero.

(G.1) $G$ has support $\mathcal{S}_G = [0,\infty)$, is strictly increasing on $\mathcal{S}_0$ with density $g$ staying away from zero and $g'$ is bounded on $\mathcal{S}_0$.
From this, it follows that for fixed $t>0$, any $\nu>0$ and $\mathcal{I}_t = [t-\nu, t+\nu]$

\[
\sup_{u \in \mathcal{I}_t} |F_0(u) - \hat F_n(u)| = \mathcal{O}_p(n^{-1/3}\log n), \tag{2.5}
\]
\[
\max_{i:\,\tau_i \in \mathcal{I}_t} |\tau_{i+1} - \tau_i| = \mathcal{O}_p(n^{-1/3}\log n). \tag{2.6}
\]

If one is willing to assume smoothness on $F_0$ and use this in the estimation procedure, this cube-root-$n$ rate of convergence of the estimator can be improved. The two estimators of $F_0$ we define do indeed converge at the faster rate $n^{2/5}$.
3. Maximum smoothed likelihood estimation. In this section, we define the maximum smoothed likelihood estimator (MSLE) $\hat F_n^{\mathrm{MS}}$ for the unknown distribution function $F_0$ of the variable of interest $X$. We characterize this estimator as the derivative of the convex minorant of a function on $\mathbb{R}$ and derive its pointwise asymptotic distribution. Based on $\hat F_n^{\mathrm{MS}}$, estimators for the density $f_0$ as well as for the hazard rate $\lambda_0 = f_0/(1-F_0)$ are defined and studied asymptotically.
We start with defining the estimators. Define the empirical subdistribution functions based on the $T_j$'s with $\Delta_j = 0$ and 1, respectively, by

\[
G_{n,i}(t) = \frac{1}{n}\sum_{j=1}^{n} 1_{[0,t]\times\{i\}}(T_j, \Delta_j) \quad \text{for } i = 0, 1,
\]

and note that the empirical distribution of the data $\{Z_j = (T_j,\Delta_j) : 1 \le j \le n\}$ can be expressed as $dP_n(t,\delta) = \delta\,dG_{n,1}(t) + (1-\delta)\,dG_{n,0}(t)$. Let $\hat G_{n,1}$ and $\hat G_{n,0}$ be smoothed versions of $G_{n,1}$ and $G_{n,0}$, respectively (e.g., via kernel smoothing), let $\hat g_{n,1}$ and $\hat g_{n,0}$ be their densities w.r.t. Lebesgue measure on $[0,\infty)$ and define $d\hat P_n(t,\delta) = \delta\,d\hat G_{n,1}(t) + (1-\delta)\,d\hat G_{n,0}(t)$. This is a smoothed version of the empirical measure $P_n$, where smoothing is only performed “in the $t$-direction.” Following the general approach of Eggermont and LaRiccia (2001), we replace the empirical distribution $P_n$ in the definition of the log likelihood (2.2) by this smoothed version $\hat P_n$, and define the smoothed log likelihood on the class of all distribution functions by

\[
l^S(F) = \int \{\delta \log F(t) + (1-\delta)\log(1-F(t))\}\,d\hat P_n(t,\delta)
       = \int \log(1-F(t))\,d\hat G_{n,0}(t) + \int \log F(t)\,d\hat G_{n,1}(t). \tag{3.1}
\]

The maximizer of the smoothed log likelihood is characterized similarly as the maximizer of the log likelihood. The next theorem makes this precise.
Theorem 3.1. Define $\hat G_n(t) = \hat G_{n,0}(t) + \hat G_{n,1}(t)$ for $t \ge 0$ and consider the following parameterized curve in $\mathbb{R}^2_+$, a continuous cumulative sum diagram (CCSD):

\[
t \mapsto (\hat G_n(t), \hat G_{n,1}(t)), \tag{3.2}
\]

for $t \in [0,\tau]$, with $\tau = \sup\{t \ge 0 : \hat g_{n,0}(t) + \hat g_{n,1}(t) > 0\}$. Let $\hat F_n^{\mathrm{MS}}(t)$ be the right-continuous slope of the lower convex hull of the CCSD (3.2), evaluated at the point with $x$-coordinate $\hat G_n(t)$. Then $\hat F_n^{\mathrm{MS}}$ is the unique maximizer of (3.1) over the class of all sub-distribution functions. We call $\hat F_n^{\mathrm{MS}}$ the maximum smoothed likelihood estimator of $F_0$.

In the proof of Theorem 3.1, we use the following lemma, a proof of which can be found in the Appendix.
Lemma 3.2. Let $\hat F_n^{\mathrm{MS}}$ be defined as in Theorem 3.1. Then for any distribution function $F$,

\[
\int \log F(t)\,d\hat G_{n,1}(t) \le \int \hat F_n^{\mathrm{MS}}(t)\log F(t)\,d\hat G_n(t)
\]

and

\[
\int \log(1-F(t))\,d\hat G_{n,0}(t) \le \int (1-\hat F_n^{\mathrm{MS}}(t))\log(1-F(t))\,d\hat G_n(t)
\]

with equality in case $F = \hat F_n^{\mathrm{MS}}$.
Proof of Theorem 3.1. Use the equality part of Lemma 3.2 to rewrite (3.1) as

\[
l^S(\hat F_n^{\mathrm{MS}}) = \int \bigl(\hat F_n^{\mathrm{MS}}(t)\log \hat F_n^{\mathrm{MS}}(t) + (1-\hat F_n^{\mathrm{MS}}(t))\log(1-\hat F_n^{\mathrm{MS}}(t))\bigr)\,d\hat G_n(t).
\]

By the inequality part of Lemma 3.2, we get for each distribution function $F$ that

\[
l^S(F) \le \int \hat F_n^{\mathrm{MS}}(t)\log F(t)\,d\hat G_n(t) + \int (1-\hat F_n^{\mathrm{MS}}(t))\log(1-F(t))\,d\hat G_n(t).
\]

Now note, using the convention $0\cdot\infty = 0$, that for all $p, p' \in [0,1]$

\[
p\log p' + (1-p)\log(1-p') \le p\log p + (1-p)\log(1-p). \tag{3.3}
\]

This implies that $l^S(F) \le l^S(\hat F_n^{\mathrm{MS}})$, i.e., $l^S$ is maximal for $\hat F_n^{\mathrm{MS}}$.
For uniqueness, note that inequality (3.3) is strict whenever $p' \ne p$. The last step in the preceding argument then shows that $l^S(F) < l^S(\hat F_n^{\mathrm{MS}})$, unless $F = \hat F_n^{\mathrm{MS}}$ a.e. w.r.t. the measure $d\hat G_n$. It could be that $d\hat G_n$ has no mass on $[a,b]$ for some $a<b$, i.e., $(\hat G_n(t), \hat G_{n,1}(t)) = (\hat G_n(a), \hat G_{n,1}(a))$ for all $t \in [a,b]$. This means that $\hat F_n^{\mathrm{MS}}$ is constant on $[a,b]$. Furthermore, it holds that $F(a) = \hat F_n^{\mathrm{MS}}(a)$ and $F(b) = \hat F_n^{\mathrm{MS}}(b)$, implying that $F$ is also constant and equal to $\hat F_n^{\mathrm{MS}}$ on $[a,b]$ a.e. w.r.t. the Lebesgue measure on $[0,\infty)$. Hence, $l^S(F) < l^S(\hat F_n^{\mathrm{MS}})$ unless $F = \hat F_n^{\mathrm{MS}}$. □
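Inequality (3.3) is stated without proof. One standard way to verify it for $p, p' \in (0,1)$, given here as a supplementary sketch rather than as the authors' own argument, uses the elementary bound $\log x \le x - 1$, which is an equality only at $x = 1$:

```latex
\begin{align*}
p\log p' + (1-p)\log(1-p') - p\log p - (1-p)\log(1-p)
  &= p\log\frac{p'}{p} + (1-p)\log\frac{1-p'}{1-p} \\
  &\le p\Bigl(\frac{p'}{p} - 1\Bigr) + (1-p)\Bigl(\frac{1-p'}{1-p} - 1\Bigr) \\
  &= (p' - p) + (p - p') = 0 .
\end{align*}
```

Equality in the middle step forces $p'/p = 1$, that is $p' = p$, which gives the strictness used in the uniqueness part of the proof. The boundary cases $p \in \{0,1\}$ or $p' \in \{0,1\}$ can be checked directly with the convention $0\cdot\infty = 0$.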
We assume the estimators $\hat G_{n,i}$ are continuously differentiable, hence, $\hat F_n^{\mathrm{MS}}$ is continuous and its derivative exists. So we can define the maximum smoothed likelihood estimators for $f_0$ and $\lambda_0$ by

\[
\hat f_n^{\mathrm{MS}}(t) = \frac{d}{du}\hat F_n^{\mathrm{MS}}(u)\Big|_{u=t}, \qquad
\hat\lambda_n^{\mathrm{MS}}(t) = \frac{\hat f_n^{\mathrm{MS}}(t)}{1-\hat F_n^{\mathrm{MS}}(t)} \tag{3.4}
\]

for $t>0$ such that $\hat F_n^{\mathrm{MS}}(t) < 1$.
In Theorem 3.1 no particular choice for $\hat G_{n,0}$ and $\hat G_{n,1}$ was made. For what follows, we define these estimators explicitly as kernel smoothed versions of $G_{n,0}$ and $G_{n,1}$. Let $k$ be a probability density satisfying condition (K.1).

(K.1) The probability density $k$ has support $[-1,1]$, is symmetric and twice continuously differentiable on $\mathbb{R}$.

Note that condition (K.1) implies that $m_2(k) = \int u^2 k(u)\,du < \infty$.

Let $K$ be the distribution function with density $k$, i.e., $K(t) = \int_{-\infty}^{t} k(u)\,du$, $k'$ be the derivative of $k$ and $h>0$ be a smoothing parameter (depending on $n$). Then we use the following notation for the scaled version of $K$, $k$ and $k'$

\[
K_h(u) = K(u/h), \qquad k_h(u) = \frac{1}{h}k(u/h) \qquad \text{and} \qquad k_h'(u) = \frac{1}{h^2}k'(u/h). \tag{3.5}
\]
For $i = 0, 1$ let

\[
\hat g_{n,i}(t) = \int k_h(t-u)\,dG_{n,i}(u)
\]

be kernel (sub-density) estimates based on the observations $T_j$ for which $\Delta_j = i$, and let $\hat g_n(t) = \hat g_{n,1}(t) + \hat g_{n,0}(t)$. Also define the associated (sub-)distribution functions

\[
\hat G_{n,i}(t) = \int_{[0,t]} \hat g_{n,i}(u)\,du, \quad \text{for } i = 0, 1, \quad \text{and} \quad \hat G_n(t) = \int_{[0,t]} \hat g_n(u)\,du.
\]
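Since $G_{n,i}$ puts mass $1/n$ on each $T_j$ with $\Delta_j = i$, the integral defining $\hat g_{n,i}$ is a finite sum. The following Python sketch spells this out without any boundary correction. The function names are hypothetical, the triweight kernel is used as one example of a density satisfying (K.1), and the midpoint rule for $\hat G_{n,i}$ is an illustrative numerical choice.

```python
def triweight(u):
    """Triweight kernel 35/32 (1 - u^2)^3 on [-1, 1], an example satisfying (K.1)."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def g_hat(t, data, h, i, kernel=triweight):
    """Kernel sub-density estimate (1/n) * sum of k_h(t - T_j) over j with Delta_j = i."""
    n = len(data)
    return sum(kernel((t - tj) / h) for tj, dj in data if dj == i) / (n * h)


def G_hat(t, data, h, i, steps=2000):
    """Midpoint-rule approximation of the integral of g_hat(., i) over [0, t]."""
    if t <= 0.0:
        return 0.0
    du = t / steps
    return sum(g_hat((m + 0.5) * du, data, h, i) for m in range(steps)) * du


# Two observations: T = 1 with Delta = 1 and T = 3 with Delta = 0.
toy = [(1.0, 1), (3.0, 0)]
density_at_one = g_hat(1.0, toy, 0.5, 1)  # k(0) / (n * h) with n = 2, h = 0.5
mass_delta_zero = G_hat(5.0, toy, 0.5, 0)  # close to 1/2, the fraction with Delta = 0
```

Each sub-density integrates to the fraction of observations with the corresponding indicator, so $\hat g_n = \hat g_{n,0} + \hat g_{n,1}$ integrates to one when no kernel mass falls outside $[0,\infty)$.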
Because $X \ge 0$, we can expect inconsistency problems for the kernel density and density derivative estimators at zero. In order to prevent those, we modify the definition of $\hat g_{n,i}$ for $t<h$. To be precise, we define

\[
\hat g_{n,i}(t) = \frac{1}{h}\int k^{\beta}\Bigl(\frac{t-u}{h}\Bigr)\,dG_{n,i}(u), \qquad 0 \le t \le h,
\]

for $\beta = t/h$ where the so-called boundary kernel $k^{\beta}$ is defined by

\[
k^{\beta}(u) = \frac{\nu_{2,\beta}(k) - \nu_{1,\beta}(k)\,u}{\nu_{0,\beta}(k)\,\nu_{2,\beta}(k) - \nu_{1,\beta}(k)^2}\,k(u)\,1_{(-1,\beta)}(u)
\quad \text{with} \quad \nu_{i,\beta}(k) = \int_{-1}^{\beta} u^i k(u)\,du, \ i = 0, 1, 2.
\]

Let the estimators $\hat g'_{n,i}$ be the derivatives of $\hat g_{n,i}$, for $i = 0, 1$. There are other ways to correct the kernel estimator near the boundary, see, e.g., Schuster (1985) or Jones (1993). However, simulations show that the results are not much influenced by the boundary correction method used.
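The boundary kernel is built so that it has total mass one and first moment zero on $(-1,\beta)$, and it reduces to $k$ itself when $\beta = 1$. The Python sketch below checks these properties numerically. It is an illustration only: the names `nu` and `make_boundary_kernel` are hypothetical, the triweight kernel is an assumed example of $k$, and the incomplete moments are approximated with a midpoint rule.

```python
def triweight(u):
    """Triweight kernel 35/32 (1 - u^2)^3 on [-1, 1]."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def nu(i, beta, kernel=triweight, steps=2000):
    """Midpoint-rule approximation of nu_{i,beta}(k), the integral of u^i k(u) over (-1, beta)."""
    du = (beta + 1.0) / steps
    total = 0.0
    for m in range(steps):
        u = -1.0 + (m + 0.5) * du
        total += u ** i * kernel(u)
    return total * du


def make_boundary_kernel(beta, kernel=triweight):
    """Return the function u -> k^beta(u) for a fixed beta in [0, 1]."""
    n0, n1, n2 = (nu(i, beta, kernel) for i in range(3))
    denom = n0 * n2 - n1 * n1  # positive by the Cauchy-Schwarz inequality

    def k_beta(u):
        if not (-1.0 < u < beta):
            return 0.0  # indicator of (-1, beta)
        return (n2 - n1 * u) / denom * kernel(u)

    return k_beta


k_interior = make_boundary_kernel(1.0)  # should agree with the plain kernel
k_edge = make_boundary_kernel(0.4)      # corresponds to t = 0.4 * h
```

The moments are computed once per value of $\beta$ and reused, which matters in practice because $\beta = t/h$ changes with every evaluation point $t$ in $[0,h]$.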
Having made these choices for the smoothed empirical distribution $\hat P_n$, let us return to the MSLE. It is the maximizer of $l^S$ over the class of all distribution functions. One could also maximize $l^S$ over the bigger class of all functions, maximizing the integrand of (3.1) for each $t$ separately. This results in

\[
\hat F_n^{\mathrm{naive}}(t) = \frac{\hat g_{n,1}(t)}{\hat g_n(t)}, \qquad
\hat f_n^{\mathrm{naive}}(t) = \frac{\hat g_n(t)\,\hat g'_{n,1}(t) - \hat g'_n(t)\,\hat g_{n,1}(t)}{\hat g_n(t)^2}, \tag{3.6}
\]

where

\[
\hat g'_n(t) = \hat g'_{n,0}(t) + \hat g'_{n,1}(t). \tag{3.7}
\]

We call these naive estimators, since $\hat f_n^{\mathrm{naive}}$ might take negative values, meaning that $\hat F_n^{\mathrm{naive}}$ decreases locally.

Fig. 1. A part of the CCSD, its lower convex hull and the estimates $\hat F_n^{\mathrm{naive}}$ and $\hat F_n^{\mathrm{MS}}$ for $F_0$ based on simulated data, with $n = 500$. (a) Part of the CCSD (grey line) and its lower convex hull (dashed line); (b) estimates $\hat F_n^{\mathrm{naive}}$ (grey line) and $\hat F_n^{\mathrm{MS}}$ (dashed line) of $F_0$ (dotted line).
Figure 1(a) shows a part of the CCSD defined in (3.2) and its lower convex hull. Figure 1(b) shows the naive estimator $\hat F_n^{\mathrm{naive}}$ (the grey line), the MSLE $\hat F_n^{\mathrm{MS}}$ and the true distribution for a simulation of size 500. The unknown distribution of the variable $X$ is taken to be a shifted Gamma(4) distribution, i.e., $f_0(x) = \frac{(x-2)^3}{3!}\exp(-(x-2))\,1_{[2,\infty)}(x)$, and the censoring variable $T$ has an exponential distribution with mean 3, i.e., $g(t) = \frac{1}{3}\exp(-t/3)\,1_{[0,\infty)}(t)$. For the kernel density, we took the triweight kernel $k(t) = \frac{35}{32}(1-t^2)^3\,1_{[-1,1]}(t)$ and as bandwidth $h = 0.7$. This picture shows that the estimator $\hat F_n^{\mathrm{MS}}$ is the isotonic version of the estimator $\hat F_n^{\mathrm{naive}}$.
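The relation between the naive estimator and its isotonic version can be explored numerically. The Python sketch below follows the simulation setup described above (shifted Gamma(4) event times, exponential inspection times with mean 3, $n = 500$, triweight kernel, $h = 0.7$), but it rests on several assumptions of its own: the lower convex hull of the CCSD is approximated on a finite grid by weighted isotonic regression of $\hat g_{n,1}/\hat g_n$ with weights $\hat g_n$, no boundary correction is applied, all function names are hypothetical, and the seed is arbitrary, so it will not reproduce Figure 1 exactly.

```python
import random


def triweight(u):
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def kernel_sums(t, data, h):
    """Return (g_hat_n(t), g_hat_{n,1}(t)) without boundary correction."""
    n = len(data)
    g = g1 = 0.0
    for tj, dj in data:
        w = triweight((t - tj) / h) / (n * h)
        g += w
        g1 += w * dj
    return g, g1


def isotonize(values, weights):
    """Weighted pool-adjacent-violators: nondecreasing weighted least-squares fit."""
    blocks = []  # each block holds [weighted sum, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v * w, w, 1])
        # pool while the previous block mean exceeds the current block mean
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            s, w_last, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w_last
            blocks[-1][2] += c
    fitted = []
    for s, w_tot, c in blocks:
        fitted.extend([s / w_tot] * c)
    return fitted


def naive_and_isotonic(data, grid, h):
    """Naive estimate g1/g on the grid and its weighted isotonic version."""
    pts, naive, weights = [], [], []
    for t in grid:
        g, g1 = kernel_sums(t, data, h)
        if g > 0.0:  # the ratio is undefined where no observation lies within h
            pts.append(t)
            naive.append(g1 / g)
            weights.append(g)
    return pts, naive, isotonize(naive, weights)


# Simulation setup of Figure 1; the seed and the grid are arbitrary choices.
rng = random.Random(1)
sim = []
for _ in range(500):
    x = 2.0 + rng.gammavariate(4.0, 1.0)  # shifted Gamma(4) event time
    t = rng.expovariate(1.0 / 3.0)        # exponential inspection time, mean 3
    sim.append((t, 1 if x <= t else 0))
grid = [0.05 * m for m in range(1, 241)]  # 0.05, 0.10, ..., 12.0
pts, F_naive, F_iso = naive_and_isotonic(sim, grid, 0.7)
```

On grid intervals where `F_naive` is already nondecreasing and consistent with its neighbours, the isotonic fit leaves it unchanged; it differs only where violators have to be pooled.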
The next theorem shows that for appropriately chosen $h$, the naive estimator $\hat F_n^{\mathrm{naive}}$ will be monotonically increasing on big intervals with probability converging to one as $n$ tends to infinity if $F_0$ and $G$ satisfy conditions (F.1) and (G.1).
Theorem 3.3. Assume $F_0$ and $G$ satisfy conditions (F.1) and (G.1). Let $\hat g_n$ and $\hat g_{n,1}$ be kernel estimators for $g$ and $g_1$ with kernel density $k$ satisfying condition (K.1). Let $h = cn^{-\alpha}$ $(c>0)$ be the bandwidth used in the definition of $\hat g_n$ and $\hat g_{n,1}$. Then for all $0<m<M<M_0$ and $\alpha \in (0,1/3)$ the following holds

\[
P(\hat F_n^{\mathrm{naive}} \text{ is monotonically increasing on } [m,M]) \longrightarrow 1. \tag{3.8}
\]
Note that this theorem as it stands does not imply that $\hat F_n^{\mathrm{MS}}(t) = \hat F_n^{\mathrm{naive}}(t)$ on $[m,M]$ with probability tending to one. Some additional control on the behavior of $\hat F_n^{\mathrm{naive}}$ on $[0,m)$ and $(M,M_0]$ is needed. The proof of the corollary below makes this precise.
Corollary 3.4. Under the assumptions of Theorem 3.3, it holds that for all $0<m<M<M_0$ and $\alpha \in (0,1/3)$,

\[
P(\hat F_n^{\mathrm{naive}}(t) = \hat F_n^{\mathrm{MS}}(t) \text{ for all } t \in [m,M]) \longrightarrow 1. \tag{3.9}
\]

Consequently, for all $t>0$ the asymptotic distributions of $\hat F_n^{\mathrm{MS}}(t)$ and $\hat F_n^{\mathrm{naive}}(t)$ are the same.
In van der Vaart and van der Laan (2003), a result similar to our Corollary 3.4 is proved for smooth monotone density estimators. The kernel estimator is compared with an isotonized version of this estimator. Their proof is based on a so-called switch-relation relating the derivative of the convex minorant of a function to that of an argmax function. The direct argument we use to prove Corollary 3.4 furnishes an alternative way to prove their result.

By Corollary 3.4, the estimators $\hat F_n^{\mathrm{MS}}(t)$ and $\hat F_n^{\mathrm{naive}}(t)$ have the same asymptotic distribution. The same holds for $\hat f_n^{\mathrm{MS}}(t)$ and $\hat f_n^{\mathrm{naive}}(t)$ as well as for $\hat\lambda_n^{\mathrm{MS}}(t)$ and $\hat\lambda_n^{\mathrm{naive}}(t)$. The pointwise asymptotic distribution of $\hat F_n^{\mathrm{naive}}(t)$ follows easily from the Lindeberg–Feller central limit theorem and the delta method. The resulting pointwise asymptotic normality of both $\hat F_n^{\mathrm{MS}}(t)$ and $\hat F_n^{\mathrm{naive}}(t)$ is stated in the next theorem.
Theorem 3.5. Assume $F_0$ and $G$ satisfy conditions (F.1) and (G.1). Fix $t>0$ such that $f_0''$ and $g''$ exist and are continuous at $t$ and $g(t)f_0'(t) + 2f_0(t)g'(t) \ne 0$. Let $h = cn^{-1/5}$ $(c>0)$ be the bandwidth used in the definition of $\hat g_n$ and $\hat g_{n,1}$. Then

\[
n^{2/5}(\hat F_n^{\mathrm{MS}}(t) - F_0(t)) \rightsquigarrow N(\mu_{F,\mathrm{MS}}, \sigma^2_{F,\mathrm{MS}}),
\]

where

\[
\mu_{F,\mathrm{MS}} = \frac{1}{2}c^2 m_2(k)\Bigl\{f_0'(t) + 2\frac{f_0(t)g'(t)}{g(t)}\Bigr\},
\]
\[
\sigma^2_{F,\mathrm{MS}} = c^{-1}\frac{F_0(t)(1-F_0(t))}{g(t)}\int k(u)^2\,du.
\]

This also holds if we replace $\hat F_n^{\mathrm{MS}}$ by $\hat F_n^{\mathrm{naive}}$.
For fixed $t>0$, the asymptotically MSE-optimal bandwidth $h$ for $\hat F_n^{\mathrm{MS}}(t)$ is given by $h_{n,F,\mathrm{MS}} = c_{F,\mathrm{MS}}\,n^{-1/5}$, where

\[
c_{F,\mathrm{MS}} = \Bigl\{\frac{F_0(t)(1-F_0(t))}{g(t)}\int k(u)^2\,du\Bigr\}^{1/5}
\times \Bigl\{m_2^2(k)\Bigl\{f_0'(t) + 2\frac{f_0(t)g'(t)}{g(t)}\Bigr\}^2\Bigr\}^{-1/5}. \tag{3.10}
\]
Proof. For fixed $c>0$, the asymptotic distribution of $\hat F_n^{\mathrm{naive}}$ follows immediately by applying the delta method with $\varphi(u,v) = v/u$ to the first result in Lemma A.3. By Corollary 3.4, this also gives the asymptotic distribution of $\hat F_n^{\mathrm{MS}}$.

To obtain the bandwidth which minimizes the asymptotic mean squared error (aMSE) we minimize

\[
\mathrm{aMSE}(\hat F_n^{\mathrm{MS}}, c) = \frac{1}{4}c^4 m_2^2(k)\Bigl\{f_0'(t) + 2\frac{f_0(t)g'(t)}{g(t)}\Bigr\}^2
+ c^{-1}\frac{F_0(t)(1-F_0(t))}{g(t)}\int k(u)^2\,du
\]

with respect to $c$. This yields (3.10). □
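The minimization in the last step is elementary. As a supplementary sketch, write $B$ and $V$ (shorthand introduced here, not in the original) for the squared-bias and variance factors of the aMSE:

```latex
\begin{align*}
B &= m_2^2(k)\Bigl\{f_0'(t) + 2\frac{f_0(t)g'(t)}{g(t)}\Bigr\}^2, \qquad
V = \frac{F_0(t)(1-F_0(t))}{g(t)}\int k(u)^2\,du, \\
\frac{d}{dc}\Bigl(\tfrac{1}{4}c^4 B + c^{-1}V\Bigr) &= c^3 B - c^{-2}V = 0
\iff c^5 = \frac{V}{B}
\iff c = V^{1/5}B^{-1/5}.
\end{align*}
```

This is the constant in (3.10). The second derivative $3c^2 B + 2c^{-3}V$ is positive for $c>0$, since $B>0$ under the condition $g(t)f_0'(t) + 2f_0(t)g'(t) \ne 0$, so the stationary point is the minimizer.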
Remark 3.1. In case $g(t)f_0'(t) + 2f_0(t)g'(t) = 0$, the optimal rate of $h_{n,F,\mathrm{MS}}$ is $n^{-1/9}$ resulting in a rate of convergence $n^{-4/9}$ for $\hat F_n^{\mathrm{MS}}$. This is in line with results for other kernel smoothers in case of vanishing first-order bias terms.

The pointwise asymptotic distributions of $\hat f_n^{\mathrm{MS}}(t)$ and $\hat f_n^{\mathrm{naive}}(t)$ also follow from the Lindeberg–Feller central limit theorem and the delta method.
Theorem 3.6. Consider $\hat f_n^{\mathrm{MS}}$ as defined in (3.4) and assume $F_0$ and $G$ satisfy conditions (F.1) and (G.1). Fix $t>0$ such that $f_0^{(3)}$ and $g^{(3)}$ exist and are continuous at $t$. Let $h = cn^{-1/7}$ $(c>0)$ be the bandwidth used to define $\hat F_n^{\mathrm{MS}}$. Then

\[
n^{2/7}(\hat f_n^{\mathrm{MS}}(t) - f_0(t)) \rightsquigarrow N(\mu_{f,\mathrm{MS}}, \sigma^2_{f,\mathrm{MS}}),
\]