A New Family of Asymmetric Distributions for Modeling Light-Tailed and Right-Skewed Data
Meitner Cadena
Departamento de Ciencias Exactas, Universidad de las Fuerzas Armadas-ESPE
Sangolquí, Ecuador
March 28, 2017
Abstract
A new three-parameter cumulative distribution function defined on (α, ∞), for some α ≥ 0, with an asymmetric probability density function showing exponential decay at both tails, is introduced. The new distribution is close to familiar distributions such as the gamma and log-normal distributions, but it has its own features and thus generalizes neither of them. Hence, the new distribution constitutes a new alternative for fitting values that show light-tailed behavior. Further, this new distribution shows great flexibility in fitting the bulk of the data by tuning some parameters. We refer to this new distribution as the generalized exponential log-squared distribution (GEL-S). Statistical properties of the GEL-S distribution are discussed. The maximum likelihood method is proposed for estimating the model parameters, incorporating adaptations in the computational procedures due to difficulties in handling the parameters. The performance of the new distribution is studied using simulations. Applications to real data sets coming from different domains are shown.
Keywords: Asymmetric distribution, Maximum likelihood method, Simulation, Light-tailed, Right-skewed

1 Introduction
In a number of domains, such as medical applications, atmospheric sciences, microbiology, environmental science and reliability theory, among others, data are positive and right-skewed, with their highest values decaying exponentially. Among the most suitable models used by researchers and practitioners to deal with this kind of data are parametric distributions such as the log-normal, gamma and Weibull distributions. However, known distributions are not always enough to achieve a good fit to the data. This has motivated interest in the development of more flexible and better adapted distributions, which have been generated using different strategies such as the combination of known distributions [40], the introduction of new parameters in given distributions [31], the transformation of known distributions [21], and the junction of two distributions by splicing [41].
In this paper, a new procedure to develop new distributions is proposed. The aim is to guarantee that a probability density function (pdf) f(x) defined for x > α, for some α ∈ ℝ, decays exponentially to 0 as $x \to \alpha^+$ and as $x \to \infty$. An advantage of this condition is that it still holds if any power factor $x^\beta$ with β ∈ ℝ is included in such a pdf. In this way, the new distribution would have great flexibility in neighborhoods of 0 and ∞ by controlling β, thus capturing a wide variety of shapes and tail behaviors. We refer to this new distribution as the generalized exponential log-squared distribution (GEL-S).
The above-mentioned features of pdfs are also satisfied by the log-normal and related distributions. Further, the log-normal distribution may be considered a particular case of GEL-S; however, as will be seen later, the GEL-S distribution does not generalize the log-normal distribution.
The aim of this paper is twofold. First, to study statistical properties of the GEL-S distribution and methods for estimating its parameters. Second, to provide empirical evidence of the great flexibility of the GEL-S distribution for fitting real light-tailed and right-skewed data from different domains. For numerical assessments, this model is implemented using functions of the R software [37].
In the next section, the pdf associated with the new three-parameter distribution is introduced by considering the condition on pdfs indicated above, and explicit expressions of its cumulative distribution function (cdf) and survival function (sf) are provided in some cases. Further, the closeness of the new distribution to well-known distributions is discussed. Section 3 presents statistical properties of the new distribution. Section 4 is devoted to the maximum likelihood method for estimating the parameters of the new distribution. In Section 5, the performance of the parameter estimation method is studied using simulations. Section 6 shows applications of the new distribution to real data sets coming from different domains. Section 7 concludes the paper with discussions, conclusions and further steps. Proofs are presented in the Annexe.
2 The Generalized Exponential Log-Squared Distribution
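In this section, the GEL-S distribution is introduced. The pdf of the new distribution is defined by
$$ f(x) := C x^\beta e^{-(2\gamma^2)^{-1}(\log(x-\alpha))^2}, \quad x > \alpha, \quad \text{with } \alpha \ge 0,\ \beta \in \mathbb{R} \text{ and } \gamma > 0, $$
where C is the normalizing constant.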
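This function decays exponentially at both tails. Writing
$$ x^\beta e^{-(2\gamma^2)^{-1}(\log(x-\alpha))^2} = e^{-(\log(x-\alpha))^2\left((2\gamma^2)^{-1} - \beta\frac{\log x}{(\log(x-\alpha))^2}\right)} $$
and noting that, if α = 0,
$$ \lim_{x\to\alpha^+}\frac{\log x}{(\log(x-\alpha))^2} = \lim_{x\to 0^+}\frac{1}{\log x} = 0, $$
or, if α > 0,
$$ \lim_{x\to\alpha^+}\frac{\log x}{(\log(x-\alpha))^2} = \log\alpha \times \lim_{x\to\alpha^+}\frac{1}{(\log(x-\alpha))^2} = 0, $$
and, by applying L'Hôpital's rule,
$$ \lim_{x\to\infty}\frac{\log x}{(\log(x-\alpha))^2} = \frac{1}{2}\lim_{x\to\infty}\frac{x-\alpha}{x}\,\frac{1}{\log(x-\alpha)} = 0, $$
the factor in parentheses tends to $(2\gamma^2)^{-1} > 0$ while $(\log(x-\alpha))^2 \to \infty$ at both ends of the support. Then we have, for any β ∈ ℝ,
$$ \lim_{x\to\alpha^+} f(x) = 0, \qquad \lim_{x\to\infty} f(x) = 0. $$
Further, this means that both tails of this function are light [10,38], which implies that f reaches 0 very fast when $x \to \alpha^+$ or $x \to \infty$.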
Due to difficulties in manipulating f for arbitrary β ∈ ℝ, for instance in computing integrals of this function, this study was limited to the cases in which β takes non-negative integer values. Therefore, in this paper, the pdf is defined as
$$ f(x) := C x^k e^{-(2\gamma^2)^{-1}(\log(x-\alpha))^2}, \quad x > \alpha, \quad \text{with } \alpha \ge 0,\ k = 0, 1, 2, \dots, \text{ and } \gamma > 0, \qquad (1) $$
where
$$ C = \left(\gamma\sqrt{2\pi}\sum_{i=0}^{k}\binom{k}{i}\alpha^{k-i}e^{(i+1)^2\gamma^2/2}\right)^{-1} $$
is the normalizing constant. The derivation of C is presented in Annexe A.
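The normalizing constant and the pdf in (1) translate directly into code. The Python sketch below (standard library only; the function names are hypothetical, not from the paper) also checks numerically that f integrates to 1, using the substitution x = α + e^y (so dx = e^y dy) and the trapezoidal rule; the integration range and the number of steps are arbitrary choices.

```python
import math


def gel_s_norm_const(alpha, k, gamma):
    """Normalizing constant C of the GEL-S pdf (1)."""
    s = sum(math.comb(k, i) * alpha ** (k - i) * math.exp((i + 1) ** 2 * gamma ** 2 / 2)
            for i in range(k + 1))
    return 1.0 / (gamma * math.sqrt(2 * math.pi) * s)


def gel_s_pdf(x, alpha, k, gamma):
    """GEL-S pdf (1); zero outside the support x > alpha."""
    if x <= alpha:
        return 0.0
    return (gel_s_norm_const(alpha, k, gamma) * x ** k
            * math.exp(-math.log(x - alpha) ** 2 / (2 * gamma ** 2)))


def total_mass(alpha, k, gamma, lo=-12.0, hi=12.0, steps=20000):
    """Trapezoidal approximation of the integral of f over (alpha, inf), with x = alpha + e^y."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        y = lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        # integrand in y: f(alpha + e^y) * e^y, since dx = e^y dy
        total += weight * gel_s_pdf(alpha + math.exp(y), alpha, k, gamma) * math.exp(y)
    return total * h
```

For instance, `total_mass(0.5, 1, 0.5)` should equal 1 up to discretization error. Note that Python evaluates `0.0 ** 0` as 1, so the case α = 0 needs no special handling.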
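The cdf is then, for x > α,
$$ F(x) := \int_\alpha^x f(z)\,dz = \gamma C\sqrt{2\pi}\sum_{i=0}^{k}\binom{k}{i}\alpha^{k-i}e^{(i+1)^2\gamma^2/2}\,\Phi\!\left(\frac{1}{\gamma}\left(\log(x-\alpha)-(i+1)\gamma^2\right)\right), \qquad (2) $$
where Φ is the cdf of a standard normal random variable (rv). The derivation of F is presented in Annexe A. From (2), the sf $\bar F$ associated with F may be deduced from its definition $\bar F := 1 - F$; alternatively, following computations similar to those in the derivation of F and using the property $1 - \Phi(x) = \Phi(-x)$, the following expression for $\bar F$ is obtained, for x > α:
$$ \bar F(x) = \int_x^\infty f(z)\,dz = \gamma C\sqrt{2\pi}\sum_{i=0}^{k}\binom{k}{i}\alpha^{k-i}e^{(i+1)^2\gamma^2/2}\,\Phi\!\left(-\frac{1}{\gamma}\left(\log(x-\alpha)-(i+1)\gamma^2\right)\right). $$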
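Both (2) and the sf expression are weighted sums of standard normal cdfs evaluated at affine functions of log(x − α), with weights $\gamma C\sqrt{2\pi}\binom{k}{i}\alpha^{k-i}e^{(i+1)^2\gamma^2/2}$ that add up to 1 by the definition of C. The Python sketch below (standard library only; `statistics.NormalDist` supplies Φ, and the function names are hypothetical) evaluates F and $\bar F$ this way; a small copy of the pdf is included only to check numerically that F′ = f.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def _terms(alpha, k, gamma):
    """binom(k, i) * alpha^(k-i) * exp((i+1)^2 gamma^2 / 2), i = 0..k; gamma*C*sqrt(2*pi) = 1/sum."""
    return [math.comb(k, i) * alpha ** (k - i) * math.exp((i + 1) ** 2 * gamma ** 2 / 2)
            for i in range(k + 1)]


def gel_s_pdf(x, alpha, k, gamma):
    """pdf (1), used here only to check that F' = f."""
    if x <= alpha:
        return 0.0
    c = 1.0 / (gamma * math.sqrt(2 * math.pi) * sum(_terms(alpha, k, gamma)))
    return c * x ** k * math.exp(-math.log(x - alpha) ** 2 / (2 * gamma ** 2))


def gel_s_cdf(x, alpha, k, gamma):
    """F(x) from eq. (2)."""
    if x <= alpha:
        return 0.0
    t, y = _terms(alpha, k, gamma), math.log(x - alpha)
    return sum(ti * PHI((y - (i + 1) * gamma ** 2) / gamma) for i, ti in enumerate(t)) / sum(t)


def gel_s_sf(x, alpha, k, gamma):
    """Survival function: same weights, with Phi evaluated at the negated arguments."""
    if x <= alpha:
        return 1.0
    t, y = _terms(alpha, k, gamma), math.log(x - alpha)
    return sum(ti * PHI(-(y - (i + 1) * gamma ** 2) / gamma) for i, ti in enumerate(t)) / sum(t)
```

Computing $\bar F$ from its own sum, rather than as 1 − F, avoids cancellation far in the right tail, where F is very close to 1.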
Relating f to the pdf of a log-normal distribution with parameters µ and σ², the identity
$$ x^k e^{-(2\gamma^2)^{-1}(\log x)^2} = e^{2^{-1}\gamma^2(k+1)^2}\,x^{-1}e^{-(2\gamma^2)^{-1}(\log x-\gamma^2(k+1))^2} $$
shows that the former distribution becomes the latter if α = 0, γ = σ, and $\gamma = \sqrt{\mu/(k+1)}$ (that is, when µ/σ² − 1 is a non-negative integer k). Hence, the log-normal distribution is a particular case of the GEL-S distribution, implying that the GEL-S distribution might inherit the importance that the log-normal distribution has gained in modeling data [15,30]. However, the new distribution is not an extension of the log-normal distribution, since the latter is built by considering an rv X such that log X follows a normal distribution, whereas the substitution x = e^y in F(x) gives an expression that is not related to any expression based on normal rvs. The reader is referred to [48,12,23,42] for further details on the log-normal distribution and its generalizations.
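The special case can be checked numerically: with α = 0, γ = σ and k = µ/σ² − 1 (which must be a non-negative integer), the GEL-S pdf should coincide with the two-parameter log-normal pdf. The Python sketch below uses hypothetical helper names; the pair (µ, σ) = (1, 1), the log-normal curve used in Fig. 1, corresponds to k = 0 and γ = 1.

```python
import math


def gel_s_pdf(x, alpha, k, gamma):
    """GEL-S pdf (1) for x > alpha."""
    s = sum(math.comb(k, i) * alpha ** (k - i) * math.exp((i + 1) ** 2 * gamma ** 2 / 2)
            for i in range(k + 1))
    c = 1.0 / (gamma * math.sqrt(2 * math.pi) * s)
    return c * x ** k * math.exp(-math.log(x - alpha) ** 2 / (2 * gamma ** 2))


def lognormal_pdf(x, mu, sigma):
    """Two-parameter log-normal pdf (Table 1)."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))


def matching_gel_s(mu, sigma):
    """GEL-S parameters (alpha, k, gamma) reproducing LN(mu, sigma^2): alpha = 0, gamma = sigma."""
    k = mu / sigma ** 2 - 1  # from gamma = sqrt(mu / (k + 1)) with gamma = sigma
    if k < 0 or abs(k - round(k)) > 1e-12:
        raise ValueError("mu / sigma^2 - 1 must be a non-negative integer")
    return 0.0, int(round(k)), sigma
```

For example, `matching_gel_s(0.75, 0.5)` returns `(0.0, 2, 0.5)`, while a pair such as (µ, σ) = (1, 0.9) has no GEL-S counterpart and raises an error.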
As discussed above, the GEL-S distribution is close to the log-normal distribution. Other pdfs that are close to the new pdf in terms of their structure are presented in Table 1, where the new pdf is included in order to appreciate the similarities and differences among them. In each of these pdfs, two factors multiplying each other can be identified: one based on the exponential function and a second one formed by the remaining part. Regarding the exponential factor, the GEL-S, two-parameter log-normal and three-parameter log-normal distributions are very similar to each other, whereas regarding the second factor, the GEL-S and gamma distributions are close to each other.
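Distribution | Parameters | Support | pdf
GEL-S | α ≥ 0, k = 0, 1, 2, ..., γ > 0 | x > α | $C x^k e^{-\frac{(\log(x-\alpha))^2}{2\gamma^2}}$
Two-parameter log-normal | µ ∈ ℝ, σ > 0 | x > 0 | $\frac{x^{-1}}{\sigma\sqrt{2\pi}} e^{-\frac{(\log x-\mu)^2}{2\sigma^2}}$
Three-parameter log-normal | δ, µ ∈ ℝ, σ > 0 | x > δ | $\frac{(x-\delta)^{-1}}{\sigma\sqrt{2\pi}} e^{-\frac{(\log(x-\delta)-\mu)^2}{2\sigma^2}}$
Gamma | α, β > 0 | x > 0 | $\frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$
Table 1: Distributions that are close to the GEL-S distribution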
Plots of the pdfs and cdfs of the GEL-S, two-parameter log-normal and three-parameter log-normal distributions are exhibited in Fig. 1. Left plots concern pdfs and right plots their corresponding cdfs. In the top plots, the GEL-S and two-parameter log-normal distributions are compared by varying their parameters. Note that the supports of the positive parts of the pdfs and cdfs are not the same for both distributions: that of the log-normal distribution begins at $x = 0^+$ and is in general slightly wider than that of the GEL-S distribution, which begins at $x = \alpha^+ > 0$. In these plots, the cdf and pdf of the log-normal distribution are surrounded by those of the GEL-S distributions, reflecting the fact that the two-parameter log-normal distribution is a particular case of the GEL-S distribution, as discussed above. This enclosure is obtained by varying γ of the GEL-S distribution. In the bottom plots, the GEL-S and three-parameter log-normal distributions are compared as before, but considering the same support for both distributions by taking α = δ. Now, the cdf and pdf of the log-normal distribution are only partially surrounded by those of the GEL-S distribution, namely on the right side of the curves.
[Figure 1 panels. Left: f(x); right: F(x); x from 0 to 2. Top: GEL-S (α = 0.01, k = 0, γ = 0.95), GEL-S (α = 0.01, k = 0, γ = 1.05) and two-parameter log-normal (µ = 1, σ = 1). Bottom: the same GEL-S curves and three-parameter log-normal (δ = 0.01, µ = 1, σ = 1).]
Figure 1: Comparisons of pdfs (left plots) and cdfs (right plots) associated with the GEL-S and two-parameter log-normal distributions (top plots) and with the GEL-S and three-parameter log-normal distributions (bottom plots)
Fig. 2 presents curves of pdfs and cdfs of GEL-S distributions for varying parameters. Left plots concern pdfs and right plots their corresponding cdfs. Each row shows plots where only one parameter varies: α in the top plots, k in the middle plots, and γ in the bottom plots. These plots show that increasing k or γ promotes the flattening of the pdfs. On the other hand, increasing α shifts the pdfs and cdfs to the right with slight increases in the heights of the pdfs, whereas increasing γ increases the right skewness of the pdfs.
3 Statistical Properties of the GEL-S Distribution
In this section, statistical properties of the GEL-S distribution are studied. To this aim, hereafter, X denotes an rv following a GEL-S distribution with parameters α, k and γ, and with the pdf f defined in (1).
[Figure 2 panels. Left: f(x); right: F(x); x from 1 to 5. Top: GEL-S with α = 0.5, 1.0, 1.5 (k = 1, γ = 0.5). Middle: k = 0, 1, 2 (α = 0.5, γ = 0.5). Bottom: γ = 0.4, 0.5, 0.6 (α = 0.5, k = 1).]
Figure 2: Comparisons of pdfs (left plots) and cdfs (right plots) of GEL-S distributions with varying parameters (α in top plots, k in middle plots, γ in bottom plots)
3.1 Mean, Variance, Skewness, Kurtosis, and Moments
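Firstly, the nth moment of X, n = 0, 1, 2, ..., is described (computations are presented in Annexe A):
$$ E[X^n] := \int_\alpha^\infty x^n f(x)\,dx = C\gamma\sqrt{2\pi}\sum_{i=0}^{n+k}\binom{n+k}{i}\alpha^{n+k-i}e^{(i+1)^2\gamma^2/2}, \qquad (3) $$
which means that X has moments of all orders. Indeed, since $x^n f(x) = C x^{n+k}e^{-(2\gamma^2)^{-1}(\log(x-\alpha))^2}$, the integral in (3) is the one defining the normalizing constant in (1) with k replaced by n + k. From this expression, important statistics of X can be deduced, for instance the mean
$$ \mu_X := E[X] = C\gamma\sqrt{2\pi}\sum_{i=0}^{1+k}\binom{1+k}{i}\alpha^{1+k-i}e^{(i+1)^2\gamma^2/2}, $$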
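the variance
$$ \sigma_X^2 := E[X^2] - (E[X])^2 = C\gamma\sqrt{2\pi}\sum_{i=0}^{2+k}\binom{2+k}{i}\alpha^{2+k-i}e^{(i+1)^2\gamma^2/2} - \mu_X^2, $$
the skewness
$$ \mathrm{Skew}_X := E\left[\left(\frac{X-\mu_X}{\sigma_X}\right)^3\right] = \frac{C\gamma\sqrt{2\pi}\sum_{i=0}^{3+k}\binom{3+k}{i}\alpha^{3+k-i}e^{(i+1)^2\gamma^2/2} - 3\mu_X\sigma_X^2 - \mu_X^3}{\sigma_X^3}, $$
and the kurtosis
$$ \mathrm{Kurt}_X := E\left[\left(\frac{X-\mu_X}{\sigma_X}\right)^4\right] = \frac{C\gamma\sqrt{2\pi}\sum_{i=0}^{4+k}\binom{4+k}{i}\alpha^{4+k-i}e^{(i+1)^2\gamma^2/2} - 4\mu_X\sigma_X^3\mathrm{Skew}_X - 6\mu_X^2\sigma_X^2 - \mu_X^4}{\sigma_X^4}. $$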
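Because $C\gamma\sqrt{2\pi} = 1/S_k$ with $S_m = \sum_{i=0}^{m}\binom{m}{i}\alpha^{m-i}e^{(i+1)^2\gamma^2/2}$ (our shorthand, not notation from the paper), (3) reduces to the ratio $E[X^n] = S_{n+k}/S_k$. The Python sketch below (hypothetical function names) uses this ratio together with the formulas above; for α = 0.5, k = 0, γ = 0.5 it gives approximately 1.955, 0.601, 1.750 and 8.898, which match the corresponding row of Tab. 2 after rounding.

```python
import math


def _s(alpha, m, gamma):
    """S_m = sum_{i=0}^{m} binom(m, i) alpha^(m-i) exp((i+1)^2 gamma^2 / 2)."""
    return sum(math.comb(m, i) * alpha ** (m - i) * math.exp((i + 1) ** 2 * gamma ** 2 / 2)
               for i in range(m + 1))


def raw_moment(n, alpha, k, gamma):
    """E[X^n] from (3), written as S_{n+k} / S_k."""
    return _s(alpha, n + k, gamma) / _s(alpha, k, gamma)


def summary_stats(alpha, k, gamma):
    """Mean, variance, skewness and (non-excess) kurtosis of X, as defined in Section 3.1."""
    m1, m2, m3, m4 = (raw_moment(n, alpha, k, gamma) for n in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    sd = math.sqrt(var)
    skew = (m3 - 3 * m1 * var - m1 ** 3) / sd ** 3
    kurt = (m4 - 4 * m1 * sd ** 3 * skew - 6 * m1 ** 2 * var - m1 ** 4) / var ** 2
    return m1, var, skew, kurt
```

The kurtosis computed here is the non-excess one, i.e. 3 for a normal distribution, consistent with the definition above.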
Tab. 2 illustrates the previous statistics for the GEL-S distributions shown in Fig. 2. These results show that the mean, the skewness and the kurtosis increase when any of the parameters α, k or γ increases, whereas the variance increases only when k or γ increases (it slightly decreases as α increases).
Parameters | $\mu_X$ | $\sigma^2_X$ | $\mathrm{Skew}_X$ | $\mathrm{Kurt}_X$
α = 0.5, k = 1, γ = 0.5 | 2.26 | 0.92 | 1.78 | 9.08
α = 1.0, k = 1, γ = 0.5 | 2.70 | 0.87 | 1.80 | 9.23
α = 1.5, k = 1, γ = 0.5 | 3.16 | 0.84 | 1.81 | 9.33
α = 0.5, k = 0, γ = 0.5 | 1.95 | 0.60 | 1.75 | 8.90
α = 0.5, k = 2, γ = 0.5 | 2.67 | 1.46 | 1.80 | 9.21
α = 0.5, k = 1, γ = 0.4 | 1.93 | 0.37 | 1.34 | 6.33
α = 0.5, k = 1, γ = 0.6 | 2.79 | 2.41 | 2.31 | 13.68
Table 2: Statistics for the GEL-S distributions shown in Fig. 2
3.2 Mode
The explicit expression of f given by (1) allows the analysis of the mode $x_m$ of the GEL-S distribution. This is given in the following result.
Proposition 1. The mode of the GEL-S distribution with parameters α, k and γ exists, is unique and is the solution of the equation
$$ x\log(x-\alpha) = k\gamma^2(x-\alpha). $$
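The equation in Proposition 1 is obtained by setting to zero the derivative $\frac{d}{dx}\log f(x) = \frac{k}{x} - \frac{\log(x-\alpha)}{\gamma^2(x-\alpha)}$. The claim on uniqueness given in the previous proposition shows that the GEL-S distribution is always unimodal. Furthermore, from the relationship given by this proposition we have that, if k = 0, $x_m = 1 + \alpha$, without influence of γ, whereas if k > 0, from
$$ x_m(x_m-\alpha)\log(x_m-\alpha) = k\gamma^2(x_m-\alpha)^2 > 0 $$
and $x_m(x_m-\alpha) > 0$, it follows that $\log(x_m-\alpha) > 0$, that is, $x_m > 1 + \alpha$.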
Illustrations of modes are presented in Tab. 3 for the GEL-S distributions shown in Fig. 2, together with their corresponding means. These results corroborate the relations between the mode and α deduced above. Also, the mode is always found to be lower than the corresponding mean, which is in line with the right skewness of the GEL-S distribution.
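For k > 0, the function g(x) = x log(x − α) − kγ²(x − α) equals −kγ² < 0 at x = 1 + α and grows without bound as x → ∞, so the mode can be bracketed and located by bisection. The Python sketch below takes the uniqueness stated in Proposition 1 for granted; bisection is our choice of root finder and `gel_s_mode` is a hypothetical name.

```python
import math


def gel_s_mode(alpha, k, gamma, tol=1e-12):
    """Mode of the GEL-S distribution: root of x*log(x - alpha) = k*gamma^2*(x - alpha)."""
    if k == 0:
        return 1.0 + alpha  # closed form derived above, independent of gamma

    def g(x):
        return x * math.log(x - alpha) - k * gamma ** 2 * (x - alpha)

    lo = 1.0 + alpha  # g(lo) = -k * gamma^2 < 0, and the root satisfies x_m > 1 + alpha
    hi = lo + 1.0
    while g(hi) <= 0:  # x*log(x - alpha) eventually dominates, so this loop terminates
        hi = lo + 2.0 * (hi - lo)
    while hi - lo > tol:  # keep g(lo) <= 0 < g(hi)
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, `gel_s_mode(0.5, 1, 0.5)` is about 1.693, in line with the value 1.69 reported in Tab. 3.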
Parameters | $\mu_X$ | $x_m$
α = 0.5, k = 1, γ = 0.5 | 2.26 | 1.69
α = 1.0, k = 1, γ = 0.5 | 2.70 | 2.14
α = 1.5, k = 1, γ = 0.5 | 3.16 | 2.61
α = 0.5, k = 0, γ = 0.5 | 1.95 | 1.50
α = 0.5, k = 2, γ = 0.5 | 2.67 | 1.95
α = 0.5, k = 1, γ = 0.4 | 1.93 | 1.62
α = 0.5, k = 1, γ = 0.6 | 2.79 | 1.80
Table 3: Means and modes for the GEL-S distributions shown in Fig. 2
3.3 Quantiles and Random Number Generation
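The quantile function q(p), 0 < p < 1, is obtained by solving
$$ F(q(p)) = p; $$
thus, for the GEL-S distribution, q(p) corresponds to the solution of the nonlinear equation
$$ \gamma C\sqrt{2\pi}\sum_{i=0}^{k}\binom{k}{i}\alpha^{k-i}e^{(i+1)^2\gamma^2/2}\,\Phi\!\left(\frac{1}{\gamma}\left(\log(q(p)-\alpha)-(i+1)\gamma^2\right)\right) = p. \qquad (4) $$
Since F is continuous with $F(\alpha^+) = 0$ and $F(\infty) = 1$, a solution exists, and since
$$ F'(x) = C\,\frac{1}{x-\alpha}\sum_{i=0}^{k}\binom{k}{i}\alpha^{k-i}e^{(i+1)^2\gamma^2/2}\,e^{-\frac{1}{2}\left(\frac{1}{\gamma}\left(\log(x-\alpha)-(i+1)\gamma^2\right)\right)^2} > 0, \quad x > \alpha, $$
the solution of (4) is unique.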
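Because F depends on x only through y = log(x − α) and is strictly increasing, (4) can be solved by bracketing and bisecting in y. The Python sketch below is a stand-in for the R function uniroot used in the paper; the bracketing strategy, the tolerance and the function names are our own assumptions. For α = 0.5, k = 0, γ = 0.5 it returns a median of about 1.784 (that is, 0.5 + e^{0.25}), consistent with Tab. 4.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def gel_s_cdf(x, alpha, k, gamma):
    """F(x) from eq. (2); gamma*C*sqrt(2*pi) equals 1 / sum(t)."""
    if x <= alpha:
        return 0.0
    t = [math.comb(k, i) * alpha ** (k - i) * math.exp((i + 1) ** 2 * gamma ** 2 / 2)
         for i in range(k + 1)]
    y = math.log(x - alpha)
    return sum(ti * PHI((y - (i + 1) * gamma ** 2) / gamma) for i, ti in enumerate(t)) / sum(t)


def gel_s_quantile(p, alpha, k, gamma, tol=1e-10):
    """q(p): the solution of (4), found by bisection on y = log(q - alpha)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")

    def cdf_y(y):
        return gel_s_cdf(alpha + math.exp(y), alpha, k, gamma)

    lo, hi = -1.0, 1.0
    while cdf_y(lo) > p:  # widen the bracket until cdf_y(lo) <= p <= cdf_y(hi)
        lo *= 2.0
    while cdf_y(hi) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_y(mid) < p:
            lo = mid
        else:
            hi = mid
    return alpha + math.exp(0.5 * (lo + hi))
```

For k = 0 the result can be checked against the closed form α + exp(γ² + γΦ⁻¹(p)), since in that case only the i = 0 term of (4) is present.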
Illustrations of quantiles are presented in Tab. 4. To compute quantiles, i.e., to solve (4), the function uniroot of the R software was used. This table shows the quantile for p = 0.5, i.e., the median $x_M$ of X, for the distributions presented in Fig. 2. Means taken from Tab. 2 are included in Tab. 4 in order to compare all these statistics. The quantiles q(0.01), q(0.05), q(0.95) and q(0.99) are also incorporated in Tab. 4; these may be used as risk measures in contexts such as insurance or finance [4,9]. These results show that in all cases the medians are lower than the means, which means that the bulk of the data is concentrated to the left of the mean, in line with the right skewness of this type of distribution. Also, as expected, q(p) is increasing in p and q(0.01) is near α, whereas, due to the right skewness of the GEL-S distribution, the differences between q(0.05) and q(0.01) are smaller than those between q(0.99) and q(0.95).
Parameters | $\mu_X$ | q(0.5) or $x_M$ | q(0.01) | q(0.05) | q(0.95) | q(0.99)
α = 0.5, k = 1, γ = 0.5 | 2.26 | 2.05 | 0.97 | 1.17 | 4.08 | 5.56
α = 1.0, k = 1, γ = 0.5 | 2.70 | 2.49 | 1.45 | 1.64 | 4.47 | 5.92
α = 1.5, k = 1, γ = 0.5 | 3.16 | 2.95 | 1.94 | 2.12 | 4.89 | 6.31
α = 0.5, k = 0, γ = 0.5 | 1.95 | 1.78 | 0.90 | 1.06 | 3.42 | 4.61
α = 0.5, k = 2, γ = 0.5 | 2.67 | 2.40 | 1.06 | 1.30 | 4.96 | 6.83
α = 0.5, k = 1, γ = 0.4 | 1.93 | 1.87 | 1.01 | 1.17 | 3.07 | 3.88
α = 0.5, k = 1, γ = 0.6 | 2.79 | 2.40 | 0.95 | 1.18 | 5.72 | 8.42
Table 4: Means and quantiles for the GEL-S distributions shown in Fig. 2
The solution q of (4), given p with 0 < p < 1, could be used to generate random numbers of an rv that follows a GEL-S distribution. Indeed, since F′ > 0, the (non-explicit) function $F^{-1}(p)$ is strictly increasing, and then the inverse transform sampling method can be applied to draw random samples. This method consists of the following steps [16]:
1. Generate a random number p from the standard uniform distribution on the interval (0, 1); and
2. Compute q such that F(q) = p, i.e., solve (4).
The previous method may be implemented by generating uniform random numbers, which may be done using the function runif of the R software, and thereafter by computing the corresponding quantiles, which may be done using the function uniroot mentioned above.
This random number generation procedure will be used later on to simulate random numbers following a GEL-S distribution. These numbers will be used to study the performance of the new distribution.
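The two-step inverse transform procedure can be sketched in Python as follows. This is a minimal sketch under our own assumptions: `random.Random` plays the role of runif, a hand-written bisection on y = log(x − α) plays the role of uniroot, and `gel_s_sample` is a hypothetical name; it is not the author's R code.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def gel_s_sample(n, alpha, k, gamma, seed=None, tol=1e-10):
    """Inverse transform sampling: draw p ~ U(0, 1), then solve F(q) = p, eq. (4)."""
    rng = random.Random(seed)  # a fixed seed makes the sample exactly reproducible
    t = [math.comb(k, i) * alpha ** (k - i) * math.exp((i + 1) ** 2 * gamma ** 2 / 2)
         for i in range(k + 1)]
    w = [ti / sum(t) for ti in t]  # gamma*C*sqrt(2*pi)*t_i; these weights add up to 1

    def cdf_y(y):
        # F(alpha + e^y), i.e. eq. (2) written in terms of y = log(x - alpha)
        return sum(wi * PHI((y - (i + 1) * gamma ** 2) / gamma) for i, wi in enumerate(w))

    sample = []
    for _ in range(n):
        p = rng.random()  # step 1: uniform draw in [0, 1)
        while p == 0.0:   # q(p) is only defined for 0 < p < 1
            p = rng.random()
        lo, hi = -1.0, 1.0  # step 2: bracket the root of cdf_y(y) = p, then bisect
        while cdf_y(lo) > p:
            lo *= 2.0
        while cdf_y(hi) < p:
            hi *= 2.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if cdf_y(mid) < p:
                lo = mid
            else:
                hi = mid
        sample.append(alpha + math.exp(0.5 * (lo + hi)))
    return sample
```

For example, `gel_s_sample(1000, 1.0, 2, 1.0, seed=1)` would draw a reproducible sample with the parameters of study I in Section 5.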
4 Maximum Likelihood Estimation
In this section, the method of maximum likelihood for estimating α, k and γ of the GEL-S distribution is proposed.
Let X be an rv following a GEL-S distribution with parameters α, k and γ, and let $x_1, \dots, x_n$ be a sample of X obtained independently. Let $\theta = (\alpha, k, \gamma)$.
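Following the method of maximum likelihood, the likelihood function of this random sample is then given by
$$ L(\theta \mid x_1, \dots, x_n) = \prod_{i=1}^{n} C x_i^k e^{-(2\gamma^2)^{-1}(\log(x_i-\alpha))^2}, $$
and then its log-likelihood function is
$$ l(\theta \mid x_1, \dots, x_n) = n\log C + k\sum_{i=1}^{n}\log x_i - \frac{1}{2\gamma^2}\sum_{i=1}^{n}\left(\log(x_i-\alpha)\right)^2. $$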
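A direct transcription of this log-likelihood may help when checking an implementation. The sketch below is in Python (standard library only); the function name `gel_s_loglik` is hypothetical, and returning −∞ when some $x_i \le \alpha$ (or when γ ≤ 0) is our own convention for points outside the parameter space, not something stated in the text.

```python
import math


def gel_s_loglik(xs, alpha, k, gamma):
    """l(theta | x_1..x_n) = n log C + k sum log x_i - sum (log(x_i - alpha))^2 / (2 gamma^2)."""
    if gamma <= 0 or alpha < 0 or alpha >= min(xs):
        return -math.inf  # our convention: outside the parameter space or the support
    s = sum(math.comb(k, i) * alpha ** (k - i) * math.exp((i + 1) ** 2 * gamma ** 2 / 2)
            for i in range(k + 1))
    log_c = -math.log(gamma * math.sqrt(2 * math.pi) * s)  # log of the normalizing constant C
    return (len(xs) * log_c
            + k * sum(math.log(x) for x in xs)
            - sum(math.log(x - alpha) ** 2 for x in xs) / (2 * gamma ** 2))
```

Note that l tends to −∞ as α approaches $\min_i x_i$ from below, because $(\log(x_i-\alpha))^2$ blows up for the smallest observation.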
Maximum likelihood estimates (MLEs) of α, k and γ might be obtained by solving the nonlinear system obtained by equating to 0 the derivatives of l with respect to θ. Unfortunately, the parameter k is not continuous, and thus such a procedure cannot be applied.
In order to overcome this issue, the following ad hoc alternative for reaching the maximum of l is proposed. Fixing k = 0, 1, 2, ..., l is maximized by searching for optimal estimates $\hat\alpha(k)$ and $\hat\gamma(k)$. Then, k, $\hat\alpha(k)$ and $\hat\gamma(k)$ are selected as those that maximize l over the range of values of k taken into account. This procedure is equivalent to maximizing l over the three parameters, but considering only a few values of k. Then, following this procedure, we need to solve, for fixed k, the nonlinear system
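$$ \frac{\partial l}{\partial\alpha} = n\frac{1}{C}\frac{\partial C}{\partial\alpha} + \frac{1}{\gamma^2}\sum_{i=1}^{n}\frac{\log(x_i-\alpha)}{x_i-\alpha} = 0, $$
$$ \frac{\partial l}{\partial\gamma} = n\frac{1}{C}\frac{\partial C}{\partial\gamma} + \frac{1}{\gamma^3}\sum_{i=1}^{n}\left(\log(x_i-\alpha)\right)^2 = 0. $$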
There are no explicit solutions for this system. A method to solve such a system numerically is the Newton-Raphson (NR) algorithm, a well-known and useful technique for finding roots of systems of nonlinear equations in several variables. The function nlm of the R software, which carries out the minimization of an objective function using an NR-type algorithm, is used to solve the system described above. To this aim, this function is applied to the objective function $-l(\theta \mid x_1, \dots, x_n)$ given k, which provides the MLEs $\hat\theta$ of θ.
A limitation of the function nlm is that constraints are not allowed. This is an issue for estimating the parameters α and γ of a GEL-S distribution, since α needs to be non-negative and γ positive, i.e., negative values as estimates for α and γ are not allowed. In practice, applications of nlm to obtain estimates of α and γ showed that only the estimates of α could eventually be negative. In order to circumvent this limitation, the following simple modification of α in the objective function to be minimized could be used: consider α² instead of α. This means that the optimized parameter, say a, may take negative values, while the resulting estimate α = a² is always non-negative.
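The ad hoc profile-over-k procedure and the α² reparameterization described above can be sketched as follows. This is only an illustration under stated assumptions: to stay within Python's standard library it replaces R's Newton-type nlm with a simplified Nelder-Mead search (a different, derivative-free technique), the starting values (α equal to half the sample minimum, γ = 1) are arbitrary, and all function names are hypothetical.

```python
import math


def gel_s_loglik(xs, alpha, k, gamma):
    """Log-likelihood of Section 4; -inf outside the parameter space or when some x_i <= alpha."""
    if gamma <= 0 or alpha < 0 or alpha >= min(xs):
        return -math.inf
    try:
        s = sum(math.comb(k, i) * alpha ** (k - i) * math.exp((i + 1) ** 2 * gamma ** 2 / 2)
                for i in range(k + 1))
    except OverflowError:  # exp overflows for very large gamma; treat the point as infeasible
        return -math.inf
    return (-len(xs) * math.log(gamma * math.sqrt(2 * math.pi) * s)
            + k * sum(math.log(x) for x in xs)
            - sum(math.log(x - alpha) ** 2 for x in xs) / (2 * gamma ** 2))


def nelder_mead(fun, x0, step=0.1, iters=400):
    """Simplified Nelder-Mead minimizer (stand-in for R's Newton-type nlm)."""
    dim = len(x0)
    pts = [list(x0)] + [[v + (step if j == i else 0.0) for j, v in enumerate(x0)]
                        for i in range(dim)]
    vals = [fun(p) for p in pts]
    for _ in range(iters):
        order = sorted(range(dim + 1), key=vals.__getitem__)
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        worst = pts[-1]
        cen = [sum(p[j] for p in pts[:-1]) / dim for j in range(dim)]
        toward = lambda t: [c + t * (w - c) for c, w in zip(cen, worst)]
        xr = toward(-1.0)
        fr = fun(xr)
        if fr < vals[0]:  # reflection is the new best point: try expanding further
            xe = toward(-2.0)
            fe = fun(xe)
            pts[-1], vals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < vals[-2]:  # accept the reflection
            pts[-1], vals[-1] = xr, fr
        else:  # inside contraction, between the centroid and the worst point
            xc = toward(0.5)
            fc = fun(xc)
            if fc < vals[-1]:
                pts[-1], vals[-1] = xc, fc
            else:  # shrink the simplex around the best point
                pts = [pts[0]] + [[b + 0.5 * (v - b) for b, v in zip(pts[0], p)] for p in pts[1:]]
                vals = [vals[0]] + [fun(p) for p in pts[1:]]
    best = min(range(dim + 1), key=vals.__getitem__)
    return pts[best], vals[best]


def fit_gel_s(xs, k_values=range(7), iters=400):
    """For each fixed k, maximize l over (alpha, gamma) with alpha = a^2; keep the best k."""
    a0 = math.sqrt(0.5 * min(xs))  # start: alpha = half the sample minimum, gamma = 1
    results = []
    for k in k_values:
        neg_l = lambda th, k=k: -gel_s_loglik(xs, th[0] ** 2, k, th[1])
        (a, g), nll = nelder_mead(neg_l, [a0, 1.0], iters=iters)
        results.append((k, a * a, g, -nll))  # (k, alpha_hat, gamma_hat, maximized l)
    return max(results, key=lambda r: r[3]), results
```

A call such as `fit_gel_s(xs, k_values=range(7))` returns the selected `(k, alpha_hat, gamma_hat, loglik)` together with the per-k results, mirroring the range k = 0, ..., 6 used in Section 5. For a quick check with k = 0, data can be simulated as α + exp(γ² + γZ) with Z standard normal, since for k = 0 the identity in Section 2 shows that X − α is log-normal with parameters γ² and γ².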
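For interval estimation of (α, γ) and hypothesis tests on these parameters, we use the 2 × 2 observed information matrix given by, for fixed k,
$$ I(\theta) = -\begin{pmatrix} \dfrac{\partial^2 l}{\partial\alpha^2} & \dfrac{\partial^2 l}{\partial\alpha\,\partial\gamma} \\ \dfrac{\partial^2 l}{\partial\alpha\,\partial\gamma} & \dfrac{\partial^2 l}{\partial\gamma^2} \end{pmatrix}, $$
where
$$ \frac{\partial^2 l}{\partial\alpha^2} = n\left(\frac{1}{C}\frac{\partial^2 C}{\partial\alpha^2} - \frac{1}{C^2}\left(\frac{\partial C}{\partial\alpha}\right)^2\right) + \frac{1}{\gamma^2}\sum_{i=1}^{n}\left(\frac{\log(x_i-\alpha)}{(x_i-\alpha)^2} - \frac{1}{(x_i-\alpha)^2}\right), $$
$$ \frac{\partial^2 l}{\partial\alpha\,\partial\gamma} = n\left(\frac{1}{C}\frac{\partial^2 C}{\partial\alpha\,\partial\gamma} - \frac{1}{C^2}\frac{\partial C}{\partial\alpha}\frac{\partial C}{\partial\gamma}\right) - \frac{2}{\gamma^3}\sum_{i=1}^{n}\frac{\log(x_i-\alpha)}{x_i-\alpha}, $$
$$ \frac{\partial^2 l}{\partial\gamma^2} = n\left(\frac{1}{C}\frac{\partial^2 C}{\partial\gamma^2} - \frac{1}{C^2}\left(\frac{\partial C}{\partial\gamma}\right)^2\right) - \frac{3}{\gamma^4}\sum_{i=1}^{n}\left(\log(x_i-\alpha)\right)^2. $$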
Under certain regularity conditions [14], the MLE $\hat\theta$ given k is, as n increases, approximately multivariate normal with mean equal to the true parameter value θ and variance-covariance matrix given by the inverse of the information matrix, i.e., $\Sigma = [\sigma_{ij}] = I^{-1}(\theta)$. Hence, two-sided (1 − ε)100% confidence intervals (CIs) for the parameters α and γ are approximately
$$ \hat\alpha \pm z_{\epsilon/2}\sqrt{\hat\sigma_{11}}, \qquad \hat\gamma \pm z_{\epsilon/2}\sqrt{\hat\sigma_{22}}, $$
where $\hat\sigma_{ij}$ denotes $\sigma_{ij}$ evaluated at $\hat\theta$ and $z_\delta$ represents the δ100% percentile of the standard normal distribution.
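Once −l has been minimized, the observed information can be approximated numerically and turned into the Wald-type intervals above. The Python sketch below uses central finite differences as a stand-in for the Hessian that nlm can return; the step size h and the function names are illustrative assumptions, and the symmetric ± makes the sign convention for z irrelevant.

```python
import math
from statistics import NormalDist


def observed_information(loglik, theta, h=1e-4):
    """Negative Hessian of a two-parameter log-likelihood at theta, by central differences."""
    def second_partial(i, j):
        def at(di, dj):
            t = list(theta)
            t[i] += di
            t[j] += dj  # when i == j this moves the same coordinate twice (step 2h)
            return loglik(t)
        return (at(h, h) - at(h, -h) - at(-h, h) + at(-h, -h)) / (4 * h * h)
    return [[-second_partial(i, j) for j in range(2)] for i in range(2)]


def wald_ci(theta_hat, info, level=0.95):
    """theta_hat_j +/- z * sqrt(sigma_jj), with Sigma = info^{-1} and z the upper (1 - level)/2 point."""
    (a, b), (c, d) = info
    det = a * d - b * c
    sigma_diag = (d / det, a / det)  # diagonal of the inverse of a 2x2 matrix
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return [(t - z * math.sqrt(s), t + z * math.sqrt(s)) for t, s in zip(theta_hat, sigma_diag)]
```

For instance, a hypothetical observed information diag(400, 1600) at $(\hat\alpha, \hat\gamma) = (1, 0.5)$ gives standard errors 0.05 and 0.025, so the 95% intervals are roughly 1 ± 0.098 and 0.5 ± 0.049.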
5 Simulation Studies
In this section, Monte Carlo simulation studies are carried out to assess the performance of the MLEs of α and γ described in the previous section. Two sets of parameters are considered, each one corresponding to one study. The true parameters for these studies are presented in Tab. 5.
Study | α | k | γ
I | 1.0 | 2 | 1.0
II | 2.0 | 4 | 0.5
Table 5: Parameters for simulation studies
Each study considers two scenarios for the sample size: n = 1000 and n = 10000. Then, following the procedure to generate random numbers indicated in Subsection 3.3, random numbers are simulated from a GEL-S distribution with given parameters α, k and γ. A fixed seed is used to generate such random numbers, implying that all results of these studies can always be exactly replicated.
Fig. 3 exhibits histograms of the empirical pdfs of the samples analyzed. These plots are built using 100 bins in order to have enough detail on the shape of these empirical curves. The top plots correspond to study I and the bottom ones to study II. From these plots, a greater right skewness is observed for the data of study I than for the data of study II, independently of variations of n.
[Figure 3 panels. Histograms (Frequency against x) of the simulated samples. Top row, study I: x from 0 to 800. Bottom row, study II: x from 5 to 20. Left column: n = 1000; right column: n = 10000.]
Figure 3: Histograms for different parameters of the GEL-S distribution (study I on top and study II on bottom) and for different n (n = 1000 on the left and n = 10000 on the right)
Next, estimates of α and γ given k are computed using the procedure proposed in Section 4. Always considering values of k from 0 to 6, Tab. 6 shows these results for varying true parameters and n. For each k, the observed maximum likelihood is included. Then, for each study and n, the models with the highest likelihood over the studied range of k are selected. These selected models are highlighted. It is found that the values of k of the selected models correspond to the true values of k, except when n = 1000 in study II. Hence, it seems that, under the proposed estimation method, for small samples with not so high skewness, values of k other than the true one may be selected. The estimates of γ given by the selected models are the nearest to the true values, except when n = 1000 in study II. The estimates $\hat\alpha$ of the selected models, however, are not always the nearest to the true values.
For the estimates of α and γ of the selected models in Tab. 6, Tab. 7 reports their 95% CIs, computed using standard errors of the MLEs of α and γ obtained from the observed Hessian matrix provided by the function nlm. These results show that the errors of these estimates, as expected, decrease when n increases, and it seems that the errors of $\hat\gamma$ are systematically lower than those of $\hat\alpha$.
6 Applications
In this section, we present applications in order to illustrate the performance and usefulness of the proposed distribution when compared with natural competitors.
Popular right-skewed real data from several domains are used. In all cases, these data have been