SPRINGER BRIEFS IN STATISTICS – ABE
Víctor Hugo Lachos Dávila
Celso Rômulo Barbosa Cabral
Camila Borelli Zeller
Finite Mixture of Skewed Distributions
SpringerBriefs in Statistics
SpringerBriefs in Statistics – ABE
Series Editors
Francisco Louzada Neto, Departamento de Matemática Aplicada e Estatística,
Universidade de São Paulo, São Carlos, São Paulo, Brazil
Helio dos Santos Migon, Instituto de Matemática, Universidade Federal do Rio
de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
Gilberto Alvarenga Paula, Instituto de Matemática e Estatística, Universidade
de São Paulo, São Paulo, São Paulo, Brazil
SpringerBriefs present concise summaries of cutting-edge research and practical
applications across a wide spectrum of fields. Featuring compact volumes of 50 to
125 pages, the series covers a range of content from professional to academic.
Briefs are characterized by fast, global electronic dissemination, standard publishing
contracts, standardized manuscript preparation and formatting guidelines, and
expedited production schedules.
Typical topics might include:
• A timely report of state-of-the-art techniques
• A bridge between new research results, as published in journal articles, and
a contextual literature review
• A snapshot of a hot or emerging topic
• An in-depth case study
• A presentation of core concepts that students must understand in order to make
independent contributions
This SpringerBriefs in Statistics – ABE subseries aims to publish relevant
contributions to several fields in the Statistical Sciences, produced by full members of
the ABE – Associação Brasileira de Estatística (Brazilian Statistical Association),
or by distinguished researchers from all Latin America. These texts are targeted
to a broad audience that includes researchers, graduate students and professionals,
and may cover a multitude of areas like Pure and Applied Statistics, Actuarial
Science, Econometrics, Quality Assurance and Control, Computational Statistics,
Risk and Probability, Educational Statistics, Data Mining, Big Data and Confidence
and Survival Analysis, to name a few.
More information about this subseries at http://www.springer.com/subseries/13778
Víctor Hugo Lachos Dávila
Department of Statistics
University of Connecticut
Storrs Mansfield, CT, USA

Celso Rômulo Barbosa Cabral
Department of Statistics
Federal University of Amazonas
Manaus, Brazil

Camila Borelli Zeller
Department of Statistics
Federal University of Juiz de Fora
Juiz de Fora, Minas Gerais, Brazil
ISSN 2191-544X    ISSN 2191-5458 (electronic)
SpringerBriefs in Statistics
ISSN 2524-6917
SpringerBriefs in Statistics – ABE
ISBN 978-3-319-98028-7    ISBN 978-3-319-98029-4 (eBook)
https://doi.org/10.1007/978-3-319-98029-4
Library of Congress Control Number: 2018951927
Mathematics Subject Classification: 62-07, 62Exx, 62Fxx, 62Jxx
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Lourdes, Carlos and Marcelo (Camila Borelli Zeller)
Sílvia, Ivanete and Gabriela (Celso Rômulo Barbosa Cabral)
Rosa Dávila, Eduardo, Thuany and Alberto (Víctor Hugo Lachos Dávila)
Preface
Modeling based on finite mixture distributions is a rapidly developing area with
an exploding range of applications. Finite mixture models are nowadays applied
in such diverse areas as biology, biometrics, genetics, medicine, and marketing,
among others. The main objective of this book is to present recent results in this
area of research, which are intended to prepare readers to undertake mixture modeling
using scale mixtures of skew-normal (SMSN) distributions. We consider maximum
likelihood estimation for univariate and multivariate finite mixtures whose
components are members of the flexible class of SMSN distributions proposed by Lachos
et al. (2010), which is a subclass of the skew-elliptical class proposed by Branco
and Dey (2001). This subclass contains the entire family of normal independent
distributions, also known as scale mixtures of normal distributions (SMN) (Lange
and Sinsheimer 1993). In addition, the skew-normal and skewed versions of some
other classical symmetric distributions are SMSN members: the skew-t (ST), the
skew-slash (SSL), and the skew-contaminated normal (SCN), for example. These
distributions have heavier tails than the normal distribution, and thus they seem to
be a reasonable choice for robust inference. The proposed EM-type algorithm and
methods are implemented in the R package mixsmsn.
In Chap. 1, we present a motivating example in which mixtures of SMSN
distributions appear to explain the data better than mixtures with
symmetric components.
In Chap. 2, we present background material on mixture models and the EM
algorithm for maximum likelihood estimation. This is followed by the derivation
of the observed information matrix to obtain the standard errors.
In Chap. 3, for the sake of completeness, we define the multivariate SMSN
distributions and study some of their important properties, viz., moments, kurtosis, linear
transformations, and marginal and conditional distributions, among others. Further, the
EM algorithm for performing maximum likelihood estimation is presented.
In Chap. 4, we propose a finite mixture of univariate SMSN distributions
(FM-SMSN) and an EM-type algorithm for maximum likelihood estimation. The
associated observed information matrix is obtained analytically. The proposed
methodology is illustrated through the analysis of a real data set and simulation
studies.
In Chap. 5, we consider a flexible class of models whose elements are
finite mixtures of multivariate SMSN distributions. A general EM-type algorithm
is employed for iteratively computing parameter estimates, and it is discussed
with emphasis on finite mixtures of skew-normal, skew-t, skew-slash, and
skew-contaminated normal distributions. Further, a general information-based method for
approximating the asymptotic covariance matrix of the estimates is also presented.
Finally, in Chap. 6, we present a proposal to deal with mixtures of regression
models by assuming that the random errors follow scale mixtures of skew-normal
distributions. This approach allows us to model data with great flexibility,
accommodating skewness and heavy tails at the same time. A simple EM-type algorithm
to perform maximum likelihood inference on the parameters of the proposed model
is derived. A real data set is analyzed, illustrating the usefulness of the proposed
method.
We hope that the publication of this text will enhance the spread of ideas
that are currently trickling through the literature of mixture models. The skew
models and methods developed recently in this field have yet to reach their largest
possible audience, partly because the results are scattered in various journals.
[email protected], [email protected],
or [email protected]. Víctor H. Lachos was supported by CNPq-Brazil
(BPPesq) and São Paulo State Research Foundation (FAPESP). Celso Rômulo
Barbosa Cabral was supported by CNPq (BPPesq) and Amazonas State Research
Foundation (FAPEAM, Universal project). Camila Borelli Zeller was supported by
CNPq (BPPesq and Universal project) and Minas Gerais State Research Foundation
(FAPEMIG, Universal project).
Storrs, USA    Víctor Hugo Lachos Dávila
Manaus, Brazil    Celso Rômulo Barbosa Cabral
Juiz de Fora, Brazil    Camila Borelli Zeller
January 2018
Contents
1 Motivation
2 Maximum Likelihood Estimation in Normal Mixtures
2.1 EM Algorithm for Finite Mixtures
2.2 Standard Errors
3 Scale Mixtures of Skew-Normal Distributions
3.1 Introduction
3.2 SMN Distributions
3.2.1 Examples of SMN Distributions
3.3 Multivariate SMSN Distributions and Main Results
3.3.1 Examples of SMSN Distributions
3.3.2 A Simulation Study
3.4 Maximum Likelihood Estimation
3.5 The Observed Information Matrix
4 Univariate Mixture Modeling Using SMSN Distributions
4.1 Introduction
4.2 The Proposed Model
4.2.1 Maximum Likelihood Estimation via EM Algorithm
4.2.2 Notes on Implementation
4.3 The Observed Information Matrix
4.3.1 The Skew-t Distribution
4.3.2 The Skew-Slash Distribution
4.3.3 The Skew-Contaminated Normal Distribution
4.4 Simulation Studies
4.4.1 Study 1: Clustering
4.4.2 Study 2: Asymptotic Properties
4.4.3 Study 3: Model Selection
4.5 Application with Real Data
5 Multivariate Mixture Modeling Using SMSN Distributions
5.1 Introduction
5.2 The Proposed Model
5.2.1 Maximum Likelihood Estimation via EM Algorithm
5.3 The Observed Information Matrix
5.3.1 The Skew-Normal Distribution
5.3.2 The Skew-t Distribution
5.3.3 The Skew-Slash Distribution
5.3.4 The Skew-Contaminated Normal Distribution
5.4 Applications with Simulated and Real Data
5.4.1 Consistency
5.4.2 Standard Deviation
5.4.3 Model Fit and Clustering
5.4.4 The Pima Indians Diabetes Data
5.5 Identifiability and Unboundedness
6 Mixture Regression Modeling Based on SMSN Distributions
6.1 Introduction
6.2 The Proposed Model
6.2.1 Maximum Likelihood Estimation via EM Algorithm
6.2.2 Notes on Implementation
6.3 Simulation Experiments
6.3.1 Experiment 1: Parameter Recovery
6.3.2 Experiment 2: Classification
6.3.3 Experiment 3: Classification
6.4 Real Dataset
References
Index