Truncation-free Hybrid Inference for DPMM
Arnim Bleier
Department of Computational Social Science
Leibniz Institute for the Social Sciences
Cologne, 50667, Germany
[email protected]
Abstract
Dirichlet process mixture models (DPMM) are a cornerstone of Bayesian non-parametrics. While these models free us from choosing the number of components a priori, computationally attractive variational inference often reintroduces the need to do so, via a truncation on the variational distribution. In this paper we present a truncation-free hybrid inference for DPMM, combining the advantages of sampling-based MCMC and variational methods. The proposed hybridization enables more efficient variational updates, while increasing model complexity only if needed. We evaluate the properties of the hybrid updates and their empirical performance in single- as well as mixed-membership models. Our method is easy to implement and performs favorably compared to existing schemas.
1 Background
To begin with, consider a model for data $\mathcal{X} = \{x_1, x_2, \ldots, x_N\}$ that is assumed to be generated by a mixture of simpler component models $F(\phi_k)$. Following the single-membership assumption of each data point $x_i \in \mathcal{X}$ being explained by a single component $\phi_{z_i}$ we have

$$\begin{aligned}
\theta &\sim \mathrm{Dir}(\alpha) \\
\phi_k &\sim H(\beta) \quad \forall k \in [1,K] \\
z_i &\sim \mathrm{Cat}(\theta) \quad \forall i \in [1,N] \\
x_i &\sim F(\phi_{z_i}) \quad \forall i \in [1,N],
\end{aligned} \tag{1}$$
v andarriveattheinfinite-dimensionalDPMMforK . WhilecollapsedGibbssampling(CGS)
i is commonly used to explore the unbounded latent s→pa∞ce of this model, consider as an alternative
X
thecollapsedvariationaldistribution
r
a N
(cid:89)
q(z)= Cat(z γ ,...,γ ), (2)
i i1 iK+1
|
i=1
similar to partially collapsed approximations [4], however with $\theta$ and $\phi$ integrated out. The updates to optimize the variational parameter $\gamma_i$ with regards to the true distribution $p$ are then, for each observation $i \in \{1,\ldots,N\}$,

$$\gamma_{ik} \propto \exp\big(\mathbb{E}[\log p(z_i = k \mid z_{\neg i})] + \mathbb{E}[\log p(x_i \mid x^{\neg i}_{z=k}, \beta)]\big). \tag{3}$$
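To make the generative process in Equation 1 concrete, a minimal sketch follows; the Gaussian choices for $F$ and $H$ are illustrative assumptions, as the model leaves both component families generic:

```python
import numpy as np

# Illustrative sampler for Equation 1; F and H are chosen Gaussian here
# purely for concreteness (the model itself leaves both generic).
rng = np.random.default_rng(0)
N, K, alpha, beta = 100, 5, 1.0, 10.0

theta = rng.dirichlet(np.full(K, alpha))  # theta ~ Dir(alpha)
phi = rng.normal(0.0, beta, size=K)       # phi_k ~ H(beta), here N(0, beta^2)
z = rng.choice(K, size=N, p=theta)        # z_i ~ Cat(theta)
x = rng.normal(phi[z], 1.0)               # x_i ~ F(phi_{z_i}), here N(phi_{z_i}, 1)
```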
While exact computations are hard, second-order Taylor expansions [4] or the use of zero-order information have been suggested for estimation [1]. Using only zero-order information the optimal settings of $\gamma_{ik}$ are

$$\gamma_{ik} \propto \begin{cases} \dfrac{n^{\neg i}_{k}}{n^{\neg i} + \alpha}\, p(x_i \mid x^{\neg i}_{z=k}, \beta) & \text{if } k \leq K \\[6pt] \dfrac{\alpha}{n^{\neg i} + \alpha}\, p(x_i \mid \beta) & \text{if } k = K+1, \end{cases} \tag{4}$$
Figure 1: Illustration of the hybrid updates. The unusable $(K+1)$-dimensional update is replaced by sampling an indicator $c$: if $c = 1$ the truncated $K$-dimensional variational update is used; if $c = 2$ a new dimension is instantiated, making the update truncation-free.
where we overload the notation $n^{\neg i}_k = \sum_{j=1,\, j \neq i}^{N} \gamma_{jk}$, as compared to the standard CGS, with the expected number of data points explained by the $k$th component. Besides the simplifying assumptions made, these updates would place probability mass on each component, $\gamma_{ik} \neq 0\ \forall k \in \{1,\ldots,K+1\}$, and introduce a new component in each step of the inference. A common way to address this problem is the use of fixed finite-dimensional variational approximations for DPMM [4]. Alternatively, Lin [5] as well as Wang et al. [7], amongst others, discuss methods for growing the truncation as part of the inference. Lin [5] introduces an additional parameter controlling the growth of the truncation. Wang et al. [7] estimate parameters for locally collapsed variational inference from traditional samples, losing valuable information in the updates.
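Returning to Equation 4, a minimal sketch of the zero-order update may help fix ideas. The helper `pred_density` is a hypothetical stand-in for the posterior predictive $p(x_i \mid x^{\neg i}_{z=k}, \beta)$ for $k \leq K$ and the prior predictive $p(x_i \mid \beta)$ for $k = K+1$; since the denominator $n^{\neg i} + \alpha$ is shared across cases, it cancels under normalization:

```python
import numpy as np

def cvb0_update(i, gamma, alpha, pred_density):
    """Zero-order update of Equation 4 for observation i (sketch).

    gamma        -- (N, K+1) array of current variational parameters
    pred_density -- hypothetical helper: pred_density(i, k) returns the
                    predictive density of x_i under component k, or the
                    prior predictive for k = K+1
    """
    n = gamma.sum(axis=0) - gamma[i]     # expected counts n_k^{-i}
    weights = np.append(n[:-1], alpha)   # n_k^{-i} if k <= K, alpha otherwise
    phi = weights * np.array([pred_density(i, k) for k in range(gamma.shape[1])])
    return phi / phi.sum()               # shared denominator cancels here
```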
We extend the idea of estimating variational parameters from samples with a method to construct
the samples more efficiently. In the remainder of this paper, we start with the construction of the
proposed truncation-free hybrid updates. After that we study the properties of these updates. We
concludewithashortevaluationaswellasadiscussionofthecurrentlimitationsanddirectionsfor
futurework.
2 Construction of Hybrid Updates
Our goal is to allow for a truncation-free variational posterior inference that keeps as much information as possible of the updates, while still being able to explore the unbounded latent space in a fashion similar to the Gibbs sampler. To reach that goal, we suggest replacing the unusable $K+1$ dimensional update of the variational parameter $\gamma_i$ with a hybrid update. The suggested update has either the form of a $K$ dimensional truncated variational parameter or a $K+1$ dimensional parameter instantiating a new component. Our hybridization depends only on local information and we will use the abbreviation $\varphi_k = \gamma_{ik}$ to refer, in this section, to the $k$th component of the $K+1$ dimensional probability vector in Equation 4.
Let $\xi$ be the two-dimensional parameter of a categorical distribution, with the first dimension

$$\xi_1 \propto \sum_{k=1}^{K} \varphi_k \tag{5}$$

being proportional to the sum of the explanatory power of the first $K$ components, and the second dimension

$$\xi_2 \propto \varphi_{K+1} \tag{6}$$

being proportional to what is explained by the yet uninstantiated components. Furthermore, let $\zeta_c$, with $c \in \{1,2\}$, be probability vectors representing, respectively, a truncated variational distribution and a Gibbs sample instantiating a new dimension, in vector notation

$$\zeta_{1k} \propto \begin{cases} \varphi_k & \text{if } k \leq K \\ 0 & \text{if } k = K+1 \end{cases} \qquad \zeta_{2k} \propto \begin{cases} 0 & \text{if } k \leq K \\ 1 & \text{if } k = K+1. \end{cases} \tag{7}$$
Figure 2: Predictive performance in single-membership models for the Associated Press dataset (test perplexity versus iterations for HCVB0, CGS, and TSBVB).
With this setup, we then sample a variable

$$c \sim \mathrm{Cat}(\xi) \tag{8}$$

from $\xi$ to indicate whether the truncated variational update or the probability vector instantiating a new component is selected,

$$\varphi^{\mathrm{HYB}}_k \propto \zeta_{ck}. \tag{9}$$
The probability vector $\varphi^{\mathrm{HYB}}$ is our hybrid update. The hybrid update replaces the original unusable $K+1$ dimensional variational update, $\gamma_i \leftarrow \varphi^{\mathrm{HYB}}$, using most of the time the efficient truncated $K$ dimensional variational distribution without introducing a new dimension in the update step, while introducing a new $K+1$th component only if needed, similar to a Gibbs sampler. For a graphical illustration of the construction of $\varphi^{\mathrm{HYB}}$, see Figure 1.
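A minimal sketch of the complete hybrid update (Equations 5-9) follows; the function name and the NumPy-based implementation are illustrative choices, not part of the paper:

```python
import numpy as np

def hybrid_update(phi, rng=None):
    """Hybrid update of Equations 5-9: replace the (K+1)-dimensional
    update phi by either its truncated first K dimensions (c = 1) or an
    indicator instantiating a new component (c = 2)."""
    rng = np.random.default_rng() if rng is None else rng
    phi = np.asarray(phi, dtype=float)
    phi = phi / phi.sum()                      # normalized gamma_i (Eq. 4)
    xi = np.array([phi[:-1].sum(), phi[-1]])   # Eqs. 5 and 6
    c = rng.choice(2, p=xi)                    # Eq. 8: c ~ Cat(xi)
    hyb = np.zeros_like(phi)
    if c == 0:                                 # paper's c = 1: truncated update
        hyb[:-1] = phi[:-1] / phi[:-1].sum()   # zeta_1 of Eq. 7
    else:                                      # paper's c = 2: new component
        hyb[-1] = 1.0                          # zeta_2 of Eq. 7
    return hyb                                 # Eq. 9: phi^HYB
```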
3 Properties of Hybrid Updates
This construction of the hybrid updates has a number of favorable properties. By definition of the parameter $\xi$ (Equations 5, 6), the event of sampling the second category, $c = 2$, has the same expectation and variance as sampling a new component in the Gibbs sampler:

$$\begin{aligned}
\mathbb{E}[\mathbb{1}[c=2]] &= \mathbb{E}[\mathbb{1}[z_i = K+1]] \\
\mathrm{Var}[\mathbb{1}[c=2]] &= \mathrm{Var}[\mathbb{1}[z_i = K+1]].
\end{aligned} \tag{10}$$

Together with Equations 7 and 9, we see that this carries over to the hybrid update itself,

$$\begin{aligned}
\mathbb{E}[\mathbb{1}[\varphi^{\mathrm{HYB}}_{K+1} = 1]] &= \mathbb{E}[\mathbb{1}[z_i = K+1]] \\
\mathrm{Var}[\mathbb{1}[\varphi^{\mathrm{HYB}}_{K+1} = 1]] &= \mathrm{Var}[\mathbb{1}[z_i = K+1]],
\end{aligned} \tag{11}$$

making it possible to introduce new components like in a Gibbs sampler. Even more, the preservation of expectation is not limited to the $K+1$th dimension itself: our hybrid updates $\varphi^{\mathrm{HYB}}$ preserve the expectation, with regards to $\varphi_k$, over all $K+1$ dimensions,

$$\mathbb{E}[\varphi^{\mathrm{HYB}}_k] = \mathbb{E}[\varphi_k] \quad \forall k \in [1,\ldots,K+1]. \tag{12}$$

Note that this is, with the exception of locally collapsed variational inference [7], generally not the case for variational updates in non-parametric models.
Moreover, the sum of the explanatory power of the existing $K$ dimensions will exceed the explanatory power of introducing a new dimension, $\mathbb{E}[\xi_1] > \mathbb{E}[\xi_2]$, for most data points, supporting the use of the more informative variational distribution $\zeta_1$ in most of the updates. Finally, the computations necessary for the hybrid update $\varphi^{\mathrm{HYB}}$ are easy to implement and readily available, at almost no additional computational cost, from the normalization terms of $\varphi$.
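As a quick empirical check of Equation 12, the Monte Carlo mean of many hybrid updates of a fixed $\varphi$ should recover $\varphi$ itself in every dimension, including the $K+1$th; a short usage example reusing the hybrid_update sketch from the previous section:

```python
import numpy as np

# Sanity check of Equation 12 (assumes hybrid_update as sketched above).
phi = np.array([0.5, 0.3, 0.15, 0.05])    # K = 3 existing plus one new dimension
mean = np.mean([hybrid_update(phi) for _ in range(100_000)], axis=0)
print(np.allclose(mean, phi, atol=1e-2))  # expected output: True
```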
Figure 3: Predictive performance in HDP-LDA models for the New York Times dataset (test perplexity versus documents seen, left, and wall-clock time in seconds, right, for HCSVB0, PCSVB0, SCTFVB, and SCVB0 with K = 40, 100, and 300).
4 Experiments and Discussion
This section concludes our paper with an early empirical evaluation of the proposed hybrid updates. For the evaluation we used two text datasets: (1) The Associated Press corpus, consisting of 2,250 documents, where we used a vocabulary of 10,932 distinct terms occurring over a total of 398k tokens. (2) The larger New York Times corpus, consisting of 1.8 million articles, from which we extracted 153 million tokens using a vocabulary of 77,928 distinct terms.

The Associated Press corpus was used for evaluating the proposed updates in the single-membership model together with a Dirichlet-Multinomial data model for the documents. In the experiments, we held out 20% of the documents as a test set $\mathcal{X}^{\mathrm{test}}$ and batch-trained on the remaining documents $\mathcal{X}^{\mathrm{train}}$. Next, we split each test document $x_i \in \mathcal{X}^{\mathrm{test}}$ in two parts, $x_i = (x^a_i, x^b_i)$, with $x^a_i$ consisting of 70% of the document for estimating the indicator variable, and computed the perplexity of the remaining 30%, $x^b_i$. We then compared the perplexity versus the number of iterations. Figure 2 displays the performance for hybrid updates in the zero-order collapsed variational setting (HCVB0), CGS, and truncated ($T = 40$) stick-breaking mean-field variational Bayes (TSBVB) [4].
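The document-completion evaluation can be sketched as follows; `estimate_mixture` is a hypothetical helper that infers the mixture proportions of a held-out document from its first 70%, and `component_word` stands in for the matrix of per-component word distributions:

```python
import numpy as np

def completion_perplexity(test_docs, component_word, estimate_mixture):
    """Document-completion perplexity (sketch): estimate proportions on
    the first 70% of each test document, score the remaining 30%."""
    log_lik, n_tokens = 0.0, 0
    for doc in test_docs:                    # doc: array of word ids
        split = int(0.7 * len(doc))
        xa, xb = doc[:split], doc[split:]
        theta = estimate_mixture(xa)         # hypothetical inference helper
        log_lik += np.log(theta @ component_word[:, xb]).sum()
        n_tokens += len(xb)
    return np.exp(-log_lik / n_tokens)
```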
The New York Times corpus was used for evaluating the hybrid updates in mixed-membership HDP-LDA models, with test-train splits similar to above. However, for this larger dataset we resorted to collapsed stochastic inference using minibatches of 60 documents. We compared our method to the truncation-free locally collapsed variational inference (SCTFVB) [7] and the finite-dimensional stochastic collapsed variational Bayesian inference for LDA (SCVB0) [3] using 40, 100 and 300 topics. We employed our hybrid updates in a setting similar to SCTFVB, however using a lower-bound approximation for the estimation of the stick-breaking weights [6, 2]. We used the same parameterizations for the update schedules in all inference schemas. Figure 3 displays the results, as a function of the number of documents processed (left) and wall-clock time in seconds (right), for the same runs. In the figure, HCSVB0 denotes the results with our hybrid updates. PCSVB0 denotes a finite-dimensional variational approximation otherwise identical to HCSVB0, but truncated at 300 topics.
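For reference, a minimal sketch of the stochastic blending of expected counts that such collapsed stochastic schemes rely on; the Robbins-Monro schedule and its parameters tau and kappa are illustrative assumptions, as the paper does not specify its update schedule:

```python
def stochastic_count_update(n_hat, batch_stats, t, n_docs, batch_size,
                            tau=64.0, kappa=0.7):
    """Blend running expected counts with rescaled minibatch statistics
    using a decaying Robbins-Monro step size (illustrative sketch)."""
    rho = (t + tau) ** -kappa        # step size at iteration t
    scale = n_docs / batch_size      # rescale minibatch to corpus size
    return (1.0 - rho) * n_hat + rho * scale * batch_stats
```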
In this work, we sought to combine the advantages of MCMC schemas and variational schemas in a single inference schema for Bayesian non-parametric models. We addressed this demand by presenting a novel type of hybridization that efficiently uses the full variational distribution while sampling for the introduction of new components. The proposed method is easy to implement and measurably improves the predictive performance over state-of-the-art methods for single- as well as mixed-membership models at little additional computational cost. The current limitations of the presented work are two-fold. While we have established some favorable properties of the updates and found predictive performance improvements, we rely on approximations and have only limited theoretical arguments legitimizing our approach. The other limitation of this work is its scope. Next to a more thorough experimental evaluation and further formalization, an adaption of the hybrid updates to Wang et al. [8]'s Chinese restaurant process based variational inference for the HDP could potentially be a promising direction for future work.
References

[1] Arthur Asuncion, Max Welling, Padhraic Smyth, and Yee Whye Teh. On smoothing and inference for topic models. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.
[2] Arnim Bleier. Practical collapsed stochastic variational inference for the HDP. In NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013.
[3] James Foulds, Levi Boyles, Christopher DuBois, Padhraic Smyth, and Max Welling. Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013.
[4] Kenichi Kurihara, Max Welling, and Yee Whye Teh. Collapsed variational Dirichlet process mixture models. In IJCAI, volume 7, 2007.
[5] Dahua Lin. Online learning of nonparametric mixture models via sequential variational approximation. In Advances in Neural Information Processing Systems, 2013.
[6] Issei Sato, Kenichi Kurihara, and Hiroshi Nakagawa. Practical collapsed variational Bayes inference for hierarchical Dirichlet process. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012.
[7] Chong Wang and David M. Blei. Truncation-free online variational inference for Bayesian nonparametric models. In Advances in Neural Information Processing Systems, 2012.
[8] Chong Wang, John William Paisley, and David M. Blei. Online variational inference for the hierarchical Dirichlet process. In AISTATS, 2011.