A Measure of the Goodness of Fit in Unbinned Likelihood Fits; End of Bayesianism?

Rajendran Raja
Fermilab, Batavia, IL 60510, USA
Maximum likelihood fits to data can be done using binned data (histograms) and unbinned data. With binned data, one gets not only the fitted parameters but also a measure of the goodness of fit. With unbinned data, currently, the fitted parameters are obtained but no measure of goodness of fit is available. This remains, to date, an unsolved problem in statistics. Using Bayes' theorem and likelihood ratios, we provide a method by which both the fitted quantities and a measure of the goodness of fit are obtained for unbinned likelihood fits, as well as errors in the fitted quantities. The quantity conventionally interpreted as a Bayesian prior is seen in this scheme to be a number, not a distribution, that is determined from data.
1. Introduction

As of the Durham conference [1], the problem of obtaining a goodness of fit in unbinned likelihood fits was an unsolved one. In what follows, we will denote by the vector s the theoretical parameters (s for "signal") and by the vector c the experimentally measured quantities or "configurations". For simplicity, we will illustrate the method where both s and c are one dimensional, though either or both can be multi-dimensional in practice. We thus define the theoretical model by the conditional probability density P(c|s). An unbinned maximum likelihood fit to data is then obtained by maximizing the likelihood [2],

L = \prod_{i=1}^{n} P(c_i|s)    (1)

where the likelihood is evaluated at the n observed data points c_i, i = 1,...,n. Such a fit will determine the maximum likelihood value s* of the theoretical parameters, but it will not tell us how good the fit is. The value of the likelihood L at the maximum likelihood point does not furnish a goodness of fit, since the likelihood is not invariant under a change of variable. This can be seen by observing that one can transform the variable set c to a variable set c' such that P(c'|s*) is uniformly distributed between 0 and 1. Such a transformation is known as a hypercube transformation in multi-dimensions. Other datasets will yield different values of likelihood in the variable space c when the likelihood is computed with the original function P(c|s*). However, in the hypercube space, the value of the likelihood is unity regardless of the dataset c'_i, i = 1,...,n; thus the likelihood by itself cannot furnish a goodness of fit, since neither the likelihood nor ratios of likelihoods computed using the same distribution P(c|s*) are invariant under variable transformations. The fundamental reason for this non-invariance is that only a single distribution, namely P(c|s*), is being used to compute the goodness of fit.
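The point can be made concrete with a short Python sketch (an illustration of our own, not code used for this paper's results; an exponential model with a hypothetical fitted lifetime s* = 1 is assumed): transforming any dataset by the cumulative distribution F(c|s*) makes the transformed density uniform on (0,1), so the likelihood is identically unity and carries no information about fit quality.

    import numpy as np

    rng = np.random.default_rng(1)
    s_star = 1.0                         # hypothetical fitted value s*
    c = rng.exponential(s_star, 1000)    # any dataset in the original space

    # Hypercube transform: c' = F(c|s*) = 1 - exp(-c/s*) for the exponential
    c_prime = 1.0 - np.exp(-c / s_star)

    # Under the fitted model, the pdf of c' is uniform on (0,1), so every
    # factor in the likelihood product is 1 and the log-likelihood is 0,
    # however badly the model actually describes the data.
    log_likelihood = np.sum(np.log(np.ones_like(c_prime)))
    print(log_likelihood)                # 0.0 for every possible dataset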
2. Likelihood ratios

In binned likelihood cases, where one is comparing a theoretical distribution P(c|s) with a binned histogram, there are two distributions involved: the theoretical distribution and the data distribution. The pdf of the data is approximated by the bin contents of the histogram normalized to unity. If the data consist of n events, the pdf of the data P^{data}(c) is defined in the frequentist sense as the normalized density distribution in c space as n → ∞. In the binned case, we can bin in finer and finer bins as n → ∞ and obtain a smooth function, which we define as the pdf of the data P^{data}(c). In practice, one is always limited by statistics and the binned function will be an approximation to the true pdf. We can now define a likelihood ratio L_R such that

L_R = \frac{\prod_{i=1}^{n} P(c_i|s)}{\prod_{i=1}^{n} P^{data}(c_i)} \equiv \frac{P(c^n|s)}{P^{data}(c^n)}    (2)

where we have used the notation c^n to denote the event set c_i, i = 1,...,n. Let us now note that L_R is invariant under the variable transformation c → c', since

P(c'|s) = |dc/dc'| P(c|s)    (3)

P^{data}(c') = |dc/dc'| P^{data}(c)    (4)

L_R' = L_R    (5)

and the Jacobian of the transformation |dc/dc'| cancels between the numerator and the denominator of the ratio. This is an extremely important property of the likelihood ratio L_R that qualifies it to be a goodness of fit variable. Since the denominator P^{data}(c^n) is independent of the theoretical parameters s, both the likelihood ratio and the likelihood maximize at the same point s*. One can also show [3] that the maximum value of the likelihood ratio occurs when the theoretical likelihood P(c_i|s) and the data likelihood P^{data}(c_i) are equal for all c_i.
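The invariance of Eqs. 3-5 can be checked numerically with the following sketch (toy densities of our own choosing stand in for P(c|s) and P^{data}(c)): under the transformation c → c' = c² the same Jacobian multiplies both pdfs and cancels in the ratio.

    import numpy as np

    rng = np.random.default_rng(2)
    c = rng.exponential(1.0, 500)

    p_theory = lambda x: np.exp(-x / 1.0) / 1.0    # P(c|s) with s = 1.0
    p_data   = lambda x: np.exp(-x / 1.2) / 1.2    # stand-in for P_data(c)

    log_LR = np.sum(np.log(p_theory(c)) - np.log(p_data(c)))

    # Transform c -> c' = c^2; both pdfs pick up the same Jacobian |dc/dc'|
    cp  = c ** 2
    jac = 1.0 / (2.0 * np.sqrt(cp))                # |dc/dc'|
    log_LR_prime = np.sum(np.log(p_theory(np.sqrt(cp)) * jac)
                          - np.log(p_data(np.sqrt(cp)) * jac))

    print(np.allclose(log_LR, log_LR_prime))       # True: L_R is invariant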
3. Binned Goodness of Fit

In the case where the pdf P^{data}(c) is estimated by binned histograms and the statistics are Gaussian, it is readily shown [3] that the commonly used goodness of fit variable χ² = −2 ln L_R. It is worth emphasizing that the likelihood ratio as defined above is needed to derive this result, and not just the negative log of the theoretical likelihood P(c^n|s). The popular conception that χ² is −2 ln P(c^n|s) is simply incorrect! It can also be shown that the likelihood ratio defined above can describe the binned cases where the statistics are Poissonian [4]. In order to solve our problem of goodness of fit in unbinned likelihood cases, one needs to arrive at a method of estimating the data pdf P^{data}(c) without the use of bins.
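A minimal numeric check of the statement χ² = −2 ln L_R (a sketch under assumptions of our own: Gaussian bin likelihoods with the variance held at the predicted bin content, so that the normalization constants cancel in the ratio; the data likelihood is the same Gaussian evaluated at its own bin contents):

    import numpy as np

    rng = np.random.default_rng(3)
    mu  = np.full(30, 100.0)                  # predicted bin contents
    obs = rng.poisson(mu).astype(float)       # observed contents, ~Gaussian

    def log_gauss(x, mean, var):
        return -0.5 * (x - mean) ** 2 / var - 0.5 * np.log(2 * np.pi * var)

    # Numerator: theory likelihood; denominator: data likelihood, i.e. the
    # Gaussian evaluated at its own maximum (observed = expected)
    log_LR = np.sum(log_gauss(obs, mu, mu) - log_gauss(obs, obs, mu))
    chi2   = np.sum((obs - mu) ** 2 / mu)

    print(-2.0 * log_LR, chi2)                # the two numbers agree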
4. Unbinned Goodness of Fit

One of the better known methods of estimating the probability density of a distribution in an unbinned case is the use of Probability Density Estimators (PDEs), also known as Kernel Density Estimators (KDEs) [5]. The pdf P^{data}(c) is approximated by

P^{data}(c) \approx PDE(c) = \frac{1}{n} \sum_{i=1}^{n} G(c - c_i)    (6)

where the kernel function G(c − c_i), centered on each data point c_i, is so defined that it normalizes to unity and for large n approaches a Dirac delta function [3]. The choice of the kernel function can vary depending on the problem. A popular kernel is the Gaussian, defined in the multi-dimensional case as

G(c) = \frac{1}{(\sqrt{2\pi}h)^d \sqrt{\det(E)}} \exp\left(-\frac{H_{\alpha\beta} c^\alpha c^\beta}{2h^2}\right)    (7)

where E is the error matrix of the data, defined as

E_{\alpha\beta} = \langle c^\alpha c^\beta \rangle - \langle c^\alpha \rangle \langle c^\beta \rangle    (8)

where \langle\rangle implies an average over the n events and d is the number of dimensions. The Hessian matrix H is defined as the inverse of E, and repeated indices imply summation. The parameter h is a "smoothing parameter", with a suggested [6] optimal value h ∝ n^{-1/(d+4)}, that satisfies the asymptotic condition

G_\infty(c - c_i) \equiv \lim_{n\to\infty} G(c - c_i) = \delta(c - c_i)    (9)

The parameter h will depend on the local number density and will have to be adjusted as a function of the local density to obtain a good representation of the data by the PDE. Our proposal for the goodness of fit in unbinned likelihood fits is thus the likelihood ratio

L_R = \frac{P(c^n|s)}{P^{data}(c^n)} \approx \frac{P(c^n|s)}{P_{PDE}(c^n)}    (10)

evaluated at the maximum likelihood point s*.
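In one dimension, Eqs. 6 and 10 reduce to the following sketch (helper names are ours, not the paper's code; the kernel width is taken as σh with σ the sample standard deviation, and the constant h = 0.125 quoted in the example below):

    import numpy as np

    def pde(x, sample, h):
        """Gaussian KDE of Eq. 6: mean of kernels centred on each event."""
        width = np.std(sample) * h                 # kernel width sigma*h
        z = (x[:, None] - sample[None, :]) / width
        return np.mean(np.exp(-0.5 * z ** 2), axis=1) / (width * np.sqrt(2.0 * np.pi))

    def nllr(sample, p_theory, h):
        """-ln L_R of Eq. 10, with P_data estimated by the PDE."""
        return np.sum(np.log(pde(sample, sample, h)) - np.log(p_theory(sample)))

    rng = np.random.default_rng(4)
    c = rng.exponential(1.0, 1000)                 # data drawn from the theory
    print(nllr(c, lambda x: np.exp(-x / 1.0) / 1.0, h=0.125))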
5. An illustrative example

We consider a simple one-dimensional case where the data are an exponential distribution, say the decay times of a radioactive isotope. The theoretical prediction is given by

P(c|s) = \frac{1}{s} \exp(-c/s)    (11)

We have chosen an exponential with s = 1.0 for this example. The Gaussian kernel for the PDE would be given by

G(c) = \frac{1}{\sqrt{2\pi}\,\sigma h} \exp\left(-\frac{c^2}{2\sigma^2 h^2}\right)    (12)

where the variance σ of the exponential is numerically equal to s. To begin with, we chose a constant value for the smoothing parameter, which for the 1000 events generated is calculated to be 0.125. Figure 1 shows the generated events, the theoretical curve P(c|s) and the PDE curve P(c) normalized to the number of events.

[Figure 1: Histogram (with errors) of the generated events. Superimposed are the theoretical curve P(c|s) and the PDE estimator (solid histogram with no errors).]

The PDE fails to reproduce the data near the origin due to the boundary effect, whereby the Gaussian probabilities for events close to the origin spill over to negative values of c. This lost probability would be compensated by events on the exponential distribution with negative c if they existed. In our case, this presents a drawback for the PDE method, which we will remedy later in the paper using PDE definitions on the hypercube and periodic boundary conditions. For the time being, we will confine our example to values of c > 1.0 to avoid the boundary effect.

In order to test the goodness of fit capabilities of the likelihood ratio L_R, we superimpose a Gaussian on the exponential and try to fit the data with a simple exponential. Figure 2 shows the "data", with 1000 events generated as an exponential in the fiducial range 1.0 < c < 5.0. Superimposed on it is a Gaussian of 500 events. More events in the exponential are generated in the interval 0.0 < c < 1.0 to avoid the boundary effect at the fiducial boundary at c = 1.0.

[Figure 2: Histogram (with errors) of 1000 events in the fiducial interval 1.0 < c < 5.0, generated as an exponential with decay constant s = 1.0, with a superimposed Gaussian of 500 events centered at c = 2.0 and width 0.2. The PDE estimator is the solid histogram with no errors.]

Since the number density varies significantly, we have had to introduce a method of iteratively determining the smoothing factor as a function of c, as described in [3]. With this modification of the PDE, one gets a good description of the behavior of the data by the PDE, as shown in Figure 2. We now vary the number of events in the Gaussian and obtain the value of the negative log likelihood ratio NLLR as a function of the strength of the Gaussian. Table I summarizes the results.
Table I: Goodness of fit results from unbinned likelihood and binned likelihood fits for various data samples. The negative values for the number of standard deviations in some of the examples are due to statistical fluctuation.

    Number of        Unbinned fit   Unbinned fit   Binned fit
    Gaussian events  NLLR           Nσ             χ² (39 d.o.f.)
    500              189.           103            304
    250              58.6           31             125
    100              11.6           4.9            48
    85               8.2            3.0            42
    75               6.3            1.9            38
    50               2.55           -0.14          30
    0                0.44           -1.33          24
The number of standard deviations the unbinned likelihood fit is from what is expected is determined empirically, by plotting the value of NLLR for a large number of fits where no Gaussian is superimposed (i.e. the null hypothesis), determining the mean and RMS of this distribution, and using these to estimate the number of σ's the observed NLLR is from the null case. Table I also gives the results of a binned fit on the same "data". It can be seen that the unbinned fit gives a 3σ discrimination when the number of Gaussian events is 85, whereas the binned fit gives a χ²/ndf of 42/39 for the same case. We intend to make these tests more sophisticated in future work.

Figure 3 shows the variation of −ln P(c^n|s) and −ln P_{PDE}(c^n) for an ensemble of 500 experiments, each with n = 1000 events in the exponential and no events in the Gaussian (null hypothesis). It can be seen that −ln P(c^n|s) and −ln P_{PDE}(c^n) are correlated with each other, and that the difference between the two (the NLLR) is a much narrower distribution than either and provides the goodness of fit discrimination.

[Figure 3: (a) The distribution of the negative log-likelihood −ln P(c^n|s) for an ensemble of experiments where the data and the theory are expected to fit. (b) The negative log PDE likelihood −ln P_{PDE}(c^n) for the same data. (c) The correlation between the two. (d) The negative log-likelihood ratio NLLR, obtained by subtracting (b) from (a) on an event by event basis.]
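The empirical calibration described above can be sketched as follows (settings assumed; nllr() and the theory pdf are the helpers from the sketch in Section 4):

    import numpy as np

    rng = np.random.default_rng(5)

    def n_sigma(nllr_observed, p_theory, n_events=1000, n_toys=500, h=0.125):
        """Significance of an observed NLLR relative to the null ensemble."""
        toys = np.array([nllr(rng.exponential(1.0, n_events), p_theory, h)
                         for _ in range(n_toys)])   # nllr() from the earlier sketch
        return (nllr_observed - toys.mean()) / toys.std()

    # e.g. the fit with 85 Gaussian events in Table I has NLLR = 8.2
    print(n_sigma(8.2, lambda x: np.exp(-x / 1.0) / 1.0))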
5.1. Improving the PDE

The PDE technique we have used so far suffers from two drawbacks: firstly, the smoothing parameter has to be iteratively adjusted significantly over the full range of the variable c, since the distribution P(c|s) changes significantly over that range; and secondly, there are boundary effects at c = 0, as shown in Figure 1. Both these flaws are remedied if we define the PDE in hypercube space. After we find the maximum likelihood point s*, for which the PDE is not needed, we transform the variable c → c' such that the distribution P(c'|s*) is flat and 0 < c' < 1. The hypercube transformation can be made even if c is multi-dimensional, by initially going to a set of variables that are uncorrelated and then making the hypercube transformation. The transformation can be such that any interval in c space maps onto the interval (0,1) in hypercube space. We solve the boundary problem by imposing periodicity in the hypercube. In the one dimensional case, we imagine three "hypercubes", each identical to the other, on the real axis in the intervals (−1,0), (0,1) and (1,2). The hypercube of interest is the one in the interval (0,1). When the probability from an event kernel leaks outside the boundary (0,1), we continue the kernel into the next hypercube. Since the hypercubes are identical, this implies the kernel re-appearing in the middle hypercube, but from the opposite boundary. Put mathematically, the kernel is defined such that

G(c' - c'_i) = G(c' - c'_i - 1);  c' > 1    (13)

G(c' - c'_i) = G(c' - c'_i + 1);  c' < 0    (14)

Although a Gaussian kernel will work on the hypercube, the natural kernel to use, considering the shape of the hypercube, would be the function G(c')

G(c') = \frac{1}{h};  |c'| < \frac{h}{2}    (15)

G(c') = 0;  |c'| > \frac{h}{2}    (16)

This kernel would be subject to the periodic boundary conditions given above, which further ensure that every event in hypercube space is treated exactly as every other event, irrespective of its co-ordinates. The parameter h is a smoothing parameter which needs to be chosen with some care. However, since the theory distribution is flat in hypercube space, the smoothing parameter may not need to be iteratively determined over hypercube space, to the extent that the data distribution is similar to the theory distribution. Even if iteration is used, the variation of h in hypercube space is likely to be much smaller.
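A sketch of this construction for the one-dimensional example (function names are ours): events are mapped to the hypercube with the fitted cumulative distribution, and the box kernel of Eqs. 15-16 is wrapped around the boundaries per Eqs. 13-14 by measuring distances on a circle of unit circumference.

    import numpy as np

    def to_hypercube(c, s_star=1.0):
        """c' = F(c|s*): the CDF of the fitted exponential maps c to (0,1)."""
        return 1.0 - np.exp(-c / s_star)

    def pde_box_periodic(x, sample, h=0.2):
        """Box kernel of width h (Eqs. 15-16) with periodic wrap-around
        (Eqs. 13-14): distances taken on a circle of circumference 1."""
        d = np.abs(x[:, None] - sample[None, :])
        d = np.minimum(d, 1.0 - d)
        return np.mean(d < 0.5 * h, axis=1) / h

    rng = np.random.default_rng(6)
    cp = to_hypercube(rng.exponential(1.0, 1000))
    print(pde_box_periodic(cp, cp).mean())    # ~1: flat density, no edge loss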
Figure 4 shows the distribution of the NLLR for the null hypothesis for an ensemble of 500 experiments, each with 1000 events, as a function of the smoothing factor h. It can be seen that the distribution narrows considerably as the smoothing factor increases. We choose an operating value of 0.2 for h and study the dependence of the NLLR on the number of events, ranging from 100 to 1000, as shown in Figure 5. The dependence on the number of events is seen to be weak, indicating good behavior. The PDE thus arrived at, computed with h = 0.2, can be transformed from the hypercube space to c space and will reproduce the data smoothly and with no edge effects. We note that it is also easier to arrive at an analytic theory of NLLR with the choice of this simple kernel.

[Figure 4: The distribution of the negative log likelihood ratio NLLR for the null hypothesis, for an ensemble of 500 experiments each with 1000 events, as a function of the smoothing factor h = 0.1, 0.2 and 0.3.]

[Figure 5: The distribution of the negative log likelihood ratio NLLR for the null hypothesis, for an ensemble of 500 experiments each with the smoothing factor h = 0.2, as a function of the number of events.]

6. End of Bayesianism?

By Bayesianism, we mean the practice of "guessing" a prior distribution and introducing it into the calculations. In what follows we will show that what is conventionally thought of as a Bayesian prior distribution is in reality a number that can be calculated from the data. We are able to do this since we use two pdf's, one for theory and one for data.
In what follows, we will interpret the probability distribution of the parameters in a strictly frequentist sense. The pdf of s is the distribution of the best estimator of the true value s_T of s from an ensemble of an infinite number of identical experiments with the same statistical power n.

6.1. Calculation of fitted errors

After the fitting is done and the goodness of fit is evaluated, one needs to work out the errors on the fitted quantities. One needs to calculate the posterior density P(s|c^n), which requires information not only from the maximum likelihood point s* of the present experiment, but also on how such a measurement is likely to fluctuate if we repeat the experiment. The joint probability density P(s,c^n) of observing the parameter s and the data c^n is given by

P^{data}(s,c^n) = P(s|c^n) P^{data}(c^n)    (17)

where we use the superscript "data" to distinguish the joint probability P^{data}(s,c^n) as having come from using the data pdf. If we now integrate the above equation over all possible datasets c^n, we get the expression for the pdf of s,

\mathcal{P}_n(s) = \int P^{data}(s,c^n)\, dc^n = \int P(s|c^n) P^{data}(c^n)\, dc^n    (18)

where we have used the symbol \mathcal{P} to distinguish the fact that it is the true pdf of s obtained from an infinite ensemble. We use the subscript n in \mathcal{P}_n(s) to denote that the pdf is obtained from an ensemble of experiments with n events each. Later on we will show that \mathcal{P}_n(s) is indeed dependent on n. Equation 18 states that in order to obtain the pdf of the parameters, one needs to add together the conditional probabilities P(s|c^n) over an ensemble of experiments, each such distribution weighted by the "data likelihood" P^{data}(c^n). At this stage of the discussion, the functions P(s|c^n) are unknown. We have, however, worked out L_R(s) as a function of s and have evaluated the maximum likelihood value s* of s. We can choose an arbitrary value of s and evaluate the goodness of fit at that value using the likelihood ratio. When we choose an arbitrary value of s, we are in fact hypothesizing that the true value s_T is at this value of s. L_R(s) then gives us a way of evaluating the relative goodness of fit of the hypothesis as we change s. Let us now take an arbitrary value of s and hypothesize that it is the true value. Then the joint probability of observing c^n, with s_T being at this value of s, is given from the data end by equation 17.

Similarly, from the theoretical end, one can calculate the joint probability of observing the dataset c^n with the true value being at s. The true value s_T is taken to be the maximum likelihood point of the pdf \mathcal{P}_n(s). It may coincide with the mean value of the pdf \mathcal{P}_n(s). These statements are assertions of the unbiased nature of the data from the experiment. At this point, there is no information available on where the true value s_T lies. One can make the hypothesis that a particular value of s is the true value; the probability of obtaining a best estimator s* from experiments of the type being performed, in the interval s_T to s_T + ds_T, is \mathcal{P}_n(s_T) ds_T. The actual value of this number is a function of the experimental resolution and the statistics n of the experiment. The joint probability P^{theor}(s,c^n) from the theoretical end is given by the product of the probability density of the pdf of s at the true value of s, namely \mathcal{P}_n(s_T), and the theoretical likelihood P(c^n|s) evaluated at the true value, which by our hypothesis is s,

P^{theor}(s,c^n) = P^{theor}(c^n|s) \mathcal{P}_n(s_T)    (19)

The joint probability P(s,c^n) is a joint distribution of the theoretical parameter s and the data c^n. The two ways of evaluating it (from the theoretical end and the data end) must yield the same result, for consistency. This is equivalent to equating P^{data}(s,c^n) and P^{theor}(s,c^n), which gives the equation

P(s|c^n) P^{data}(c^n) = P^{theor}(c^n|s) \mathcal{P}_n(s_T)    (20)

which is a form of Bayes' theorem, but with two pdf's (theory and data). Let us note that the above equation can be immediately re-written as the likelihood ratio

L_R = \frac{P(s|c^n)}{\mathcal{P}_n(s_T)} = \frac{P^{theor}(c^n|s)}{P^{data}(c^n)}    (21)

which is what is used to obtain the goodness of fit. In order to get the fitted errors, we need to evaluate P(s|c^n), which necessitates a better understanding of what \mathcal{P}_n(s_T) is in equation 20. Rearranging equation 20, one gets

P(s|c^n) = L_R(s) \mathcal{P}_n(s_T) = \frac{P^{theor}(c^n|s)}{P^{data}(c^n)} \mathcal{P}_n(s_T)    (22)
6.1.1. To show that \mathcal{P}_n(s_T) depends on n

In practice, in both the binned and unbinned cases, one only has an approximation to P^{data}(c^n). As n → ∞, in the absence of experimental bias, one expects to determine the parameter set s to infinite accuracy, and P(s|c^n) → δ(s − s_T), where s_T is the true value of s. However, for the null hypothesis, as n → ∞, the statistical error introduced by our use of the PDE in the unbinned case, or by binning in the binned case, becomes negligible, with the result that the theory pdf describes the data for all c at the true value s_T, i.e.

\frac{P^{theor}(c|s_T)}{P^{data}(c)} \to 1  as  n \to \infty    (23)
When one evaluates the likelihood ratio L_R over n events, with n → ∞, the likelihood ratio does not necessarily remain unity. This is due to fluctuations in the data, which grow as \sqrt{n}. For the binned likelihood case with n_b bins, one can show that as n → ∞,

L_R \to e^{-\sum_{i=1}^{n_b} \chi_i^2/2} \to e^{-n_b/2}    (24)

since each χ_i² has unit expectation. This is just an example of the likelihood ratio theorem. If one uses a binned χ² fit, which can also be thought of as maximizing a likelihood ratio, one gets the same limit as when using binned likelihood fits. The point is that L_R is finite as n → ∞. In the unbinned case, we currently have no analytic theory available. However, one can argue that the binned case with the number of bins n_b → ∞ and n_b << n should approach the unbinned limit. In this case, the unbinned L_R is also finite for infinite statistics. This implies that \mathcal{P}_n(s_T) → ∞ as n → ∞, i.e. \mathcal{P}_n(s_T) depends on n. This puts an end to the notion of a monolithic Bayesian prior interpretation for \mathcal{P}_n(s).
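A quick Monte Carlo illustration of this point, under the Gaussian-bin assumptions of Section 3 (toy settings of our own): for data generated from the model itself, −2 ln L_R fluctuates about n_b rather than about zero.

    import numpy as np

    rng = np.random.default_rng(7)
    nb, n_toys = 40, 2000
    mu = np.full(nb, 1000.0)             # large bin contents: Gaussian regime

    # -2 ln L_R = sum of per-bin chi-square terms (Section 3)
    m2lnLR = [np.sum((rng.poisson(mu) - mu) ** 2 / mu) for _ in range(n_toys)]
    print(np.mean(m2lnLR))               # ~ nb = 40, i.e. L_R ~ exp(-nb/2)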
Pn theBayesian prior changes as thevariable s changes. In
thelower figure,an “unknown concomitant” distribution
6.1.2.ToshowthatPn(sT)isconstantwithrespecttos is used whose shape dependson the statistics. In the
When one varies the likelihoodratioin equation22 case of nobias, this distribution peaks at thetrue value
of s. As we change s,we change our hypothesis as to
as a function of s, for each value of s, one is mak-
where thetruevalue of s lies, and thedistribution shifts
ing a hypothesis that s = s . As one changes s, a
T with s as explained in the text. The value of the
newhypothesisisbeingtestedthatismutually exclu-
distribution at thetrue valueis thusindependent of s.
sive from the previous one, since the true value can
only be at one location. So as one changes s, one is
free to move the distribution (s) so that s is at
Pn T data set cn. This is our measurement of αn and dif-
the value of s being tested. This implies that (s )
Pn T ferentdatasetswillgivedifferentvaluesofαn,inother
does not change as one changes s and is a constant
wordsα willhaveasamplingdistributionwithanex-
n
wrt s, which we can now write as α . Figure 6 illus-
n pected value and standard deviation. As n , the
trates these points graphically. Thus (s ) in our →∞
Pn T likelihood ratio R will tend to a finite value at the
equations is a number, not a function. The distri- L
true value and zero for all other values, and α
n
bution (s) should not be thought of as a “prior” →∞
Pn as a result.
but asan“unknownconcomitant”,whichdepends on
Note that it is only possible to write down an ex-
the statistics and the measurementcapabilities of the
pression for α dimensionally when a likelihood ratio
n
apparatus. For a given apparatus, there are a denu-
is available. This leads to
R
merable infinity of such distributions, one for each n. L
TanhdesPend(sisTtr)ib→ut∞ionassbnec→om∞e.narrower as n increases P(s|cn)= LRRds = PP((ccnn|ss))ds (27)
L |
R R
6.2. New form of equations

Equation 22 can now be re-written

P(s|c^n) = \frac{P(c^n|s)\, \alpha_n}{P^{data}(c^n)}    (25)

Since P(s|c^n) must normalize to unity, one gets for α_n

\alpha_n = \frac{P^{data}(c^n)}{\int P(c^n|s)\, ds} = \frac{1}{\int L_R(s)\, ds}    (26)

We have thus determined α_n, the value of the "unknown concomitant" at the true value s_T, using our data set c^n. This is our measurement of α_n, and different datasets will give different values of α_n; in other words, α_n will have a sampling distribution with an expected value and a standard deviation. As n → ∞, the likelihood ratio L_R will tend to a finite value at the true value and zero for all other values, and α_n → ∞ as a result.

Note that it is only possible to write down a dimensionally consistent expression for α_n when a likelihood ratio L_R is available. This leads to

P(s|c^n) = \frac{L_R}{\int L_R\, ds} = \frac{P(c^n|s)}{\int P(c^n|s)\, ds}    (27)

The last equality in equation 27 is the same expression that "frequentists" use for calculating their errors after fitting, namely, the likelihood curve normalized to unity gives the parameter errors. If the likelihood curve is Gaussian shaped, then this justifies the usage of a change of negative log-likelihood of 1/2 from the optimum point to get the 1σ errors. Even if it is not Gaussian, as we show in section 8, we may use the expression for P(s|c^n) as a pdf of the parameter s to evaluate the errors.
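Equation 27 translates directly into the following sketch (grid, model and toy data assumed): normalize the likelihood curve in s, then extract the fitted value and its error from the resulting density.

    import numpy as np

    rng = np.random.default_rng(8)
    c = rng.exponential(1.0, 1000)                # toy dataset
    s_grid = np.linspace(0.8, 1.3, 501)

    # ln P(c^n|s) for the exponential model; P_data(c^n) is independent of s
    # and drops out once the ratio is normalized, so it is omitted here
    lnL = np.array([np.sum(-np.log(s) - c / s) for s in s_grid])
    post = np.exp(lnL - lnL.max())
    post /= np.trapz(post, s_grid)                # P(s|c^n) of Eq. 27

    s_hat = np.trapz(s_grid * post, s_grid)
    s_err = np.sqrt(np.trapz((s_grid - s_hat) ** 2 * post, s_grid))
    print(f"s = {s_hat:.3f} +- {s_err:.3f}")      # ~ 1.00 +- 0.03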
The normalization condition

P(c^n) = \int P^{theor}(s,c^n)\, ds = \int P(c^n|s) \mathcal{P}_n(s_T)\, ds    (28)
is obeyed by our solution, since

\int P(c^n|s) \mathcal{P}_n(s_T)\, ds = \int \alpha_n P(c^n|s)\, ds \equiv P^{data}(c^n)    (29)

Each term α_n P(c^n|s) ds in the above integral may be thought of as being due to an "unknown concomitant" whose peak value is distributed uniformly in s space. The likelihoods of the theoretical prediction P(c^n|s) contribute with equal probability, each with a weight α_n, to sum up to form the data likelihood P^{data}(c^n); i.e. the data, due to their statistical inaccuracy, will entertain a range of theoretical parameters. However, equation 29 does not give us any further information, since it is obeyed identically. Fitting for the maximum likelihood value s* of s is attained by maximizing the likelihood ratio L_R = P(c^n|s)/P^{data}(c^n). The goodness of fit is obtained using the value of L_R at the maximum likelihood point. The best theoretical prediction is P(c|s*), and this prediction is used to compare to the data pdf P^{data}(c). Note that the maximum likelihood value s* is also the point at which the posterior density P(s|c^n) peaks. This is true only in our method: when an arbitrary Bayesian prior is used, the maximum likelihood value is not the same point at which the posterior density peaks. Note also that the normalization equation \int \mathcal{P}_n(s)\, ds = 1 is still valid. The integral

\int \alpha_n\, ds \neq 1    (30)

since α_n is our measurement of the value of \mathcal{P}_n(s) at the true value. It is a measure of the statistical accuracy of the experiment: the larger the value of α_n, the narrower the distribution \mathcal{P}_n(s) and the more accurate the experiment.
7. Combining Results of Experiments

Each experiment should publish a likelihood curve for its fit as well as a number for the data likelihood P^{data}(c^n). Combining the results of two experiments, with m and n events respectively, involves multiplying the likelihood ratios,

L_R^{m+n}(s) = L_R^m(s) \times L_R^n(s) = \frac{P(c^m|s)}{P^{data}(c^m)} \times \frac{P(c^n|s)}{P^{data}(c^n)}    (31)

Posterior densities and goodness of fit can be deduced from the combined likelihood ratio.
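In practice, Equation 31 amounts to adding log-likelihood-ratio curves on a common grid of s, as in this sketch (toy experiments assumed; the s-independent data likelihoods only shift the curves and do not move the maximum):

    import numpy as np

    rng = np.random.default_rng(9)
    s_grid = np.linspace(0.8, 1.3, 501)

    def ln_lr_curve(c, s_grid):
        """ln L_R(s) up to the s-independent constant -ln P_data(c^n)."""
        return np.array([np.sum(-np.log(s) - c / s) for s in s_grid])

    curve_m = ln_lr_curve(rng.exponential(1.0, 400), s_grid)   # experiment 1
    curve_n = ln_lr_curve(rng.exponential(1.0, 600), s_grid)   # experiment 2

    combined = curve_m + curve_n
    print(s_grid[np.argmax(combined)])     # combined maximum likelihood point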
8. Interpreting the results of one experiment

After performing a single experiment with n events, we can now calculate P(s|c^n) using equation 27. Equation 18 gives the prescription for arriving at \mathcal{P}_n(s) given an ensemble of such experiments, the contribution from each experiment being weighted by the "data likelihood" P^{data}(c^n) for that experiment. The "data likelihoods" integrate to unity, i.e. \int P^{data}(c^n)\, dc^n = 1. In the case of only a single experiment, with the observed c^n being denoted by c^n_{obs},

P^{data}(c^n) = \delta(c^n - c^n_{obs})    (32)

Equation 18, for a single experiment, then reduces to

\mathcal{P}_n(s) = \int P(s|c^n) P^{data}(c^n)\, dc^n = P(s|c^n_{obs})    (33)

i.e. given a single experiment, the best estimator for \mathcal{P}_n(s), the pdf of the true value s_T, is P(s|c^n_{obs}), and thus the best estimator of s_T is s*_{obs} as deduced from the experiment. We can thus use P(s|c^n_{obs}) as though it is the pdf of s and deduce limits and errors from it. The proviso is of course that these limits and errors, as well as s*_{obs}, come from a single experiment of finite statistics and as such are subject to statistical fluctuations.

9. Comparison with the Bayesian approach

In the Bayesian approach, an unknown Bayesian prior P(s) is assumed for the distribution of the parameter s in the absence of any data. The shape of the prior is guessed at, based on subjective criteria or using other objective pieces of information. However, such a shape is not invariant under transformation of variables. For example, if we assume that the prior P(s) is flat in s, then if we analyze the problem in s², we cannot assume it is flat in s². This feature of the Bayesian approach has caused controversy. Also, the notion of a pdf of the data does not exist in the Bayesian approach, and P(c^n) is taken to be a normalization constant; as such, no goodness of fit criteria exist. In the method outlined here, we have used Bayes' theorem to calculate posterior densities of the fitted parameters while being able to compute the goodness of fit. The formalism developed here shows that what is conventionally thought of as a Bayesian prior distribution is in fact a normalization constant, and what Bayesians think of as a normalization constant is in fact the pdf of the data. Table II outlines the major differences between the Bayesian approach and the new one.
Table II: The key points of difference between the Bayesian method and the new method.

    Item: Goodness of fit
      Bayesian method: Absent.
      New method: Now available in both binned and unbinned fits.

    Item: Data
      Bayesian method: Used in evaluating the theory pdf at the data points.
      New method: Used in evaluating the theory pdf at the data points, as
      well as in evaluating the data pdf at the data points.

    Item: Prior
      Bayesian method: Is a distribution, guessed based on "degrees of
      belief"; independent of data, monolithic.
      New method: No prior needed. One calculates a constant from data,
      \alpha_n = P^{data}(c^n) / \int P(c^n|s) ds, which → ∞ as n → ∞.

    Item: Posterior density P(s|c^n)
      Bayesian method: Depends on prior;
      P(c^n|s) P(s) / \int P(c^n|s) P(s) ds.
      New method: Independent of prior; same as frequentists use,
      P(c^n|s) / \int P(c^n|s) ds.
10. Further work to be done

Equation 18 can be used to show that the expectation value E(s) of the parameter s is given by

E(s) = \int s\, \mathcal{P}_n(s)\, ds = \int dc^n\, P(c^n) \int s\, P(s|c^n)\, ds    (34)

     = \int \bar{s}(c^n)\, P(c^n)\, dc^n    (35)

where \bar{s}(c^n) is the average of s for an individual experiment. Equation 35 states that E(s) is the weighted average of the \bar{s}(c^n) obtained from individual measurements, the weight for each experiment being the "data likelihood" P(c^n) for that experiment. In the absence of experimental bias, E(s) would be identical to the true value s_T. It remains to be shown that the weighted average of the maximum likelihood values s* from individual experiments also converges to the maximum likelihood point of \mathcal{P}_n(s).

One also needs to develop an analytic theory of the goodness of fit for unbinned likelihood fits. Finally, one needs to investigate more closely the transformation properties of \mathcal{P}_n(s) under change of variable.
11. Conclusions

To conclude, we have proposed a scheme for obtaining the goodness of fit in unbinned likelihood fits. This scheme involves the usage of two pdf's, namely data and theory. In the process of computing the fitted errors, we have demonstrated that the quantity in the joint probability equations that has been interpreted as the "Bayesian prior" is in reality a number, not a distribution. This number is the value of the pdf of the parameter, which we call the "unknown concomitant", at the true value of the parameter. This number is calculated from a combination of data and theory and is seen to be an irrelevant parameter. If this viewpoint is accepted, the controversial practice of guessing distributions for the "Bayesian prior" can now be abandoned, as can the terms "Bayesian" and "frequentist". We have shown how to use the posterior density to rigorously calculate fitted errors.

Acknowledgments

This work is supported by the Department of Energy. The author wishes to thank Jim Linnemann and Igor Volobouev for useful comments.

References

[1] K. Kinoshita, "Evaluating Quality of Unbinned Maximum Likelihood Fitting", Proceedings of the Conference on Advanced Statistical Techniques in Particle Physics, Durham, March 2002, IPPP/02/39, DCPT/02/78;
B. Yabsley, "Statistical Practice at the BELLE Experiment, and some questions", ibid.;
R. D. Cousins, "Conference Summary", ibid.
[2] R. A. Fisher, "On the mathematical foundations of theoretical statistics", Philos. Trans. R. Soc. London Ser. A 222, 309-368 (1922);
R. A. Fisher, "Theory of statistical estimation", Proc. Cambridge Philos. Soc. 22, 700-725 (1925).
[3] R. Raja, "A measure of the goodness of fit in unbinned likelihood fits", long write-up, http://www-conf.slac.stanford.edu/phystat2003/talks/raja/Raja_bayes_maxlike.pdf
[4] R. Raja, "End of Bayesianism?", http://www-conf.slac.stanford.edu/phystat2003/talks/raja/raja-end
[5] E. Parzen, "On estimation of a probability density function and mode", Ann. Math. Statist. 32, 1065-1072 (1962).
[6] D. Scott, "Multivariate Density Estimation", John Wiley & Sons, 1992;
M. Wand and M. Jones, "Kernel Smoothing", Chapman & Hall, 1995.