PHYSTAT2003, Stanford Linear Accelerator Center, September 2003
arXiv:physics/0401042v1 [physics.data-an] 10 Jan 2004

Asymmetric Errors

Roger Barlow
Manchester University, UK and Stanford University, USA

Errors quoted on results are often given in asymmetric form. An account is given of the two ways these can arise in an analysis, and the combination of asymmetric errors is discussed. It is shown that the usual method has no basis and is indeed wrong. For asymmetric systematic errors, a consistent method is given, with detailed examples. For asymmetric statistical errors a general approach is outlined.

1. Asymmetric Errors

In the reporting of results from particle physics experiments it is common to see values given with errors with different positive and negative numbers, to denote a 68% central confidence region which is not symmetric about the central estimate. For example (one of many) the Particle Data Group[1] quote

\[ \mathrm{B.R.}(f_2(1270) \to \pi\pi) = (84.7^{+2.4}_{-1.3})\%. \]

The purpose of this note is to describe how such errors arise and how they can properly be handled, particularly when two contributions are combined. Current practice is to combine such errors separately, i.e. to add the $\sigma^+$ values together in quadrature, and then do the same thing for the $\sigma^-$ values. This is not, to my knowledge, documented anywhere and, as will be shown, is certainly wrong.

There are two separate sources of asymmetry, which unfortunately require different treatments. We call these 'statistical' and 'systematic'; the label is fairly accurate though not entirely so, and they could equally well be called 'frequentist' and 'Bayesian'.

Asymmetric statistical errors arise when the log likelihood curve is not well described by a parabola[2]. The one sigma values (or, equivalently, the 68% central confidence level interval limits) are read off the points at which $\ln L$ falls from its peak by $\frac{1}{2}$ or, equivalently, when $\chi^2$ rises by 1. This is not strictly accurate, and corrections should be made using Bartlett functions[3], but that lies beyond the scope of this note.

Asymmetric systematic errors arise when the dependence of a result on a 'nuisance parameter' is non-linear. Because the dependence on such parameters (theoretical values, experimental calibration constants, and so forth) is generally complicated, involving Monte Carlo simulation, this study generally has to be performed by evaluating the result $x$ at the $-\sigma$ and $+\sigma$ values of the nuisance parameter $a$ (see [4] for a fuller account), giving $\sigma_x^-$ and $\sigma_x^+$. ($a \pm \sigma$ gives $\sigma_x^\pm$ or $\sigma_x^\mp$ according to the sign of $dx/da$.)

This note summarises a full account of the procedure for asymmetric systematic errors which can be found in [5] and describes what has subsequently been achieved for asymmetric statistical errors. For another critical account see [6].

2. Asymmetric Systematic Errors

If $\sigma_x^-$ and $\sigma_x^+$ are different then this is a sign that the dependence of $x$ on $a$ is non-linear and the symmetric distribution in $a$ gives an asymmetric distribution in $x$. In practice, if the difference is not large, one might be well advised to assume a straight line dependence and take the error as symmetric; however, we will assume that this is not a case where this is appropriate. We consider cases where a non-linear effect is not small enough to be ignored entirely, but not large enough to justify a long and intensive investigation. Such cases are common enough in practice.

2.1. Models

For simplicity we transform $a$ to the variable $u$ described by a unit Gaussian, and work with $X(u) = x(u) - x(0)$. It is useful to define the mean $\sigma$, the difference $\alpha$, and the asymmetry $A$:

\[ \sigma = \frac{\sigma^+ + \sigma^-}{2}, \qquad \alpha = \frac{\sigma^+ - \sigma^-}{2}, \qquad A = \frac{\sigma^+ - \sigma^-}{\sigma^+ + \sigma^-} \tag{1} \]

There are infinitely many non-linear relationships between $u$ and $X$ that will go through the three determined points. We consider two. We make no claim that either of these is 'correct'. But working with asymmetric errors must involve some model of the non-linearity. Practitioners must select one of these two models, or some other (to which the same formalism can be applied), on the basis of their knowledge of the problem, their preference and experience.

• Model 1: Two straight lines
  Two straight lines are drawn, meeting at the central value

  \[ X = \sigma^+ u \quad (u \ge 0), \qquad X = \sigma^- u \quad (u \le 0). \tag{2} \]

• Model 2: A quadratic function
  The parabola through the three points is

  \[ X = \sigma u + \alpha u^2 = \sigma u + A \sigma u^2. \tag{3} \]
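The two models of Equations 2 and 3 can be written down directly. The following is a minimal Python sketch, not the author's program: the function names (`mean_diff_asym`, `model1`, `model2`) and the argument order are assumptions made for illustration only.

```python
def mean_diff_asym(sigma_minus, sigma_plus):
    """Return (sigma, alpha, A) as defined in Equation 1."""
    sigma = 0.5 * (sigma_plus + sigma_minus)   # mean of the two errors
    alpha = 0.5 * (sigma_plus - sigma_minus)   # half their difference
    asym = (sigma_plus - sigma_minus) / (sigma_plus + sigma_minus)
    return sigma, alpha, asym


def model1(u, sigma_minus, sigma_plus):
    """Model 1 (Equation 2): two straight lines meeting at u = 0."""
    return sigma_plus * u if u >= 0 else sigma_minus * u


def model2(u, sigma_minus, sigma_plus):
    """Model 2 (Equation 3): the parabola through the three points."""
    sigma, alpha, _ = mean_diff_asym(sigma_minus, sigma_plus)
    return sigma * u + alpha * u * u


if __name__ == "__main__":
    # Illustrative values only: sigma- = 0.8, sigma+ = 1.2.
    for u in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(u, model1(u, 0.8, 1.2), model2(u, 0.8, 1.2))
```

By construction both functions agree at u = -1, 0 and +1, where they return $-\sigma^-$, 0 and $\sigma^+$, and they differ elsewhere.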
These forms are shown in Figure 1 for a small asymmetry of 0.1, and a larger asymmetry of 0.4.

Figure 1: Some nonlinear dependencies

Model 1 is shown as a solid line, and Model 2 is dashed. Both go through the 3 specified points. The differences between them within the range $-1 \le u \le 1$ are not large; outside that range they diverge considerably.

The distribution in $u$ is a unit Gaussian, $G(u)$, and the distribution in $X$ is obtained from $P(X) = G(u)/|dX/du|$. Examples are shown in Figure 2. For Model 1 (again a solid line) this gives a dimidated Gaussian - two Gaussians with different standard deviation for $X > 0$ and $X < 0$. This is sometimes called a 'bifurcated Gaussian', but this is inaccurate. 'Bifurcated' means 'split' in the sense of forked. 'Dimidated' means 'cut in half', with the subsidiary meaning of 'having one part much smaller than the other' [7]. For Model 2 (dashed) with small asymmetries the curve is a distorted Gaussian, given by $G(u)/|\sigma + 2\alpha u|$ with $u = \frac{\sqrt{\sigma^2 + 4\alpha X} - \sigma}{2\alpha}$. For larger asymmetries and/or larger $|X|$ values, the second root also has to be considered.

Figure 2: Probability Density Functions from Figure 1

It can be seen that the Model 1 dimidated Gaussian and Model 2 distorted Gaussian are not dissimilar if the asymmetry is small, but are very different if the asymmetry is large.

2.2. Bias

If a nuisance parameter $u$ is distributed with a Gaussian probability distribution, and the quantity $X(u)$ is a nonlinear function of $u$, then the expectation $\langle X \rangle$ is not $X(\langle u \rangle)$.

For Model 1 one has

\[ \langle X \rangle = \frac{\sigma^+ - \sigma^-}{\sqrt{2\pi}} \tag{4} \]

For Model 2 one has

\[ \langle X \rangle = \frac{\sigma^+ - \sigma^-}{2} = \alpha \tag{5} \]

Hence in these models (or any others), if the result quoted is $X(0)$, it is not the mean. It differs from it by an amount of the order of the difference in the positive and negative errors. It is perhaps defensible as a number to quote as the result as it is still the median - there is a 50% chance that the true value is below it and a 50% chance that it is above.

2.3. Adding Errors

If a derived quantity $z$ contains parts from two quantities $x$ and $y$, so that $z = x + y$, the distribution in $z$ is given by the convolution:

\[ f_z(z) = \int dx\, f_x(x)\, f_y(z - x) \tag{6} \]

Figure 3: Examples of the distributions from combined asymmetric errors using Model 1.

With Model 1 the convolution can be done analytically. Some results for typical cases are shown in
Figure 3. The solid line shows the convolution, the dashed line is obtained by adding the positive and negative standard deviations separately in quadrature (the 'usual procedure'). The dotted line is described later.

The solid and dashed curves disagree markedly. The 'usual procedure' curve has a larger skew than the convolution. The reason is easy to see: if two distributions with the same asymmetry are added, the 'usual procedure' will give a distribution just scaled by $\sqrt{2}$, with the same asymmetry. This violates the Central Limit Theorem, which says that convoluting identical distributions must result in a combined distribution which is more Gaussian, and therefore more symmetric, than its components. This shows that the 'usual procedure' for adding asymmetric errors is inconsistent.

2.4. A consistent addition technique

If a distribution for $x$ is described by some function, $f(x; x_0, \sigma^+, \sigma^-)$, which is a Gaussian transformed according to Model 1 or Model 2 or anything else, then 'combination of errors' involves a convolution of two such functions according to Equation 6. This combined function is not necessarily a function of the same form: it is a special property of the Gaussian that the convolution of two Gaussians gives a third. The (solid line) convolution of two dimidated Gaussians is not itself a dimidated Gaussian. Figure 3 is a demonstration of this.

Although the form of the function is changed by a convolution, some things are preserved. The semi-invariant cumulants of Thiele (the coefficients of the power series expansion of the log of the Fourier Transform) add under convolution. The first two of these are the usual mean and variance. The third is the unnormalised skew:

\[ \gamma = \langle x^3 \rangle - 3 \langle x \rangle \langle x^2 \rangle + 2 \langle x \rangle^3 \tag{7} \]

Within the context of any model, a consistent approach to the combination of errors is to find the mean, variance and skew: $\mu$, $V$ and $\gamma$, for each contributing function separately. Adding these up gives the mean, variance and skew of the combined function. Working within the model one then determines the values of $\sigma^-$, $\sigma^+$, and $x_0$ that give this mean, variance and skew.

2.5. Model 1

For Model 1, for which $\langle x^3 \rangle = \frac{2}{\sqrt{2\pi}}\left((\sigma^+)^3 - (\sigma^-)^3\right)$, we have

\[ \mu = x_0 + \frac{1}{\sqrt{2\pi}}(\sigma^+ - \sigma^-) \]
\[ V = \sigma^2 + \alpha^2 \left(1 - \frac{2}{\pi}\right) \]
\[ \gamma = \frac{1}{\sqrt{2\pi}} \left[ 2\left((\sigma^+)^3 - (\sigma^-)^3\right) - \frac{3}{2}(\sigma^+ - \sigma^-)\left((\sigma^+)^2 + (\sigma^-)^2\right) + \frac{1}{\pi}(\sigma^+ - \sigma^-)^3 \right] \tag{8} \]

Given several error contributions, Equations 8 give the cumulants $\mu$, $V$ and $\gamma$ of each. Adding these up gives the first three cumulants of the combined distribution. Then one can find the set of parameters $\sigma^-$, $\sigma^+$, $x_0$ which give these values by using Equations 8 in the other sense.

It is convenient to work with $\Delta$, where $\Delta$ is the difference between the final $x_0$ and the sum of the individual ones. The parameter is needed because of the bias mentioned earlier. Even though each contribution may have $x_0 = 0$, i.e. it describes a spread about the quoted result, it has non-zero $\mu_i$ through the bias effect (c.f. Equations 4 and 5). The $\sigma^+$ and $\sigma^-$ of the combined distribution, obtained from the total $V$ and $\gamma$, will in general not give the right $\mu$ unless a location shift $\Delta$ is added. The value of the quoted result will shift.

Recalling Section 2.2, for the original distribution one could defend quoting the central value as it was the median, even though it was not the mean. The convoluted distribution not only has a non-zero mean, it also (as can be seen in Figure 3) has non-zero median. If you want to combine asymmetric errors then you have to accept that the quoted value will shift. To make this correction requires a real belief in the asymmetry of the error values. At this point practitioners, unless they are sure that their errors really do have a significant asymmetry, may be persuaded to revert to quoting symmetric errors.

Solving Equations 8 for $\sigma^-$, $\sigma^+$ and $x_0$ given $\mu$, $V$ and $\gamma$ has to be done numerically. A program for this is available on http://www.slac.stanford.edu/~barlow. Some results are shown in the dotted curve of Figure 3 and in Table I.

Table I: Adding errors in Model 1

$\sigma_x^-$   $\sigma_x^+$   $\sigma_y^-$   $\sigma_y^+$   $\sigma^-$   $\sigma^+$   $\Delta$
1.0            1.0            0.8            1.2            1.32         1.52         0.08
0.8            1.2            0.8            1.2            1.22         1.61         0.16
0.5            1.5            0.8            1.2            1.09         1.78         0.28
0.5            1.5            0.5            1.5            0.97         1.93         0.41

It is apparent that the dotted curve agrees much better with the solid one than the 'usual procedure' dashed curve does. It is not an exact match, but does an acceptable job given that there are only 3 adjustable parameters in the function. If the shape of the solid curve is to be represented by a dimidated Gaussian, then it is plausible that the dotted curve is the 'best' such representation.
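The Model 1 procedure of Sections 2.4 and 2.5 (compute the cumulants of each contribution, add them, then invert Equations 8) can be sketched as follows. This is an illustrative sketch and not the program mentioned in the text: the function names are hypothetical, and the inversion uses a simple bisection on $\alpha$, which assumes (as holds for $\sigma^- \ge 0$ and $\sigma^+ \ge 0$) that the skew increases monotonically with $\alpha$ at fixed variance.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)
C = 1.0 - 2.0 / math.pi  # coefficient of alpha^2 in V of Equations 8


def cumulants_model1(sigma_minus, sigma_plus, x0=0.0):
    """Mean, variance and unnormalised skew of a dimidated Gaussian (Equations 8)."""
    sigma = 0.5 * (sigma_plus + sigma_minus)
    alpha = 0.5 * (sigma_plus - sigma_minus)
    diff = sigma_plus - sigma_minus
    mu = x0 + diff / SQRT_2PI
    var = sigma ** 2 + alpha ** 2 * C
    gamma = (2.0 * (sigma_plus ** 3 - sigma_minus ** 3)
             - 1.5 * diff * (sigma_plus ** 2 + sigma_minus ** 2)
             + diff ** 3 / math.pi) / SQRT_2PI
    return mu, var, gamma


def solve_model1(mu, var, gamma):
    """Invert Equations 8: return (sigma_minus, sigma_plus, x0)."""
    # Largest |alpha| allowed at this variance: one of the two errors reaches 0.
    a_max = math.sqrt(var / (1.0 + C))

    def skew_for(alpha):
        sigma = math.sqrt(var - C * alpha * alpha)
        return cumulants_model1(sigma - alpha, sigma + alpha)[2]

    lo, hi = -a_max, a_max
    if not skew_for(lo) <= gamma <= skew_for(hi):
        raise ValueError("skew cannot be reproduced by a dimidated Gaussian")
    for _ in range(200):  # bisection on alpha
        mid = 0.5 * (lo + hi)
        if skew_for(mid) < gamma:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    sigma = math.sqrt(var - C * alpha * alpha)
    sigma_minus, sigma_plus = sigma - alpha, sigma + alpha
    x0 = mu - (sigma_plus - sigma_minus) / SQRT_2PI
    return sigma_minus, sigma_plus, x0


def combine_model1(errors):
    """Combine (sigma_minus, sigma_plus) pairs, each taken to have x0 = 0.

    Returns (sigma_minus, sigma_plus, delta), where delta is the shift of the
    quoted value relative to the sum of the individual central values.
    """
    totals = [sum(c) for c in zip(*(cumulants_model1(sm, sp) for sm, sp in errors))]
    return solve_model1(*totals)


if __name__ == "__main__":
    print(combine_model1([(0.8, 1.2), (0.8, 1.2)]))
```

For two contributions with $\sigma^- = 0.8$ and $\sigma^+ = 1.2$ the sketch returns values close to the second row of Table I, including the non-zero shift $\Delta$.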
2.6. Model 2

The equivalent of Equations 8 are

\[ \mu = x_0 + \alpha \]
\[ V = \sigma^2 + 2\alpha^2 \]
\[ \gamma = 6\sigma^2\alpha + 8\alpha^3 \tag{9} \]

As with Model 1, these are used to find the cumulants of each contributing distribution, which are summed to give the three totals, and then Equations 9 are used again to find the parameters of the distorted Gaussian with this mean, variance and skew. The web program will also do these calculations.

Some results are shown in Figure 4 and Table II. The true convolution cannot be done analytically but can be done by a Monte Carlo calculation.

Table II: Adding errors in Model 2

$\sigma_x^-$   $\sigma_x^+$   $\sigma_y^-$   $\sigma_y^+$   $\sigma^-$   $\sigma^+$   $\Delta$
1.0            1.0            0.8            1.2            1.33         1.54         0.10
0.8            1.2            0.8            1.2            1.25         1.64         0.20
0.5            1.5            0.8            1.2            1.12         1.88         0.35
0.5            1.5            0.5            1.5            1.13         2.07         0.53

Figure 4: Examples of combined errors using Model 2.

Again the true curves (solid) are not well reproduced by the 'usual procedure' (dashed) but the curves with the correct cumulants (dotted) do a good job. (The sharp behaviour at the edge of the curves is due to the turning point of the parabola.)

2.7. Evaluating $\chi^2$

For Model 1 the $\chi^2$ contribution from a discrepancy $\delta$ is just $\delta^2/(\sigma^+)^2$ or $\delta^2/(\sigma^-)^2$ as appropriate. This is manifestly inelegant, especially for minimisation procedures as the value goes through zero.

For Model 2 one has

\[ \delta = \sigma u + A \sigma u^2. \tag{10} \]

This can be considered as a quadratic for $u$ with solution which when squared gives $u^2$, the $\chi^2$ contribution, as

\[ u^2 = \frac{2 + 4A\frac{\delta}{\sigma} - 2\left(1 + 4A\frac{\delta}{\sigma}\right)^{1/2}}{4A^2} \tag{11} \]

This is not really exact, in that it only takes one branch of the solution, the one approximating to the straight line, and does not consider the extra possibility that the $\delta$ value could come from an improbable $u$ value the other side of the turning point of the parabola. Given this imperfection it makes sense to expand the square root as a Taylor series, which, neglecting correction terms above the second power, leads to

\[ \chi^2 = \left(\frac{\delta}{\sigma}\right)^2 \left(1 - 2A\left(\frac{\delta}{\sigma}\right) + 5A^2\left(\frac{\delta}{\sigma}\right)^2\right). \tag{12} \]

This provides a sensible form for $\chi^2$ from asymmetric errors. It is important to keep the $\delta^4$ term rather than stopping at $\delta^3$ to ensure $\chi^2$ stays positive! Adding higher orders does not have a great effect. We recommend it for consideration when it is required (e.g. in fitting parton distribution functions) to form a $\chi^2$ from asymmetric errors.

2.8. Weighted means

The 'best' estimate (i.e. unbiassed and with smallest variance) from several measurements $x_i$ with different (symmetric) errors $\sigma_i$ is given by a weighted sum with $w_i = 1/\sigma_i^2$. We wish to find the equivalent for asymmetric errors.

As noted earlier, when sampling from an asymmetric distribution the result is biassed towards the tail. The expectation value $\langle x \rangle$ is not the location parameter $x$. So for an unbiassed estimator one must take

\[ \hat{x} = \frac{\sum_i w_i (x_i - b_i)}{\sum_i w_i} \tag{13} \]

where

\[ b = \frac{\sigma^+ - \sigma^-}{\sqrt{2\pi}} \ \text{(Model 1)}, \qquad b = \alpha \ \text{(Model 2)} \tag{14} \]

The variance of this is given by

\[ V = \frac{\sum_i w_i^2 V_i}{\left(\sum_i w_i\right)^2} \tag{15} \]

where $V_i$ is the variance of the $i$th measurement about its mean. Differentiating with respect to $w_i$ to find the minimum gives

\[ \frac{2 w_i V_i}{\left(\sum_j w_j\right)^2} - \frac{2 \sum_j w_j^2 V_j}{\left(\sum_j w_j\right)^3} = 0 \quad \forall i \tag{16} \]
which is satisfied by $w_i = 1/V_i$. This is the equivalent of the familiar weighting by $1/\sigma^2$. The weights are given, depending on the Model, by (see Equations 8 and 9)

\[ V = \sigma^2 + \left(1 - \frac{2}{\pi}\right)\alpha^2 \quad \text{or} \quad V = \sigma^2 + 2\alpha^2 \tag{17} \]

Note that this is not the Maximum Likelihood estimator - writing down the likelihood in terms of the $\chi^2$ and differentiating does not give a nice form - so in principle there may be better estimators, but they will not have the simple form of a weighted sum.

3. Asymmetric Statistical Errors

As explained earlier, (log) likelihood curves are used to obtain the maximum likelihood estimate for a parameter and also the 68% central interval, taken as the values at which $\ln L$ falls by $\frac{1}{2}$ from its peak. For large $N$ this curve is a parabola, but for finite $N$ it is generally asymmetric, and the two points are not equidistant about the peak.

The bias, if any, is not connected to the form of the curve, which is a likelihood and not a pdf. Evaluating a bias is done by integrating over the measured value, not the theoretical parameter. We will assume for simplicity that these estimates are bias free. This means that when combining errors there will be no shift of the quoted value.

3.1. Combining asymmetric statistical errors

Suppose estimates $\hat{a}$ and $\hat{b}$ are obtained by this method for variables $a$ and $b$. Typically $a$ could be an estimate of the total number of events in a signal region, and $b$ the (scaled and negated) estimate of background, obtained from a sideband. We are interested in $u = a + b$, taking $\hat{u} = \hat{a} + \hat{b}$. What are the errors to be quoted on $\hat{u}$?

3.2. Likelihood functions known

We first consider the case where the likelihood functions $L_a(\vec{x}|a)$ and $L_b(\vec{x}|b)$ are given.

For the symmetric Gaussian case, the answer is well known. Suppose that the likelihoods are both Gaussian, and further that $\sigma_a = \sigma_b = \sigma$. The log likelihood term

\[ \left(\frac{\hat{a} - a}{\sigma}\right)^2 + \left(\frac{\hat{b} - b}{\sigma}\right)^2 \tag{18} \]

can be rewritten

\[ \frac{1}{2}\left(\frac{\hat{a} + \hat{b} - (a + b)}{\sigma}\right)^2 + \frac{1}{2}\left(\frac{\hat{a} - \hat{b} - (a - b)}{\sigma}\right)^2 \tag{19} \]

so the likelihood is the product of Gaussians for $u = a + b$ and $v = a - b$, with standard deviations $\sqrt{2}\sigma$.

Picking a particular value of $v$, one can then trivially construct the 68% confidence region for $u$ as $[\hat{u} - \sqrt{2}\sigma, \hat{u} + \sqrt{2}\sigma]$. Picking another value of $v$, indeed any other value of $v$, one obtains the same region for $u$. We can therefore say with 68% confidence that these limits enclose the true value of $u$, whatever the value of $v$. The uninteresting part of $a$ and $b$ has been 'parametrised away'. This is, of course, the standard result from the combination of errors formula, but derived in a frequentist way using Neyman-style confidence intervals. We could construct the limits on $u$ by finding $\hat{u} + \sigma_u^+$ such that the integrated probability of a result as small as or smaller than the data be 16%, and similarly for $\sigma_u^-$, rather than taking the $\Delta \ln L = -\frac{1}{2}$ shortcut, and it would not affect the argument.

The question now is how to generalise this. For this to be possible the likelihood must factorise

\[ L(\vec{x}|a, b) = L_u(\vec{x}|u)\, L_v(\vec{x}|v) \tag{20} \]

with a suitable choice of the parameter $v$ and the functions $L_u$ and $L_v$. Then we can use the same argument: for any value of $v$ the limits on $u$ are the same, depending only on $L_u(\vec{x}|u)$. Because they are true for any $v$ they are true for all $v$, and thus in general.

There are cases where this can clearly be done. For two Gaussians with $\sigma_a \ne \sigma_b$ the result is the same as above but with $v = a\sigma_b^2 - b\sigma_a^2$. For two Poisson distributions $v$ is $a/b$. There are cases (with multiple peaks) where it cannot be done, but let us hope that these are artificially pathological.

On the basis that if it cannot be done, the question is unanswerable, let us assume that it is possible in the case being studied, and see how far we can proceed. Finding the form of $v$ is liable to be difficult, and as it is not actually used in the answer we would like to avoid doing so. The limits on $u$ are read off from the $\Delta \ln L(\vec{x}|u, v) = -\frac{1}{2}$ points where $v$ can have any value provided it is fixed. Let us choose $v = \hat{v}$, the value at the peak. This is the value of $v$ at which $L_v(v)$ is a maximum. Hence when we consider any other value of $u$, we can find $v = \hat{v}$ by finding the point at which the likelihood is a maximum, varying $a - b$, or $a$, or $b$, or any other combination, always keeping $a + b$ fixed. We can read the limits off a 1 dimensional plot of $\ln L_{max}(\vec{x}|u)$, where the 'max' suffix denotes that at each value of $u$ we search the subspace to pick out the maximum value.

This generalises to more complicated situations. If $u = a + b + c$ we again scan the $\ln L_{max}(\vec{x}|u)$ function, where the subspace is now 2 dimensional.
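The scan of $\ln L_{max}(\vec{x}|u)$ described above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names, the golden-section search used for the maximisation over the subspace (which presumes a single peak), and the bisection used to locate the $\Delta \ln L = -\frac{1}{2}$ points. It is not the author's code.

```python
import math


def golden_max(f, lo, hi, iters=100):
    """Maximum value of a single-peaked function f on [lo, hi] (golden-section search)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:           # peak lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                 # peak lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return f(0.5 * (lo + hi))


def profile_lnl(lnl_a, lnl_b, u, a_lo, a_hi):
    """ln L_max(u): maximise ln L_a(a) + ln L_b(u - a) over a, keeping a + b = u fixed."""
    return golden_max(lambda a: lnl_a(a) + lnl_b(u - a), a_lo, a_hi)


def bisect_root(f, lo, hi, iters=100):
    """Root of f between lo and hi, assuming f changes sign exactly once there."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def errors_on_sum(lnl_a, lnl_b, a_hat, b_hat, span, a_bounds):
    """Return (sigma_minus, sigma_plus) on u = a + b from the Delta ln L = -1/2 points.

    span must be large enough that ln L_max has dropped by more than 1/2 at
    u_hat - span and at u_hat + span; a_bounds(u) gives the search range in a.
    """
    u_hat = a_hat + b_hat
    peak = lnl_a(a_hat) + lnl_b(b_hat)

    def drop(u):
        return profile_lnl(lnl_a, lnl_b, u, *a_bounds(u)) - peak + 0.5

    u_low = bisect_root(drop, u_hat - span, u_hat)
    u_high = bisect_root(drop, u_hat, u_hat + span)
    return u_hat - u_low, u_high - u_hat


if __name__ == "__main__":
    # Two Poisson observations (2 and 3 events); constants in ln L are dropped.
    lnl_a = lambda a: 2.0 * math.log(a) - a
    lnl_b = lambda b: 3.0 * math.log(b) - b
    print(errors_on_sum(lnl_a, lnl_b, 2.0, 3.0, 4.0, lambda u: (1e-9, u - 1e-9)))
```

The two-Poisson example is a case where the answer is known: the profiled curve for the sum is the Poisson likelihood for 5 observed events, so the result can be compared with the interval for 5 events quoted in Section 3.3. For two Gaussians of equal width $\sigma$ the same function returns errors of $\sqrt{2}\sigma$ on each side.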
3.3. Likelihood functions not completely known

In many cases the likelihood functions for $a$ and $b$ will not be given, merely estimates $\hat{a}$ and $\hat{b}$ and their asymmetric errors $\sigma_a^+$, $\sigma_a^-$, $\sigma_b^+$ and $\sigma_b^-$. All we can do is to use these to provide best guess functions $L_a(\vec{x}|a)$ and $L_b(\vec{x}|b)$. A parametrisation of suitable shapes, which for $\sigma^+ \sim \sigma^-$ approximate to a parabola, must be provided. Choosing a suitable parametrisation is not trivial. The obvious choice of introducing small higher-order terms fails as these dominate far from the peak. A likely candidate is:

\[ \ln L(a) = -\frac{1}{2}\left(\frac{\ln(1 + a/\gamma)}{\ln \beta}\right)^2 \tag{21} \]

where $\beta = \sigma^+/\sigma^-$ and $\gamma = \frac{\sigma^+ \sigma^-}{\sigma^+ - \sigma^-}$. This describes the usual parabola, but with the x-axis stretched by an amount that changes linearly with distance. Figure 5 shows two illustrative results.

Figure 5: Approximations using Equation 21

The first is the Poisson likelihood from 5 observed events (solid line) for which the estimate using the $\Delta \ln L = -\frac{1}{2}$ points is $\mu = 5^{+2.58}_{-1.92}$, as shown. The dashed line is that obtained inserting these numbers into Equation 21. The second considers a measurement of $x = 100 \pm 10$, of which the logarithm has been taken, to give a value $4.605^{+0.095}_{-0.105}$. Again, the solid line is the true curve and the dashed line the parametrisation. In both cases the agreement is excellent over the range $\approx \pm 1\sigma$ and reasonable over the range $\approx \pm 3\sigma$.

To check the correctness of the method we can use the combination of two Poisson numbers, for which the result is known. First indications are that the errors obtained from the parametrisation are indeed closer to the true Poisson errors than those obtained from the usual technique.

3.4. Combination of Results

A related problem is to find the combined estimate $\hat{u}$ given estimates $\hat{a}$ and $\hat{b}$ (which have asymmetric errors). Here $a$ and $b$ could be results from different channels or different experiments. This can be regarded as a special case, constrained to $a = b$, i.e. $v = 0$, but this is rather contrived. It is more direct just to say that one uses the log likelihood which is the sum of the two separate functions, and determines the peak and the $\Delta \ln L = -\frac{1}{2}$ points from that. If the functions are known this is unproblematic; if only the errors are given then the same parametrisation technique can be used.

4. Conclusions

If asymmetric errors cannot be avoided they need careful handling.

A method is suggested and a program provided for combining asymmetric systematic errors. It is not 'rigorously correct' but such perfection is impossible. Unlike the usual method, it is at least open about its assumptions and mathematically consistent.

Formulæ for $\chi^2$ and weighted sums are given.

A method is proposed for combining asymmetric statistical errors if the likelihood functions are known. Work is in progress to enable it to be used given only the results and their errors.

Acknowledgments

The author gratefully acknowledges the support of the Fulbright Foundation.

References

[1] D.E. Groom et al., Eur. Phys. J. C15, 1 (2000).
[2] W.T. Eadie et al., "Statistical Methods in Experimental Physics", North Holland, 1971.
[3] A.G. Frodesen et al., "Probability and Statistics in Particle Physics", Universitetsforlaget, Bergen-Oslo-Tromso (1979), pp. 236-239.
[4] R.J. Barlow, "Systematic Errors: Facts and Fictions", in Proc. Durham conference on Advanced Statistical Techniques in Particle Physics, M.R. Whalley and L. Lyons (Eds.), IPPP/02/39, 2002.
[5] R.J. Barlow, "Asymmetric Systematic Errors", preprint MAN/HEP/03/02, ArXiv:physics/030613.
[6] G. D'Agostini, "Bayesian Reasoning in Data Analysis: a Critical Guide", World Scientific (2003).
[7] The Shorter Oxford English Dictionary, Vol. I (A-M), p. 190 and p. 551 of the 3rd edition (1977).