Table Of ContentAstronomy&Astrophysicsmanuscriptno.point c ESO2008
(cid:13)
February5,2008
Ly-α forest: efficient unbiased estimation of
second-order properties with missing data
7
0 R.Vio1,V.D’Odorico2,H.Stoyan3,andD.Stoyan3
0
2
1 ChipComputersConsultings.r.l.,VialeDonL.Sturzo82,S.LiberalediMarcon,I-30020Venice,Italy
n
a e-mail:[email protected]
J 2 INAF-OsservatorioAstronomicodiTrieste,viaG.B.Tiepolo11,I-34143Trieste,Italy
1 e-mail:[email protected]
3 3 InstituteofStochastics,TUBergakademieFreiberg
D-09596Freiberg,Germany
1 e-mail:[email protected]
v
5
Received.............;accepted................
9
8
1 ABSTRACT
0
7 Context.One important step in the statistical analysis of the Ly-αforest data is the study of their second order properties. Usually, this is
0 accomplished by means of the two-point correlation function or, alternatively, the K-function. In the computation of these functions it is
/ necessarytotakeintoaccountthepresenceofstrongmetallinecomplexesandstrongLy-αlinesthatcanhiddenpartoftheLy-αforestand
h
representanonnegligiblesourceofbias.
p
Aims.In this work, we show quantitatively what are the effects of the gaps introduced in the spectrum by the strong lines if they are not
-
o properlyaccountedforinthecomputationofthecorrelationproperties.Weproposeageometricmethodwhichisabletosolvethisproblem
r and iscomputationally moreefficient thantheMonte Carlo(MC) technique that istypically adopted inCosmology studies. Themethod is
t
s implementedintwodifferentalgorithms.Thefirstonepermitstoobtainexactresults,whereasthesecondoneprovidesapproximatedresults
a butiscomputationallyveryefficient.Theproposedapproachcanbeeasilyextendedtodealwiththecaseoftwoormorelistsoflinesthathave
:
v tobeanalyzedatthesametime.
i Methods.Numericalexperimentsarepresentedthatillustratetheconsequencestoneglecttheeffectsduetothestronglinesandtheexcellent
X
performancesoftheproposedapproach.
r Results.Theproposedmethodisabletoremarkablyimprovetheestimatesofboththetwo-pointcorrelationfunctionandtheK-function.
a
Keywords.Methods:dataanalysis–Methods:statistical–Quasars:absorptionlines–Cosmology:large-scalestructureofUniverse
1. Introduction 1997; Zhangetal. 1997; Theuns,Leonard&Efstathiou 1998;
Machaceketal.2000).Thus,theLy-αforestprovidesaunique
Thestudyoftheabsorptionlinesobservedintheopticalspectra
and powerful tool to study the distribution/evolutionof bary-
ofhighredshiftquasarshasgreatlyevolvedin thelastdecade
onic matter and the physical status of the IGM over a wide
thanksmainlytotheadventofthenewclassof10m-telescopes
redshiftinterval.
coupledwithhighresolution,highperformancespectrographs
and to the improvedcomprehensionof the physical nature of
theabsorbers. Inanalogywiththestudyofgalaxies,thetwo-pointcorrela-
The lines forming the so called “Ly-α forest” observed tionfunctionhasbeenclassicallyusedtostudythedistribution
blue ward of the Ly-α emission of the quasar are due to of the Ly-α lines (e.g. Sargentetal. 1980). However, the Ly-
the Ly-α transition in neutral hydrogen distributed along the αdistributioniscontaminatedintheobservedspectrabymetal
line of sight. They originate in the fluctuations of the inter- transitionsassociatedwithgalactichaloespiercedbythelineof
mediate and low density intergalactic medium (IGM), aris- sight.Thestrongermetallinescanextendforseveralangstrom
ingnaturallyinthehierarchicalprocessofstructureformation coveringweakerLy-αlinesandcausingabiasintheirdistribu-
(e.g.,Cenetal.1994;Zhangetal.1995;Hernquistetal.1996; tion.ThesameistrueforstrongLy-αabsorptions,inparticular,
Miralda-Escude´etal. 1996; Bi&Davidsen 1997; Dave´etal. LymanlimitanddampedLy-αsystems.Ontheotherhand,the
distribution at small scales is biased by the intrinsic width of
Sendoffprintrequeststo:R.Vio thelines( 25 30kms 1).
−
∼ −
2 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction
Observers have become more and more aware of those 2. Estimationof K(r)andξ(r)inpresenceofedge
problemsin particularforhighresolutionspectra and atlarge effects
redshifts where the crowding of the lines increases (dn/dz
∝ A one-dimensional spatial point process X is any stochastic
(1 + z)2.5, e.g., Kimetal. 2002). In the most recent papers
mechanismthatgeneratesacountablesetofeventsx = x n
based on high resolution quasar spectra (e.g., Luetal. 1996; ontherealaxisR.Itiscalledstationaryifthedistribution{ oi}fi=X1
Cristianietal.1997;Kimetal.2001)thestrongLy-αlinesare
isinvariantundertranslations,thatis,thedistributionofX+s
eliminated from the computation of the two-point correlation
isthesameasthatofXforanys R.Usually,stationarypoint
functionandthelinescloserthanaminimumvelocitysepara- ∈
processes are described by their first-order and second-order
tion are merged.In orderto correctforedge effects and pres-
characteristics.Thefirst-ordercharacteristic,sayλ,equalsthe
ence of the metal gaps in the line distribution, the observed
mean number of points per unit of length and is often called
distributionof linepairsiscomparedwith MonteCarlo(MC)
intensity. The second-order characteristics are represented by
simulationsoflineswhicharerandomlydistributedintheob-
thetwo-pointcorrelationfunctionξ(r)andthepaircorrelation
served redshift interval other than the cosmological increase
functiong(r)thatsatisfy
withredshift.Thesamegapsduetothecut-outabsorptionsare
reportedalsoonthesimulatedspectraandtheminimumsepa- g(r)=1+ξ(r). (1)
rationbetweentwolinesispreserved.Thesimulatedprocessis
approximatelyofPoisson typeoversmallredshiftseparations Thefunctiong(r)isdefinedasfollows.Consideranyinfinitesi-
(seeSargentetal.1980). malintervalBoflengthdL.Theprobabilityofhavingapointin
BisλdL.ConsidernowtwosuchintervalsB andB oflengths
1 2
dL anddL andintercenterdistancer.Theprobabilitytohave
A further approximation is introduced in this method be- 1 2
apointineachintervalcanbedenotedbyP(r)andisexpressed
cause it is not the redshift split but the velocity split distri-
as
bution that is considered, where the separation ∆v between
ij
two absorptionlines of redshiftszi and zj is givenby the non P(r)=g(r)λ2dL dL . (2)
1 2
linear formula ∆v = c(z z )/[1 + (z + z )/2]. Since, if
ij i j i j
| − |
∆v > ∆v ,∆v ,itis∆v , ∆v +∆v ,thequantities∆v The factor of proportionalityg(r) is the pair correlationfunc-
13 12 23 13 12 23 ij
do not represent Euclidean distances. Hence, from the quan- tion. It is clear that, in the case of complete randomness,
tities ∆v N it is not even possible to construct the one- g(r) = 1 or ξ(r) = 0, independently of r. g(r) > 1, or
{ i,j}i,j=1
dimensional point process corresponding to the sequence of ξ(r) > 0,indicatesthatpairsofpointswithdistanceinthein-
the velocities v ,v ,...,v . The use of velocities at the place terval [r dr/2,r + dr/2] are more likely to occur than for
1 2 N
−
of comoving separations was suggested mainly by the un- a Poissonprocesswith the same intensity,thatmeansthere is
known ratio between peculiar velocity and Hubble flow con- clustering.Onthecontrary,g(r) < 1orξ(r) < 0correspondto
tributionstothemeasuredLy-αlineredshifts.Thereisnowev- inhibitionorregularity.
idencethatpeculiarvelocitiesarenegligibleforthoseabsorbers The estimation of g(r) and ξ(r) presents a practical diffi-
(e.g.,Rauchetal.2005;D’Odoricoetal.2006),whichmakesit culty. In fact, for a given distance r, the estimation of these
preferabletousecomovingdistancestocomputethetwo-point characteristicsrequiresevaluationofthenumberofpointpairs
correlationfunction. intheinfinitesimalinterval[r dr/2,r+dr/2].Sincethisisim-
−
possible,afiniteinterval[r h,r+h]hastobeused.Thisleads
−
Thecomputationofacomovingdistancecorrespondingto to approximation errors, especially for small r when it could
agivenzrequirestheevaluationofanintegralthatingeneralis happenthatr h<0,andtotheproblemofchoosingtheband-
−
notavailableinclosedform.Hence,anumericalapproachhas width h. In particular, the determination of the best value of
tobeadopted.ThiscanmaketheMCmethodsverytimecon- htousefora specificproblemisstillan”art”(Mart´ınezetal.
suming. For this reason, in this paper we presenta geometric 2005). The best recipe is to consider a series of bandwidths
methodtocomputethetwo-pointcorrelationthatdoesnotre- starting fromh 1/λandthen to comparethe results;some-
≈
quirethegenerationofauxiliaryrandomsamplesandtherefore times it is recommendable to use “adapted” bandwidths, i.e.
ismuchmoreefficient. differentvaluesofhforsmallandlarger.Inordertoavoidthis
kindofproblem,sometimesstatisticiansusealsotheso-called
K-function,K(r),thatisrelatedtoξ(r)through
InSec.2thecomputationofthetwo-pointcorrelationfunc-
tionandoftheK-functionisconsideredwhenonlyedgeeffects r
K(r)=2 [1+ξ(u)]du, (3)
are present. In Sec. 3 andAppendicesA-D the argumentsare
Z
0
generalizedto dealwiththecase ofgapsinthedatasequence
or
and two algorithmsare proposed.Two experimentaldata sets
are analyzed in Sec. 4. Finally, conclusions are presented in 1dK(r)
ξ(r)= 1. (4)
Sec.6. 2 dr −
Here, no bandwidth has to be fixed. The main drawback of
Throughoutthis paperwe will adoptfor the cosmological this approach is that K(r) is a cumulative function,which in-
parameters: Ω = 0.26, Ω = 0.74, and H = 73 km s 1 creasesnearlylinearly.(Acomparablesituationexistsinclas-
0m 0Λ 0 −
Mpc 1. sical statistics for the pair “probability distribution function”
−
R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction 3
F(x)andthe“probabilitydensityfunction” f(x).)Forthisrea- K−function
1
son, a plot of K(r) is more difficult to interpret than a plot
of ξ(r). As a consequence, ξ(r) is more useful in exploratory 0.9
statistics, whereas K(r)isusefulintestingstatisticalhypothe-
0.8
ses, e.g., that a given point sequence is “completely random”
orbelongstoaPoissonprocess. 0.7
IfapointprocesswasobservableovertheentireRdomain,
0.6
the estimationof K(r) andξ(r) couldsimplyfollowtheirdef-
initions. But in general, a pointprocesscan be observedonly K(r)0.5
inafiniteregionW definedintheinterval[x ,x ] R,the
min max ⊂ 0.4
observing window. Then edge effects play an important role,
sincetheestimationof K(r)andξ(r)shouldrequiretheuseof 0.3
pointsexternaltoW.Hence,the“natural”estimator
0.2
n n
W
K¯(r)= | | I (x x ), (5) 0.1 True
n2 r | i− j| Unbiased
Xi=1 j=X1,j,i Biased
0
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
with W thelengthofW and r
| |
1 ifu r Fig.1.ComparisonofthebiasedestimatorK¯(r)withtheunbi-
Ir(u)=(0 othe≤rwise, (6) asedoneK(r)withadaptedintensityλV(r).Thedatausedinthe
simulationconsistof200pointsdrawnfromaPoissonprocess
will yield results too small, with a negative bias. The same b
intheinterval[0,1].
holdsforthe“natural”estimatorofξ(r)
W2 n n
ξ¯(r)= | | k (r x x ) 1, (7) Two−point correlation function
2n2 h −| i− j| −
Xi=1 j=X1,j,i 0.1
where
1/2h if h u h 0
kh(u)=(0 oth−erw≤ise.≤ (8)
Figures1-2 showthatthementionedbiasesare reallysig- −0.1
nificant also in the one-dimensionalcase considered here. As
expected, the larger r the larger the biases. This is due to the ξ(r)−0.2
increasingnumberofpointsexternaltoW that,forincreasing
valuesofr,shouldbenecessarytoestimatebothK(r)andξ(r).
−0.3
AlthoughnotclearlyvisibleinFig.2,ξ¯(r)presentsabiasalso
whenr<h.Thisisbecauseinthecomputationofξ(r)onlythe
pointsintheinterval[0,r+h]oflengthshorterthan2hcanbe −0.4
True
considered. Unbiased
Biased
In Astronomy, especially in the field of the statisti- −0.5
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
cal analysis of the systems of galaxies, this edge-correction r
problem is well known. Various algorithms, mostly of Fig.2.Comparisonofthebiasedestimatorg¯(r)withtheunbi-
Monte Carlo (MC) type, have been proposed for the three- asedoneg(r)withadaptedintensityλ (r).Thedatausedinthe
S
dimensionalcase (e.g.,see Hamilton 1993;Landy&Szalay simulationarethesameofFig.1.h=0.05.
1993; Quashnock&Sein 1999; Pons-Border´ıaetal. 1999; b
Kerscher 1999;Kerscheretal. 2000;Martinez&Saar 2002,
and references therein). Their conversion to the one- randomsamplesmuchlargerthanthenumberofexperimental
dimensional case is straightforward. However, the fact to pointswithobviouscomputationalcosts(seeMartinez&Saar
work in the one-dimensionalcase allows us to follow the ap- 2002).
proach due to Stoyan&Stoyan (2000) which is a modifica- Westartfromtheunbiasedrigidmotioncorrectedestima-
tion of the MC methods presented in Hamilton (1993) and torsofλ2K(r)andλ2ξ(r):
Landy&Szalay (1993).Inpractice,theauxiliaryrandomsam-
plesofpointsusedinthosepapersarereplacedbygeometrical n n
λ2K(r)= W p 1I (x x ), (9)
termsthatresultfromexactintegration.Thispermitstoobtain | | −ij r | i− j|
more precise results since the quantities of interest are calcu- b Xi=1 j=X1,j,i
lated and not only estimated as in the case of the MC tech-
wheretheweights p aregivenby
ij
niques.Moreover,thereisaremarkablebenefitinthecompu-
tational efficiency. In fact, in order to provide satisfactory re- W W
sults,theMCmethodsrequiretheuseofanumberofauxiliary pij = | ∩ xi−xj|. (10)
W
| |
4 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction
W istheobservingwindowshiftedbytheamounts.IfW isan really present, a higher degree of clustering than true may be
s
interval,itissimply W W = W x x .Similarly: obtained. These effects are clearly visible in Figs. 3-4 where
| ∩ xi−xj| | |−| i− j| the performanceof the estimator K¯(r) is shown for a Poisson
n n
Wc(r) process, typically of complete random spatial patterns, and a
λ2ξ(r)= | |2 p−ij1kh(r−|xi−xj|)−λ2. (11) Cox one that models spatial patterns with significant density
Xi=1 j=X1,j,i
b fluctuations(Moller&Waagepetersen 2004).
Here, c(r) = 2h/(r+h max[0,r h]) is a correctingfactor In presence of gapsthe unbiasedestimators K(r) and ξ(r)
− −
forthevaluesofrforwhichr hgoesnegative(Guan 2006). andtheintensitiesλ (r)andλ (r)canstillbeused.Here,how-
− V S b b
A“naive”methodusedbymanyauthorstoobtainestimatesof ever,acomputationalproblemarises.Infact,althoughthecom-
K(r)andξ(r)istosetλ2 =n2/W2andthentodividetheesti- putation of the quantities W W , λ (r) and λ (r) needs
mators(9)and(11)bythisquan|tity|.However,thecircumstance only elementary mathema|tics∩, nevxei−rxthj|eleSssit is impVossible to
that n/W is a good estimator of λ does not necessarily im- giveexplicitformulas.Forsmallr, a simple approximationis
| |
pliesthatthesameholdsforn2/W2withrespecttoλ2.Infact, givenby
| |
more is possible. As Hamilton (1993) and Landy&Szalay
(1993)showedforξ(r)andStoyan&Stoyan (2000)forK(r), W W = L (n +1)r, (15)
r w
itispossibletoreducethemeansquarederror(MSE)1 if,in- | ∩ | −
steadofthenaiveclassicalestimator,adaptedestimatorsforλ whereListhedistancebetweenthemostleftobservationinter-
are used.One particularityof these estimatorsis thattheyare val endpoint and the most right observation interval endpoint
differentfor K(r)andξ(r)anddependonthedistancer.They andnw isthenumberofgaps.Forlargerr,theexactalgorithm
are denotedby λ (r), for ξ(r), and λ (r), for K(r). The index presentedinAppendixAcanbeused.Asanalternative,theap-
S V
“S”referstosurfaceand“V”tovolume(formoredetails,see proximatedbutefficientalgorithmbasedonaFouriertechnique
Stoyan&Stoyan 2000).Theexpressionforthetwoestimators inAppendixBisproposed.Afterthat,itispossibletocompute
isgivenby λ (V)andthenξ(r).Thecomputationofλ (r)ismorecompli-
S V
catedandcanbecarriedoutbynumericalintegration.However
λ (r)= n 1W(xi−r)+1W(xi+r), (12) amoreefficientbapproach,basedagainonaFouriertechnique,
S
2W W isdescribedinAppendicesB-C.
Xi=1 | ∩ r|
TheexcellentperformanceoftheestimatorK(r)combined
and
withtheintensityλ (r)isshowninFigs.5-6(tocomparewith
V
b
n W [x r,x +r] Figs. 3-4). Figures. 7-8 show the results obtainable with ξ(r)
λV(r)=Xi=1 | 2∩0r|Wi−∩Wit|dt |, (13) cuolamtiboinnseidswofiththeλSs(arm).eTohredenruomfbtheratotfyppoicianltlsyuesxepdeicntetdhifsorsbitmhe-
R
Ly-αlinesinasinglespectrum.Inordertocheckiftheseresults
with
areconsistentwithwhatexpectedfromthetheory,inFig.9the
1w(x)=(01 oifthxe∈rwWis,e. (14) tfhoerotrheetiecxapleMriSmEeξn(tr)woitfhththeeξ(Pro)iesssotinmpatroorcecsosmubsiendedfowriFthigλ.S7(ris)
b
comparedwiththatestimatedonthebasisof5000simulations;
Figures1-2showtheremarkableimprovementintheestimate
apartfromtheverysmallvaluesofr,wheretheeffectsdueto
ofbothK(r)andξ(r).
thebandwidthharemoreimportant,theagreementis reason-
ablygood.Worthofnoteisthenon-monotonicbehaviorofthe
3. Estimationof K(r)andξ(r)inpresenceofgaps twoMSEsthatisdirectlylinkedtothefactthattheobserving
window W is constituted by disjoint segments. The equation
Inpracticalapplications,theedgeeffectsarenottheonlyprob-
usedforMSE (r),
lem.Infact,oftentheobservingwindowisaunionofintervals ξ
separated by gaps. In the spectra of quasars this is due to the
ξ(r)+1
presenceofmetallinesandstrongLy-αsystemsthatcanhid- MSEξ(r)= 2hλ2W W , (16)
den portions of the Ly-α forest. Ignoring these gaps and tak- | ∩ r|
ingerroneouslytheintervalstartingwiththemostleftinterval isduetoStoyanetal. (1993).
start-point to the most right interval endpoint as the observ-
ingwindowproducesbiasedestimates.Clearly,theestimators
willtendtoreportmoreclusteringthanreallyexisting.Therea- 4. Apracticalapplication
son is simply that when the pointswithin some given regions
In order to apply the discussed method and verify its reli-
are removed, then this artificially creates separated groups of
ability, we have used the best high resolution high signal-
pointsthatmimic the presenceof clusters. Asa consequence,
to-noise quasar spectra publicly available observed with the
an erroneous indication of clustering can happen even in the
UVESspectrograph(Dekkeretal.2000)attheVLTtelescope
case of no-interaction or complete spatial randomness in the
in the context of the ESO Large Programme “The Cosmic
pointpattern.Ontheotherhand,ifclusteringoraggregationis
EvolutionoftheIGM”(Bergeronetal.2004).Inparticular,we
1 Themeansquared errorofanestimatorξ(x)ofξ(r)isgivenby have chosen two examples of Ly-α forest contaminated by a
MSE (r)=E[(ξ(r) ξ(r))2].AsimilarexpressionholdsforMSE (r). largenumberofmetaltransitions:PKS0237-23atz = 2.233
ξ − e K em
e e e
R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction 5
andPKS2126-158atz = 3.267,forwhichthe lineblanket- ormorelistsoflineshavetobeanalyzedatthesametime(see
em
ingduetothesameLy-αlinesismoresevereastheemission below).
redshiftislarger.
The quasar spectra have a resolution R 45000 and a 5. Multi-listanalysis
≃
signal-to-noise ratio in the Ly-α forests S/N 80 per pixel.
They have been reduced with the pipeline of≃the instrument Here,webrieflyillustrateanadditionalbenefitinusingthees-
(version 2.1, Ballesteretal. 2000) provided by ESO in the timator ξ(r). In particular, the fact that it can be easily gen-
contextofthedatareductionpackageMIDAS.Thecontinuum eralized for the estimation of a two-pointcorrelationfunction
b
level was determined with a manual subjective method based whentwoormorelistsoflinesareavailable.Themainproblem
on the selection of the regionsfree from clear absorptionthat is that, in anexperimentalcontext,the lists canhave different
aresuccessivelyfittedwithasplinepolynomialof3rddegree. characteristics concerningthe coveredspatial intervalsand/or
Thenormalizedspectraaretheninspectedtoidentifymetalab- thetotalextensionofthegaps.Hence,thenaiveestimator
sorption systems with a particular attention to the transitions
L
1
that fall in the Ly-α forest. Once the forest has been cleaned, Ξ(r)= ξ(r), (17)
l
theHlinesarefittedwithVoigtprofilesintheLYMANcontext LXl=1
oftheMIDASreductionpackage.Inordertoincreasetherelia- b
bilityofthefitparametersforthesaturatedLy-αlinesweused with ξl(r) the estimate obtained from the lth data set, can
alsotheotherunblendedlinesoftheLymanseriesfallinginthe provide unsatisfactory results since all the lists have the
b
observed spectral range. We excluded 1000 km s 1 from the same importance in forming the final result. Some kind of
−
Ly-βemissionand5000kms 1 fromtheLy-αemissionofthe weighting is necessary. In this respect, it has been proved
−
quasar to avoid associated absorbers and the proximity effect (Ohser&Mu¨ecklich 2000)thattheestimator
duetothequasaritself.Furthermore,wecutoutofthespectra
L W W ξ(r)
the regions occupied by the metal transitions and the Lyman Ξ(r)= l=1| ∩ r|l l (18)
lcimmit2)sywshteicmhsc(oLuyl-dαmliansekstwheitphrecsoelunmcenodfewnesiatkyeNrl(iHneI)si≥nt1h0e1i7r b PPlL=1|W∩Wbr|l
− provides an estimate of ξ(r) with a smaller variance with re-
velocity profiles (line blanketing). At the smallest scales the
specttoΞ(r)foreachvalueofr.Withthenumericaltechniques
clusteringpropertiesareaffectedbythetypicalvelocitywidth
described in Sects. B-D, the implementationof the algorithm
of the lines( 25 30km s 1, e.g.Kim et al. 2001),for this
− for the computationof Ξ(r) is trivial. However,the character-
∼ −
reasonwehavemergedthelinepairscloserthan25kms 1.
− istics and the performances of this estimator are beyond the
Figures 10-13 show K(∆r) and ξ(∆r) as a function of co- scopeofthepresentpapber.
moving spatial separation, ∆r, for the Ly-α forest data of
b b
the objects PKS0237-23 and PKS2126-158. For PKS0237-
6. Conclusions
23 the number of Ly-α lines is 191 and the total length of
gaps amounts to about 16% of the interval spanned by W, In this work we have analyzedpossible sourcesof bias in the
whereas for PKS2126-158these quantities are 300 and 20%, estimationofthesecond-ordercharacteristicsoftheLy-αfor-
respectively.For reference,the correspondingfunctionsedge- est. In particular, the metallic and the strong Ly-α lines have
corrected but evaluated without considering the gaps are also been considered that can hidden part of the lines of interest.
displayed.Inboththecases,itisevidentthestrongerindication This may result in a severe bias of both the K-function and
ofclusteringifthegapsarenottakenintoaccount.Forthesame the two-point correlation function. Specifically, an erroneous
experiment, Figs. 14-15 show the comparison between ξ(∆r) indication of clustering can happen even in the case of no-
andtheestimateofthetwo-pointcorrelationfunctionobtained interactionorcompletespatialrandomnessinthe linepattern
b
with the MC method by Landy&Szalay (1993). In this last or, if a clusteringis really present,a higherdegreeof cluster-
case a numberof ND = 50datasets havebeengeneratedthat ingthantruecanbeobtained.Inordertohandlethisproblem,
containanumberofrandomsamplesequaltothatoftheorigi- we have proposed a computational efficient method that has
nalones.NDhasbeenchosenlargeenoughthatthemeanofthe beenimplementedintwodifferentalgorithms;onethatisable
estimatedtwo-pointcorrelationfunctionsdoesnotchangesig- toprovideexactresultsandtheotheronethatisapproximated
nificantlywhenmoredatasetsareadded.Thegoodagreement but very fast. Its excellent performances have been validated
is evident. However,if the computationof ξ(∆r) takes a total throughnumericalsimulationsandanapplicationtorealdata.
of1-2secondsofCPUtime2,theMCmethodrequiresasimi- Wehavealsopresentedanextensionofthemethodfortheesti-
b
laramountoftimeforeachrandomsample.Now,ifonetakes mationofthetwo-pointcorrelationfunctioninthecaseoftwo
intoaccountthatusuallythebandwidthhhastobedetermined ormorelistsoflinesthathavetobeanalyzedatthesametime.
byatrailanderrorapproach,alsowiththelimitedsizeofthe
datasetshereused,thecomputationaladvantageofthemethod
presentedinthispaperisobvious.Thisisespeciallytrueiftwo
2 ExperimentshavebeenconductedwithMatlab6onaPentiumIV
-1500GHzprocessorinaWindows2000operatingsystem.
6 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction
K−function / biased estimator K−function / unbiased estimator
0.2 0.2
0.18 0.18
0.16 0.16
0.14 0.14
0.12 0.12
K(r) 0.1 K(r) 0.1
0.08 0.08
0.06 0.06
0.04 0.04
True True
0.02 Biased 0.02 Unbiased
16% CI 16% CI
84% CI 84% CI
0 0
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
r r
Fig.3. Mean and 68% (i.e., 1-σ) confidence interval for the Fig.5.AsinFig.3butfortheunbiasedestimatorK(r)andthe
biasedestimator K¯(r),with periodicboundaryconditions,ob- adaptedintensityλ (r).
V
b
tained from 100 realizations of a Poisson process with gaps.
Thedata sets hasbeensimulatedby drawing200pointsfrom K−function / unbiased estimator
a Poisson process in the interval [0,1] and discharging those
that are in subset [0.20,0.25] [0.30,0.35] [0.50,0.55]
∪ ∪ ∪ 0.2
[0.70,0.75] [0.80,0.85].ForreferencealsotheK-functionof
∪
thePoissonprocessisshown(“true”line).Becauseoftheim-
posedperiodboundaryconditions,estimatorK¯(r)isfreefrom
theedgeeffects. 0.15
K−function / biased estimator K(r)
0.1
0.2
0.05
True
Poisson
0.15 Unbiased
16% CI
84% CI
0
K(r) 0 0.01 0.02 0.03 0.04 0.r05 0.06 0.07 0.08 0.09 0.1
0.1
Fig.6.AsinFig.4butfortheunbiasedestimatorK(r)andthe
adaptedintensityλ (r).
V
b
0.05
True
Poisson
Biased
16% CI
84% CI
0
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
r
Fig.4. The same as in Fig. 3 butfor a Cox processsimulated
in the interval[0,1]by takingthe absolute valueof a station-
ary Gaussian random field with a Gaussian correlation func-
tion whose dispersion is set to 0.01. The Gaussian random
fields have been generated under the hypothesis of periodic
boundaryconditions.Because ofthis,the linelabeled“True”,
that providesthe mean value of the estimator K¯(r) with peri-
odic boundary conditions, is free from the edge effects. This
meanhasbeencomputedonthebasisof5000simulationswith
W [0,1]. For reference, also the K-function of a Poisson
≡
processisshown.
R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction 7
Two−point correlation function x 10−3
5
True
0.25 Biased
Unbiased
16% CI 4.5
84% CI
0.2
4
0.15
3.5
0.1
ξ(r) MSE(r)
0.05 3
0 2.5
−0.05
2
−0.1 Estimated
Theoretical
1.5
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4
r
r
Fig.7.AsinFig.3butfortheunbiasedestimatorξ(r)andthe
Fig.9. Estimated MSE (r) (calculated for eleven values of r)
ξ
adaptedintensityλ (r).h=0.01.
S b vs.thetheoreticalonefortheexperimentinFig.7.Noticethe
non-monotonic behavior of the two functions that is directly
Two−point correlation function
linked to the fact that the observing window W is constituted
True
Biased bydisjointsegments.
Unbiased
16% CI
0.8 84% CI
0.6
0.4
ξ(r)
0.2
0
−0.2
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
r
Fig.8.AsinFig.4butfortheunbiasedestimatorξ(r)andthe
adaptedintensityλ (r).h=0.005.
S
b
8 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction
PKS0237 PKS2126
15
20
15
10
∆r) ∆r)
K( K(
10
5
5
Unbiased Unbiased
Biased Biased
0 0
0 2 4 6 8 10 0 1 2 3 4 5 6 7
∆r (comoving Mpc) ∆r (comoving Mpc)
Fig.10.K(∆r)forthedataofPKS0237-23whenthegapsdue Fig.12.K(∆r)forthedataofPKS2126-158whenthegapsdue
to the metal and the strong Ly-α lines are taken into account to the metal and the strong Ly-α lines are taken into account
b b
(unbiasedestimator)asafunctionofthecomovingspatialsep- (unbiasedestimator)asafunctionofthecomovingspatialsep-
aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto
about5%theintervalspannedbythisquantity.Thetotalexten- about5%theintervalspannedbythisquantity.Thetotalexten-
sion of gaps amountto about16% of spatial intervalcovered sion of gaps amountto about20% of spatial interval covered
byW.Forreference,thesameestimateisshownwhengapsare by W. For reference, the same estimator is shown when gaps
neglected(biasedestimator). areneglected(biasedestimator).
PKS0237 PKS2126
0.25
Unbiased Unbiased
0.6 Biased Biased
0.2
0.5
0.4 0.15
0.3
0.1
ξ∆(r) 0.2 ξ∆(r) 0.05
0.1
0
0
−0.1 −0.05
−0.2
−0.1
1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7
∆r (comoving Mpc) ∆r (comoving Mpc)
Fig.11.ξ(∆r) for the data of PKS0237-23whenthe gapsdue Fig.13.ξ(∆r)forthedataofPKS2126-158whenthegapsdue
to the metal and the strong Ly-α lines are taken into account to the metal and the strong Ly-α lines are taken into account
(unbiasebdestimator)asafunctionofthecomovingspatialsep- (unbiasebdestimator)asafunctionofthecomovingspatialsep-
aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto aration,∆r. Thetotalextensionofgapsamountto about20%
about5%theintervalspannedbythisquantity.Thetotalexten- of spatial interval covered by W and h = 0.001 in the same
sion of gaps amountto about16% of spatial intervalcovered units. For reference, the same estimator is shown when gaps
byW andh= 0.001inthesameunits.Forreference,thesame areneglected(biasedestimator).
estimateisshownwhengapsareneglected(biasedestimator).
R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction 9
PKS0237
0.5
Unbiased
Unbiased MC
0.4
0.3
0.2
ξ∆(r) 0.1
0
−0.1
−0.2
1 2 3 4 5 6 7 8 9 10 11
∆r (comoving Mpc)
Fig.14.ξ(∆r) for the data of PKS0237-23whenthe gapsdue
to the metal and the strong Ly-α lines are taken into account
b
(unbiasedestimator)asafunctionofthecomovingspatialsep-
aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto
about5%theintervalspannedbythisquantity.Thetotalexten-
sion of gaps amountto about16% of spatial intervalcovered
byW andh=0.001inthesameunits.Forreference,theresult
providedbytheMCmethodbyLandy&Szalay (1993)isalso
shown(unbiasedMCestimator).
PKS2126
0.15
0.1
0.05
∆r) 0
ξ(
−0.05
−0.1
Unbiased
Unbiased MC
1 2 3 4 5 6 7
∆r (comoving Mpc)
Fig.15.ξ(∆r)forthedataofPKS2126-158whenthegapsdue
to the metal and the strong Ly-α lines are taken into account
b
(unbiasedestimator)asafunctionofthecomovingspatialsep-
aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto
about5%theintervalspannedbythisquantity.Thetotalexten-
sion of gaps amountto about16% of spatial intervalcovered
byW andh=0.001inthesameunits.Forreference,theresult
providedbytheMCmethodbyLandy&Szalay (1993)isalso
shown(unbiasedMCestimator).
10 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction
x x s= s+z2 y1;
min max −
end
a b a b |W| = 0.8 i2=i2+1;
1 1 2 2
z1=b[i2 1];
−
z2=a[i2];
end
|W| = 0.65 elseifz1<y2then
r
< r > ifz1>y1then
s= s+y2 z1;
−
else
|W ∩ W| = 0.45 s= s+y2 y1;
r −
end
i1=i1+1;
y1=b[i1 1]+r;
−
← OBSERVING INTERVAL → ifi1 nk+1then
≤
y2=a[i1]+r;
Fig.A.1. Example of an observing window W with two gaps, ify2> x then
max
itsshiftedversionW andtheintersectionW W .Thewindow y2= x ;
r ∩ r max
lengthsaregiveninunitsoftheobservingintervalx x . end
max min
−
r=0.15inthesameunits. end
end
AppendixA: Anexactmethodtocompute W W end
r
| ∩ | end
Thefollowingpresentsapseudo-codethatimplementsanexact returns
algorithm for the computation of W W for a window W
r
| ∩ |
with gaps. It is assumed that the observing window W lies in
AppendixB: An approximatefastmethodto
theinterval[x ,x ]andhasn gaps[a ,b ]fork=1,...,n
min max k k k k
compute W W
with r
| ∩ |
Themethodpresentedintheprevioussectionprovidesanexact
x =b <a <b <...<a <b <a = x . (A.1)
min 0 1 1 nk nk nk+1 max valueof W W .Ifthecomputingtimeisofconcern,amore
r
| ∩ |
Fig.A.1showsthesituationforthecasen =2. efficient,althoughapproximatedmethod,ispossible.Theidea
k
The algorithm considers all combinations of the intervals istoconsidertheobservingwindowWasafunction (x)inR
W
betweenthegaps,determinesthelengthsoftheirintersections definedin the domain( ,+ ).When x [x ,x ], then
min max
−∞ ∞ ∈
andsumsup.Thewantedlength W W isdenotedbys. (x) = 0 if x belongs to a gap, (x) = 1, otherwise. If
r
| ∩ | W W
%input: x<[x ,x ],itis (x)=0.
min max
W
%r interpointdistance Itisnotdifficulttorealizethat,forafixedr ,γ(r ) W
∗ ∗
−→ ≡| ∩
% W isgivenby
r
∗|
+
%initializationofindices: γ(r )= ∞ (x) (x+r )dx. (B.1)
∗ ∗
Z W W
s=0;
−∞
i1=1; Whenr∗ismadetochangeintheinterval( ,+ ),thisequa-
−∞ ∞
i2=1; tionprovidestheautocorrelationfunctionof (x).Hence,
W
γ(r)=IFT FT[ (x)] (FT[ (x)]) . (B.2)
′
%fixtheshiftedinterval: W × W
(cid:8) (cid:9)
y1=b[i1 1]+r; Thesymbol“ ”meanscomplexconjugation,whereasFTand
′
−
y2=a[i1]+r; IFTdenotetheFouriertransformandtheinverseFouriertrans-
ify2> x theny2= x ;end form,respectively.
max max
%fixtheoriginalinterval: AnefficientimplementationofEq.(B.2)requirestheinter-
z1=b[i2 1]; val [x ,x ] be discretized in a set of N bins. In the case
min max b
−
z2=a[i2]; ofthe estimatorξ(r),thisdoesnotrepresentatruelimitsince
it is a function that is computed on spatial bins of length 2h
b
%computethelengthoftheintersections (seeEq.(11)).Thevalueof N touseinthediscretizationde-
b
while(i1 n +1)and(y1< x )do pends on factors as the extension of the gaps, the number of
k max
≤
ifz2<y2then pointsandtherangeofdistancer ofinterest(seealsobelow).
ifz2>y1then Inanycase,thisisnotacriticalquantity.Inthenumericalex-
ifz1>y1then perimentspresentedintheprevioussections,N =5000issuf-
b
s= s+z2 z1; ficienttoobtainresultsalmostindistinguishablefromthoseob-
−
else tainable with an exact approach. However, even values of N
b