Table Of Content

Astronomy&Astrophysicsmanuscriptno.point c ESO2008 (cid:13) February5,2008 Ly-α forest: efficient unbiased estimation of second-order properties with missing data 7 0 R.Vio1,V.D’Odorico2,H.Stoyan3,andD.Stoyan3 0 2 1 ChipComputersConsultings.r.l.,VialeDonL.Sturzo82,S.LiberalediMarcon,I-30020Venice,Italy n a e-mail:[email protected] J 2 INAF-OsservatorioAstronomicodiTrieste,viaG.B.Tiepolo11,I-34143Trieste,Italy 1 e-mail:[email protected] 3 3 InstituteofStochastics,TUBergakademieFreiberg D-09596Freiberg,Germany 1 e-mail:[email protected] v 5 Received.............;accepted................ 9 8 1 ABSTRACT 0 7 Context.One important step in the statistical analysis of the Ly-αforest data is the study of their second order properties. Usually, this is 0 accomplished by means of the two-point correlation function or, alternatively, the K-function. In the computation of these functions it is / necessarytotakeintoaccountthepresenceofstrongmetallinecomplexesandstrongLy-αlinesthatcanhiddenpartoftheLy-αforestand h representanonnegligiblesourceofbias. p Aims.In this work, we show quantitatively what are the effects of the gaps introduced in the spectrum by the strong lines if they are not - o properlyaccountedforinthecomputationofthecorrelationproperties.Weproposeageometricmethodwhichisabletosolvethisproblem r and iscomputationally moreefficient thantheMonte Carlo(MC) technique that istypically adopted inCosmology studies. Themethod is t s implementedintwodifferentalgorithms.Thefirstonepermitstoobtainexactresults,whereasthesecondoneprovidesapproximatedresults a butiscomputationallyveryefficient.Theproposedapproachcanbeeasilyextendedtodealwiththecaseoftwoormorelistsoflinesthathave : v tobeanalyzedatthesametime. i Methods.Numericalexperimentsarepresentedthatillustratetheconsequencestoneglecttheeffectsduetothestronglinesandtheexcellent X performancesoftheproposedapproach. r Results.Theproposedmethodisabletoremarkablyimprovetheestimatesofboththetwo-pointcorrelationfunctionandtheK-function. a Keywords.Methods:dataanalysis–Methods:statistical–Quasars:absorptionlines–Cosmology:large-scalestructureofUniverse 1. Introduction 1997; Zhangetal. 1997; Theuns,Leonard&Efstathiou 1998; Machaceketal.2000).Thus,theLy-αforestprovidesaunique Thestudyoftheabsorptionlinesobservedintheopticalspectra and powerful tool to study the distribution/evolutionof bary- ofhighredshiftquasarshasgreatlyevolvedin thelastdecade onic matter and the physical status of the IGM over a wide thanksmainlytotheadventofthenewclassof10m-telescopes redshiftinterval. coupledwithhighresolution,highperformancespectrographs and to the improvedcomprehensionof the physical nature of theabsorbers. Inanalogywiththestudyofgalaxies,thetwo-pointcorrela- The lines forming the so called “Ly-α forest” observed tionfunctionhasbeenclassicallyusedtostudythedistribution blue ward of the Ly-α emission of the quasar are due to of the Ly-α lines (e.g. Sargentetal. 1980). However, the Ly- the Ly-α transition in neutral hydrogen distributed along the αdistributioniscontaminatedintheobservedspectrabymetal line of sight. They originate in the fluctuations of the inter- transitionsassociatedwithgalactichaloespiercedbythelineof mediate and low density intergalactic medium (IGM), aris- sight.Thestrongermetallinescanextendforseveralangstrom ingnaturallyinthehierarchicalprocessofstructureformation coveringweakerLy-αlinesandcausingabiasintheirdistribu- (e.g.,Cenetal.1994;Zhangetal.1995;Hernquistetal.1996; tion.ThesameistrueforstrongLy-αabsorptions,inparticular, Miralda-Escudeétal. 1996; Bi&Davidsen 1997; Daveétal. LymanlimitanddampedLy-αsystems.Ontheotherhand,the distribution at small scales is biased by the intrinsic width of Sendoffprintrequeststo:R.Vio thelines( 25 30kms 1). − ∼ − 2 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction Observers have become more and more aware of those 2. Estimationof K(r)andξ(r)inpresenceofedge problemsin particularforhighresolutionspectra and atlarge effects redshifts where the crowding of the lines increases (dn/dz ∝ A one-dimensional spatial point process X is any stochastic (1 + z)2.5, e.g., Kimetal. 2002). In the most recent papers mechanismthatgeneratesacountablesetofeventsx = x n based on high resolution quasar spectra (e.g., Luetal. 1996; ontherealaxisR.Itiscalledstationaryifthedistribution{ oi}fi=X1 Cristianietal.1997;Kimetal.2001)thestrongLy-αlinesare isinvariantundertranslations,thatis,thedistributionofX+s eliminated from the computation of the two-point correlation isthesameasthatofXforanys R.Usually,stationarypoint functionandthelinescloserthanaminimumvelocitysepara- ∈ processes are described by their first-order and second-order tion are merged.In orderto correctforedge effects and pres- characteristics.Thefirst-ordercharacteristic,sayλ,equalsthe ence of the metal gaps in the line distribution, the observed mean number of points per unit of length and is often called distributionof linepairsiscomparedwith MonteCarlo(MC) intensity. The second-order characteristics are represented by simulationsoflineswhicharerandomlydistributedintheob- thetwo-pointcorrelationfunctionξ(r)andthepaircorrelation served redshift interval other than the cosmological increase functiong(r)thatsatisfy withredshift.Thesamegapsduetothecut-outabsorptionsare reportedalsoonthesimulatedspectraandtheminimumsepa- g(r)=1+ξ(r). (1) rationbetweentwolinesispreserved.Thesimulatedprocessis approximatelyofPoisson typeoversmallredshiftseparations Thefunctiong(r)isdefinedasfollows.Consideranyinfinitesi- (seeSargentetal.1980). malintervalBoflengthdL.Theprobabilityofhavingapointin BisλdL.ConsidernowtwosuchintervalsB andB oflengths 1 2 dL anddL andintercenterdistancer.Theprobabilitytohave A further approximation is introduced in this method be- 1 2 apointineachintervalcanbedenotedbyP(r)andisexpressed cause it is not the redshift split but the velocity split distri- as bution that is considered, where the separation ∆v between ij two absorptionlines of redshiftszi and zj is givenby the non P(r)=g(r)λ2dL dL . (2) 1 2 linear formula ∆v = c(z z )/[1 + (z + z )/2]. Since, if ij i j i j | − | ∆v > ∆v ,∆v ,itis∆v , ∆v +∆v ,thequantities∆v The factor of proportionalityg(r) is the pair correlationfunc- 13 12 23 13 12 23 ij do not represent Euclidean distances. Hence, from the quan- tion. It is clear that, in the case of complete randomness, tities ∆v N it is not even possible to construct the one- g(r) = 1 or ξ(r) = 0, independently of r. g(r) > 1, or { i,j}i,j=1 dimensional point process corresponding to the sequence of ξ(r) > 0,indicatesthatpairsofpointswithdistanceinthein- the velocities v ,v ,...,v . The use of velocities at the place terval [r dr/2,r + dr/2] are more likely to occur than for 1 2 N − of comoving separations was suggested mainly by the un- a Poissonprocesswith the same intensity,thatmeansthere is known ratio between peculiar velocity and Hubble flow con- clustering.Onthecontrary,g(r) < 1orξ(r) < 0correspondto tributionstothemeasuredLy-αlineredshifts.Thereisnowev- inhibitionorregularity. idencethatpeculiarvelocitiesarenegligibleforthoseabsorbers The estimation of g(r) and ξ(r) presents a practical diffi- (e.g.,Rauchetal.2005;D’Odoricoetal.2006),whichmakesit culty. In fact, for a given distance r, the estimation of these preferabletousecomovingdistancestocomputethetwo-point characteristicsrequiresevaluationofthenumberofpointpairs correlationfunction. intheinfinitesimalinterval[r dr/2,r+dr/2].Sincethisisim- − possible,afiniteinterval[r h,r+h]hastobeused.Thisleads − Thecomputationofacomovingdistancecorrespondingto to approximation errors, especially for small r when it could agivenzrequirestheevaluationofanintegralthatingeneralis happenthatr h<0,andtotheproblemofchoosingtheband- − notavailableinclosedform.Hence,anumericalapproachhas width h. In particular, the determination of the best value of tobeadopted.ThiscanmaketheMCmethodsverytimecon- htousefora specificproblemisstillan”art”(Mart´ınezetal. suming. For this reason, in this paper we presenta geometric 2005). The best recipe is to consider a series of bandwidths methodtocomputethetwo-pointcorrelationthatdoesnotre- starting fromh 1/λandthen to comparethe results;some- ≈ quirethegenerationofauxiliaryrandomsamplesandtherefore times it is recommendable to use “adapted” bandwidths, i.e. ismuchmoreefficient. differentvaluesofhforsmallandlarger.Inordertoavoidthis kindofproblem,sometimesstatisticiansusealsotheso-called K-function,K(r),thatisrelatedtoξ(r)through InSec.2thecomputationofthetwo-pointcorrelationfunc- tionandoftheK-functionisconsideredwhenonlyedgeeffects r K(r)=2 [1+ξ(u)]du, (3) are present. In Sec. 3 andAppendicesA-D the argumentsare Z 0 generalizedto dealwiththecase ofgapsinthedatasequence or and two algorithmsare proposed.Two experimentaldata sets are analyzed in Sec. 4. Finally, conclusions are presented in 1dK(r) ξ(r)= 1. (4) Sec.6. 2 dr − Here, no bandwidth has to be fixed. The main drawback of Throughoutthis paperwe will adoptfor the cosmological this approach is that K(r) is a cumulative function,which in- parameters: Ω = 0.26, Ω = 0.74, and H = 73 km s 1 creasesnearlylinearly.(Acomparablesituationexistsinclas- 0m 0Λ 0 − Mpc 1. sical statistics for the pair “probability distribution function” − R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction 3 F(x)andthe“probabilitydensityfunction” f(x).)Forthisrea- K−function 1 son, a plot of K(r) is more difficult to interpret than a plot of ξ(r). As a consequence, ξ(r) is more useful in exploratory 0.9 statistics, whereas K(r)isusefulintestingstatisticalhypothe- 0.8 ses, e.g., that a given point sequence is “completely random” orbelongstoaPoissonprocess. 0.7 IfapointprocesswasobservableovertheentireRdomain, 0.6 the estimationof K(r) andξ(r) couldsimplyfollowtheirdef- initions. But in general, a pointprocesscan be observedonly K(r)0.5 inafiniteregionW definedintheinterval[x ,x ] R,the min max ⊂ 0.4 observing window. Then edge effects play an important role, sincetheestimationof K(r)andξ(r)shouldrequiretheuseof 0.3 pointsexternaltoW.Hence,the“natural”estimator 0.2 n n W K¯(r)= | | I (x x ), (5) 0.1 True n2 r | i− j| Unbiased Xi=1 j=X1,j,i Biased 0 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 with W thelengthofW and r | | 1 ifu r Fig.1.ComparisonofthebiasedestimatorK¯(r)withtheunbi- Ir(u)=(0 othe≤rwise, (6) asedoneK(r)withadaptedintensityλV(r).Thedatausedinthe simulationconsistof200pointsdrawnfromaPoissonprocess will yield results too small, with a negative bias. The same b intheinterval[0,1]. holdsforthe“natural”estimatorofξ(r) W2 n n ξ¯(r)= | | k (r x x ) 1, (7) Two−point correlation function 2n2 h −| i− j| − Xi=1 j=X1,j,i 0.1 where 1/2h if h u h 0 kh(u)=(0 oth−erw≤ise.≤ (8) Figures1-2 showthatthementionedbiasesare reallysig- −0.1 nificant also in the one-dimensionalcase considered here. As expected, the larger r the larger the biases. This is due to the ξ(r)−0.2 increasingnumberofpointsexternaltoW that,forincreasing valuesofr,shouldbenecessarytoestimatebothK(r)andξ(r). −0.3 AlthoughnotclearlyvisibleinFig.2,ξ¯(r)presentsabiasalso whenr<h.Thisisbecauseinthecomputationofξ(r)onlythe pointsintheinterval[0,r+h]oflengthshorterthan2hcanbe −0.4 True considered. Unbiased Biased In Astronomy, especially in the field of the statisti- −0.5 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 cal analysis of the systems of galaxies, this edge-correction r problem is well known. Various algorithms, mostly of Fig.2.Comparisonofthebiasedestimatorg¯(r)withtheunbi- Monte Carlo (MC) type, have been proposed for the three- asedoneg(r)withadaptedintensityλ (r).Thedatausedinthe S dimensionalcase (e.g.,see Hamilton 1993;Landy&Szalay simulationarethesameofFig.1.h=0.05. 1993; Quashnock&Sein 1999; Pons-Border´ıaetal. 1999; b Kerscher 1999;Kerscheretal. 2000;Martinez&Saar 2002, and references therein). Their conversion to the one- randomsamplesmuchlargerthanthenumberofexperimental dimensional case is straightforward. However, the fact to pointswithobviouscomputationalcosts(seeMartinez&Saar work in the one-dimensionalcase allows us to follow the ap- 2002). proach due to Stoyan&Stoyan (2000) which is a modifica- Westartfromtheunbiasedrigidmotioncorrectedestima- tion of the MC methods presented in Hamilton (1993) and torsofλ2K(r)andλ2ξ(r): Landy&Szalay (1993).Inpractice,theauxiliaryrandomsam- plesofpointsusedinthosepapersarereplacedbygeometrical n n λ2K(r)= W p 1I (x x ), (9) termsthatresultfromexactintegration.Thispermitstoobtain | | −ij r | i− j| more precise results since the quantities of interest are calcu- b Xi=1 j=X1,j,i lated and not only estimated as in the case of the MC tech- wheretheweights p aregivenby ij niques.Moreover,thereisaremarkablebenefitinthecompu- tational efficiency. In fact, in order to provide satisfactory re- W W sults,theMCmethodsrequiretheuseofanumberofauxiliary pij = | ∩ xi−xj|. (10) W | | 4 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction W istheobservingwindowshiftedbytheamounts.IfW isan really present, a higher degree of clustering than true may be s interval,itissimply W W = W x x .Similarly: obtained. These effects are clearly visible in Figs. 3-4 where | ∩ xi−xj| | |−| i− j| the performanceof the estimator K¯(r) is shown for a Poisson n n Wc(r) process, typically of complete random spatial patterns, and a λ2ξ(r)= | |2 p−ij1kh(r−|xi−xj|)−λ2. (11) Cox one that models spatial patterns with significant density Xi=1 j=X1,j,i b fluctuations(Moller&Waagepetersen 2004). Here, c(r) = 2h/(r+h max[0,r h]) is a correctingfactor In presence of gapsthe unbiasedestimators K(r) and ξ(r) − − forthevaluesofrforwhichr hgoesnegative(Guan 2006). andtheintensitiesλ (r)andλ (r)canstillbeused.Here,how- − V S b b A“naive”methodusedbymanyauthorstoobtainestimatesof ever,acomputationalproblemarises.Infact,althoughthecom- K(r)andξ(r)istosetλ2 =n2/W2andthentodividetheesti- putation of the quantities W W , λ (r) and λ (r) needs mators(9)and(11)bythisquan|tity|.However,thecircumstance only elementary mathema|tics∩, nevxei−rxthj|eleSssit is impVossible to that n/W is a good estimator of λ does not necessarily im- giveexplicitformulas.Forsmallr, a simple approximationis | | pliesthatthesameholdsforn2/W2withrespecttoλ2.Infact, givenby | | more is possible. As Hamilton (1993) and Landy&Szalay (1993)showedforξ(r)andStoyan&Stoyan (2000)forK(r), W W = L (n +1)r, (15) r w itispossibletoreducethemeansquarederror(MSE)1 if,in- | ∩ | − steadofthenaiveclassicalestimator,adaptedestimatorsforλ whereListhedistancebetweenthemostleftobservationinter- are used.One particularityof these estimatorsis thattheyare val endpoint and the most right observation interval endpoint differentfor K(r)andξ(r)anddependonthedistancer.They andnw isthenumberofgaps.Forlargerr,theexactalgorithm are denotedby λ (r), for ξ(r), and λ (r), for K(r). The index presentedinAppendixAcanbeused.Asanalternative,theap- S V “S”referstosurfaceand“V”tovolume(formoredetails,see proximatedbutefficientalgorithmbasedonaFouriertechnique Stoyan&Stoyan 2000).Theexpressionforthetwoestimators inAppendixBisproposed.Afterthat,itispossibletocompute isgivenby λ (V)andthenξ(r).Thecomputationofλ (r)ismorecompli- S V catedandcanbecarriedoutbynumericalintegration.However λ (r)= n 1W(xi−r)+1W(xi+r), (12) amoreefficientbapproach,basedagainonaFouriertechnique, S 2W W isdescribedinAppendicesB-C. Xi=1 | ∩ r| TheexcellentperformanceoftheestimatorK(r)combined and withtheintensityλ (r)isshowninFigs.5-6(tocomparewith V b n W [x r,x +r] Figs. 3-4). Figures. 7-8 show the results obtainable with ξ(r) λV(r)=Xi=1 | 2∩0r|Wi−∩Wit|dt |, (13) cuolamtiboinnseidswofiththeλSs(arm).eTohredenruomfbtheratotfyppoicianltlsyuesxepdeicntetdhifsorsbitmhe- R Ly-αlinesinasinglespectrum.Inordertocheckiftheseresults with areconsistentwithwhatexpectedfromthetheory,inFig.9the 1w(x)=(01 oifthxe∈rwWis,e. (14) tfhoerotrheetiecxapleMriSmEeξn(tr)woitfhththeeξ(Pro)iesssotinmpatroorcecsosmubsiendedfowriFthigλ.S7(ris) b comparedwiththatestimatedonthebasisof5000simulations; Figures1-2showtheremarkableimprovementintheestimate apartfromtheverysmallvaluesofr,wheretheeffectsdueto ofbothK(r)andξ(r). thebandwidthharemoreimportant,theagreementis reason- ablygood.Worthofnoteisthenon-monotonicbehaviorofthe 3. Estimationof K(r)andξ(r)inpresenceofgaps twoMSEsthatisdirectlylinkedtothefactthattheobserving window W is constituted by disjoint segments. The equation Inpracticalapplications,theedgeeffectsarenottheonlyprob- usedforMSE (r), lem.Infact,oftentheobservingwindowisaunionofintervals ξ separated by gaps. In the spectra of quasars this is due to the ξ(r)+1 presenceofmetallinesandstrongLy-αsystemsthatcanhid- MSEξ(r)= 2hλ2W W , (16) den portions of the Ly-α forest. Ignoring these gaps and tak- | ∩ r| ingerroneouslytheintervalstartingwiththemostleftinterval isduetoStoyanetal. (1993). start-point to the most right interval endpoint as the observ- ingwindowproducesbiasedestimates.Clearly,theestimators willtendtoreportmoreclusteringthanreallyexisting.Therea- 4. Apracticalapplication son is simply that when the pointswithin some given regions In order to apply the discussed method and verify its reli- are removed, then this artificially creates separated groups of ability, we have used the best high resolution high signal- pointsthatmimic the presenceof clusters. Asa consequence, to-noise quasar spectra publicly available observed with the an erroneous indication of clustering can happen even in the UVESspectrograph(Dekkeretal.2000)attheVLTtelescope case of no-interaction or complete spatial randomness in the in the context of the ESO Large Programme “The Cosmic pointpattern.Ontheotherhand,ifclusteringoraggregationis EvolutionoftheIGM”(Bergeronetal.2004).Inparticular,we 1 Themeansquared errorofanestimatorξ(x)ofξ(r)isgivenby have chosen two examples of Ly-α forest contaminated by a MSE (r)=E[(ξ(r) ξ(r))2].AsimilarexpressionholdsforMSE (r). largenumberofmetaltransitions:PKS0237-23atz = 2.233 ξ − e K em e e e R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction 5 andPKS2126-158atz = 3.267,forwhichthe lineblanket- ormorelistsoflineshavetobeanalyzedatthesametime(see em ingduetothesameLy-αlinesismoresevereastheemission below). redshiftislarger. The quasar spectra have a resolution R 45000 and a 5. Multi-listanalysis ≃ signal-to-noise ratio in the Ly-α forests S/N 80 per pixel. They have been reduced with the pipeline of≃the instrument Here,webrieflyillustrateanadditionalbenefitinusingthees- (version 2.1, Ballesteretal. 2000) provided by ESO in the timator ξ(r). In particular, the fact that it can be easily gen- contextofthedatareductionpackageMIDAS.Thecontinuum eralized for the estimation of a two-pointcorrelationfunction b level was determined with a manual subjective method based whentwoormorelistsoflinesareavailable.Themainproblem on the selection of the regionsfree from clear absorptionthat is that, in anexperimentalcontext,the lists canhave different aresuccessivelyfittedwithasplinepolynomialof3rddegree. characteristics concerningthe coveredspatial intervalsand/or Thenormalizedspectraaretheninspectedtoidentifymetalab- thetotalextensionofthegaps.Hence,thenaiveestimator sorption systems with a particular attention to the transitions L 1 that fall in the Ly-α forest. Once the forest has been cleaned, Ξ(r)= ξ(r), (17) l theHlinesarefittedwithVoigtprofilesintheLYMANcontext LXl=1 oftheMIDASreductionpackage.Inordertoincreasetherelia- b bilityofthefitparametersforthesaturatedLy-αlinesweused with ξl(r) the estimate obtained from the lth data set, can alsotheotherunblendedlinesoftheLymanseriesfallinginthe provide unsatisfactory results since all the lists have the b observed spectral range. We excluded 1000 km s 1 from the same importance in forming the final result. Some kind of − Ly-βemissionand5000kms 1 fromtheLy-αemissionofthe weighting is necessary. In this respect, it has been proved − quasar to avoid associated absorbers and the proximity effect (Ohser&Muëcklich 2000)thattheestimator duetothequasaritself.Furthermore,wecutoutofthespectra L W W ξ(r) the regions occupied by the metal transitions and the Lyman Ξ(r)= l=1| ∩ r|l l (18) lcimmit2)sywshteicmhsc(oLuyl-dαmliansekstwheitphrecsoelunmcenodfewnesiatkyeNrl(iHneI)si≥nt1h0e1i7r b PPlL=1|W∩Wbr|l − provides an estimate of ξ(r) with a smaller variance with re- velocity profiles (line blanketing). At the smallest scales the specttoΞ(r)foreachvalueofr.Withthenumericaltechniques clusteringpropertiesareaffectedbythetypicalvelocitywidth described in Sects. B-D, the implementationof the algorithm of the lines( 25 30km s 1, e.g.Kim et al. 2001),for this − for the computationof Ξ(r) is trivial. However,the character- ∼ − reasonwehavemergedthelinepairscloserthan25kms 1. − istics and the performances of this estimator are beyond the Figures 10-13 show K(∆r) and ξ(∆r) as a function of co- scopeofthepresentpapber. moving spatial separation, ∆r, for the Ly-α forest data of b b the objects PKS0237-23 and PKS2126-158. For PKS0237- 6. Conclusions 23 the number of Ly-α lines is 191 and the total length of gaps amounts to about 16% of the interval spanned by W, In this work we have analyzedpossible sourcesof bias in the whereas for PKS2126-158these quantities are 300 and 20%, estimationofthesecond-ordercharacteristicsoftheLy-αfor- respectively.For reference,the correspondingfunctionsedge- est. In particular, the metallic and the strong Ly-α lines have corrected but evaluated without considering the gaps are also been considered that can hidden part of the lines of interest. displayed.Inboththecases,itisevidentthestrongerindication This may result in a severe bias of both the K-function and ofclusteringifthegapsarenottakenintoaccount.Forthesame the two-point correlation function. Specifically, an erroneous experiment, Figs. 14-15 show the comparison between ξ(∆r) indication of clustering can happen even in the case of no- andtheestimateofthetwo-pointcorrelationfunctionobtained interactionorcompletespatialrandomnessinthe linepattern b with the MC method by Landy&Szalay (1993). In this last or, if a clusteringis really present,a higherdegreeof cluster- case a numberof ND = 50datasets havebeengeneratedthat ingthantruecanbeobtained.Inordertohandlethisproblem, containanumberofrandomsamplesequaltothatoftheorigi- we have proposed a computational efficient method that has nalones.NDhasbeenchosenlargeenoughthatthemeanofthe beenimplementedintwodifferentalgorithms;onethatisable estimatedtwo-pointcorrelationfunctionsdoesnotchangesig- toprovideexactresultsandtheotheronethatisapproximated nificantlywhenmoredatasetsareadded.Thegoodagreement but very fast. Its excellent performances have been validated is evident. However,if the computationof ξ(∆r) takes a total throughnumericalsimulationsandanapplicationtorealdata. of1-2secondsofCPUtime2,theMCmethodrequiresasimi- Wehavealsopresentedanextensionofthemethodfortheesti- b laramountoftimeforeachrandomsample.Now,ifonetakes mationofthetwo-pointcorrelationfunctioninthecaseoftwo intoaccountthatusuallythebandwidthhhastobedetermined ormorelistsoflinesthathavetobeanalyzedatthesametime. byatrailanderrorapproach,alsowiththelimitedsizeofthe datasetshereused,thecomputationaladvantageofthemethod presentedinthispaperisobvious.Thisisespeciallytrueiftwo 2 ExperimentshavebeenconductedwithMatlab6onaPentiumIV -1500GHzprocessorinaWindows2000operatingsystem. 6 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction K−function / biased estimator K−function / unbiased estimator 0.2 0.2 0.18 0.18 0.16 0.16 0.14 0.14 0.12 0.12 K(r) 0.1 K(r) 0.1 0.08 0.08 0.06 0.06 0.04 0.04 True True 0.02 Biased 0.02 Unbiased 16% CI 16% CI 84% CI 84% CI 0 0 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 r r Fig.3. Mean and 68% (i.e., 1-σ) confidence interval for the Fig.5.AsinFig.3butfortheunbiasedestimatorK(r)andthe biasedestimator K¯(r),with periodicboundaryconditions,ob- adaptedintensityλ (r). V b tained from 100 realizations of a Poisson process with gaps. Thedata sets hasbeensimulatedby drawing200pointsfrom K−function / unbiased estimator a Poisson process in the interval [0,1] and discharging those that are in subset [0.20,0.25] [0.30,0.35] [0.50,0.55] ∪ ∪ ∪ 0.2 [0.70,0.75] [0.80,0.85].ForreferencealsotheK-functionof ∪ thePoissonprocessisshown(“true”line).Becauseoftheim- posedperiodboundaryconditions,estimatorK¯(r)isfreefrom theedgeeffects. 0.15 K−function / biased estimator K(r) 0.1 0.2 0.05 True Poisson 0.15 Unbiased 16% CI 84% CI 0 K(r) 0 0.01 0.02 0.03 0.04 0.r05 0.06 0.07 0.08 0.09 0.1 0.1 Fig.6.AsinFig.4butfortheunbiasedestimatorK(r)andthe adaptedintensityλ (r). V b 0.05 True Poisson Biased 16% CI 84% CI 0 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 r Fig.4. The same as in Fig. 3 butfor a Cox processsimulated in the interval[0,1]by takingthe absolute valueof a station- ary Gaussian random field with a Gaussian correlation function whose dispersion is set to 0.01. The Gaussian random fields have been generated under the hypothesis of periodic boundaryconditions.Because ofthis,the linelabeled“True”, that providesthe mean value of the estimator K¯(r) with periodic boundary conditions, is free from the edge effects. This meanhasbeencomputedonthebasisof5000simulationswith W [0,1]. For reference, also the K-function of a Poisson ≡ processisshown. R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction 7 Two−point correlation function x 10−3 5 True 0.25 Biased Unbiased 16% CI 4.5 84% CI 0.2 4 0.15 3.5 0.1 ξ(r) MSE(r) 0.05 3 0 2.5 −0.05 2 −0.1 Estimated Theoretical 1.5 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 r r Fig.7.AsinFig.3butfortheunbiasedestimatorξ(r)andthe Fig.9. Estimated MSE (r) (calculated for eleven values of r) ξ adaptedintensityλ (r).h=0.01. S b vs.thetheoreticalonefortheexperimentinFig.7.Noticethe non-monotonic behavior of the two functions that is directly Two−point correlation function linked to the fact that the observing window W is constituted True Biased bydisjointsegments. Unbiased 16% CI 0.8 84% CI 0.6 0.4 ξ(r) 0.2 0 −0.2 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 r Fig.8.AsinFig.4butfortheunbiasedestimatorξ(r)andthe adaptedintensityλ (r).h=0.005. S b 8 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction PKS0237 PKS2126 15 20 15 10 ∆r) ∆r) K( K( 10 5 5 Unbiased Unbiased Biased Biased 0 0 0 2 4 6 8 10 0 1 2 3 4 5 6 7 ∆r (comoving Mpc) ∆r (comoving Mpc) Fig.10.K(∆r)forthedataofPKS0237-23whenthegapsdue Fig.12.K(∆r)forthedataofPKS2126-158whenthegapsdue to the metal and the strong Ly-α lines are taken into account to the metal and the strong Ly-α lines are taken into account b b (unbiasedestimator)asafunctionofthecomovingspatialsep- (unbiasedestimator)asafunctionofthecomovingspatialsep- aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto about5%theintervalspannedbythisquantity.Thetotalexten- about5%theintervalspannedbythisquantity.Thetotalexten- sion of gaps amountto about16% of spatial intervalcovered sion of gaps amountto about20% of spatial interval covered byW.Forreference,thesameestimateisshownwhengapsare by W. For reference, the same estimator is shown when gaps neglected(biasedestimator). areneglected(biasedestimator). PKS0237 PKS2126 0.25 Unbiased Unbiased 0.6 Biased Biased 0.2 0.5 0.4 0.15 0.3 0.1 ξ∆(r) 0.2 ξ∆(r) 0.05 0.1 0 0 −0.1 −0.05 −0.2 −0.1 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 ∆r (comoving Mpc) ∆r (comoving Mpc) Fig.11.ξ(∆r) for the data of PKS0237-23whenthe gapsdue Fig.13.ξ(∆r)forthedataofPKS2126-158whenthegapsdue to the metal and the strong Ly-α lines are taken into account to the metal and the strong Ly-α lines are taken into account (unbiasebdestimator)asafunctionofthecomovingspatialsep- (unbiasebdestimator)asafunctionofthecomovingspatialsep- aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto aration,∆r. Thetotalextensionofgapsamountto about20% about5%theintervalspannedbythisquantity.Thetotalexten- of spatial interval covered by W and h = 0.001 in the same sion of gaps amountto about16% of spatial intervalcovered units. For reference, the same estimator is shown when gaps byW andh= 0.001inthesameunits.Forreference,thesame areneglected(biasedestimator). estimateisshownwhengapsareneglected(biasedestimator). R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction 9 PKS0237 0.5 Unbiased Unbiased MC 0.4 0.3 0.2 ξ∆(r) 0.1 0 −0.1 −0.2 1 2 3 4 5 6 7 8 9 10 11 ∆r (comoving Mpc) Fig.14.ξ(∆r) for the data of PKS0237-23whenthe gapsdue to the metal and the strong Ly-α lines are taken into account b (unbiasedestimator)asafunctionofthecomovingspatialsep- aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto about5%theintervalspannedbythisquantity.Thetotalexten- sion of gaps amountto about16% of spatial intervalcovered byW andh=0.001inthesameunits.Forreference,theresult providedbytheMCmethodbyLandy&Szalay (1993)isalso shown(unbiasedMCestimator). PKS2126 0.15 0.1 0.05 ∆r) 0 ξ( −0.05 −0.1 Unbiased Unbiased MC 1 2 3 4 5 6 7 ∆r (comoving Mpc) Fig.15.ξ(∆r)forthedataofPKS2126-158whenthegapsdue to the metal and the strong Ly-α lines are taken into account b (unbiasedestimator)asafunctionofthecomovingspatialsep- aration,∆r.Thelargestvalueof∆rinthefigurecorrespondsto about5%theintervalspannedbythisquantity.Thetotalexten- sion of gaps amountto about16% of spatial intervalcovered byW andh=0.001inthesameunits.Forreference,theresult providedbytheMCmethodbyLandy&Szalay (1993)isalso shown(unbiasedMCestimator). 10 R.Vioetal.:Unbiasedestimationofthetwo-pointcorrelationfunction x x s= s+z2 y1; min max − end a b a b |W| = 0.8 i2=i2+1; 1 1 2 2 z1=b[i2 1]; − z2=a[i2]; end |W| = 0.65 elseifz1<y2then r < r > ifz1>y1then s= s+y2 z1; − else |W ∩ W| = 0.45 s= s+y2 y1; r − end i1=i1+1; y1=b[i1 1]+r; − ← OBSERVING INTERVAL → ifi1 nk+1then ≤ y2=a[i1]+r; Fig.A.1. Example of an observing window W with two gaps, ify2> x then max itsshiftedversionW andtheintersectionW W .Thewindow y2= x ; r ∩ r max lengthsaregiveninunitsoftheobservingintervalx x . end max min − r=0.15inthesameunits. end end AppendixA: Anexactmethodtocompute W W end r | ∩ | end Thefollowingpresentsapseudo-codethatimplementsanexact returns algorithm for the computation of W W for a window W r | ∩ | with gaps. It is assumed that the observing window W lies in AppendixB: An approximatefastmethodto theinterval[x ,x ]andhasn gaps[a ,b ]fork=1,...,n min max k k k k compute W W with r | ∩ | Themethodpresentedintheprevioussectionprovidesanexact x =b <a <b <...<a <b <a = x . (A.1) min 0 1 1 nk nk nk+1 max valueof W W .Ifthecomputingtimeisofconcern,amore r | ∩ | Fig.A.1showsthesituationforthecasen =2. efficient,althoughapproximatedmethod,ispossible.Theidea k The algorithm considers all combinations of the intervals istoconsidertheobservingwindowWasafunction (x)inR W betweenthegaps,determinesthelengthsoftheirintersections definedin the domain( ,+ ).When x [x ,x ], then min max −∞ ∞ ∈ andsumsup.Thewantedlength W W isdenotedbys. (x) = 0 if x belongs to a gap, (x) = 1, otherwise. If r | ∩ | W W %input: x<[x ,x ],itis (x)=0. min max W %r interpointdistance Itisnotdifficulttorealizethat,forafixedr ,γ(r ) W ∗ ∗ −→ ≡| ∩ % W isgivenby r ∗| + %initializationofindices: γ(r )= ∞ (x) (x+r )dx. (B.1) ∗ ∗ Z W W s=0; −∞ i1=1; Whenr∗ismadetochangeintheinterval( ,+ ),thisequa- −∞ ∞ i2=1; tionprovidestheautocorrelationfunctionof (x).Hence, W γ(r)=IFT FT[ (x)] (FT[ (x)]) . (B.2) ′ %fixtheshiftedinterval: W × W (cid:8) (cid:9) y1=b[i1 1]+r; Thesymbol“ ”meanscomplexconjugation,whereasFTand ′ − y2=a[i1]+r; IFTdenotetheFouriertransformandtheinverseFouriertrans- ify2> x theny2= x ;end form,respectively. max max %fixtheoriginalinterval: AnefficientimplementationofEq.(B.2)requirestheinter- z1=b[i2 1]; val [x ,x ] be discretized in a set of N bins. In the case min max b − z2=a[i2]; ofthe estimatorξ(r),thisdoesnotrepresentatruelimitsince it is a function that is computed on spatial bins of length 2h b %computethelengthoftheintersections (seeEq.(11)).Thevalueof N touseinthediscretizationde- b while(i1 n +1)and(y1< x )do pends on factors as the extension of the gaps, the number of k max ≤ ifz2<y2then pointsandtherangeofdistancer ofinterest(seealsobelow). ifz2>y1then Inanycase,thisisnotacriticalquantity.Inthenumericalex- ifz1>y1then perimentspresentedintheprevioussections,N =5000issuf- b s= s+z2 z1; ficienttoobtainresultsalmostindistinguishablefromthoseob- − else tainable with an exact approach. However, even values of N b

Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data PDF

0.28 MB·English

by R. Vio

#additional_collections #journals #arxiv

Checking for file health...

Save to my drive

Quick download

Download

Download Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data PDF Free - Full Version

by R. Vio| 0.28| English

Download Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data by R. Vio in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data

No description available for this book.

Detailed Information

Author:	R. Vio
Language:	English
File Size:	0.28
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data PDF?

Yes, on https://PDFdrive.to you can download Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data by R. Vio completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data on my mobile device?

After downloading Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data?

Yes, this is the complete PDF version of Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data by R. Vio. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download Ly-alpha forest: efficient unbiased estimation of second-order properties with missing data PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.