REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188
The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments
regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA, 22202-4302.
Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of
information if it does not display a currently valid OMB control number.
PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.
1. REPORT DATE (DD-MM-YYYY) 2. REPORT TYPE 3. DATES COVERED (From - To)
New Reprint -
4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER
Tree approximation of the long wave radiation parameterization W911NF-07-1-0185
in the NCAR CAM global climate model
5b. GRANT NUMBER
5c. PROGRAM ELEMENT NUMBER
611103
6. AUTHORS 5d. PROJECT NUMBER
Alexei Belochitski, Peter Binev, Ronald DeVore, Michael
Fox-Rabinovitz, Vladimir Krasnopolsky, Philipp Lamby 5e. TASK NUMBER
5f. WORK UNIT NUMBER
7. PERFORMING ORGANIZATION NAMES AND ADDRESSES 8. PERFORMING ORGANIZATION REPORT
NUMBER
University of South Carolina Research Foundation
901 Sumter ST, 5th floor Byrnes
Columbia, SC 29201 -3961
9. SPONSORING/MONITORING AGENCY NAME(S) AND 10. SPONSOR/MONITOR'S ACRONYM(S)
ADDRESS(ES) ARO
U.S. Army Research Office 11. SPONSOR/MONITOR'S REPORT
P.O. Box 12211 NUMBER(S)
Research Triangle Park, NC 27709-2211 52559-MA-MUR.22
12. DISTRIBUTION AVAILABILITY STATEMENT
Approved for public release; distribution is unlimited.
13. SUPPLEMENTARY NOTES
The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department
of the Army position, policy or decision, unless so designated by other documentation.
14. ABSTRACT
The computation of Global Climate Models (GCMs) presents significant numerical challenges. This paper presents
new algorithms based on sparse occupancy trees for learning and emulating the long wave radiation
parameterization in the NCAR CAM climate model. This emulation occupies by far the most significant portion of
the computational time in the implementation of the model. From the mathematical point of view this
parameterization can be considered as a mapping which is to be learned from scattered data samples (xi,yi),
15. SUBJECT TERMS
Climate and weather prediction; Numerical modeling; Non-parametric regression; High-dimensional approximation; Neural network;
Sparse occupancy tree
16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 15. NUMBER 19a. NAME OF RESPONSIBLE PERSON
a. REPORT b. ABSTRACT c. THIS PAGE ABSTRACT OF PAGES Andrew Kurdila
UU UU UU UU 19b. TELEPHONE NUMBER
540-231-8028
Standard Form 298 (Rev 8/98)
Prescribed by ANSI Std. Z39.18
Report Title
Tree approximation of the long wave radiation parameterization in the NCAR CAM global climate model
ABSTRACT
The computation of Global Climate Models (GCMs) presents significant numerical challenges. This paper presents
new algorithms based on sparse occupancy trees for learning and emulating the long wave radiation parameterization
in the NCAR CAM climate model. This emulation occupies by far the most significant portion of the computational
time in the implementation of the model. From the mathematical point of view this parameterization can be
considered as a mapping which is to be learned from scattered data samples (xi,yi), i=1,…,N. Hence, the problem
represents a typical application of high-dimensional statistical learning. The goal is to develop learning schemes that
are not only accurate and reliable but also computationally efficient and capable of adapting to time-varying
environmental states. The algorithms developed in this paper are compared with other approaches such as neural
networks, nearest neighbor methods, and regression trees as to how these various goals are met.
REPORT DOCUMENTATION PAGE (SF298)
(Continuation Sheet)
Continuation for Block 13
ARO Report Number 52559.22-MA-MUR
Tree approximation of the long wave radiation p ...
Block 13: Supplementary Note
© 2011 . Published in Journal of Computational and Applied Mathematics, Vol. 0, (0), Ed. 0 (2011), (Ed. ). DoD Components
reserve a royalty-free, nonexclusive and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes,
and to authorize others to do so (DODGARS §32.36). The views, opinions and/or findings contained in this report are those of
the author(s) and should not be construed as an official Department of the Army position, policy or decision, unless so
designated by other documentation.
Approved for public release; distribution is unlimited.
Journal of Computational and Applied Mathematics 236 (2011) 447–460
Contents lists available at SciVerse ScienceDirect
Journal of Computational and Applied Mathematics
journal homepage: www.elsevier.com/locate/cam
Tree approximation of the long wave radiation parameterization in the
NCAR CAM global climate model
Alexei Belochitski a, Peter Binev b,*, Ronald DeVore d, Michael Fox-Rabinovitz a, Vladimir Krasnopolsky c, Philipp Lamby b
a Environmental System Science Interdisciplinary Center, University of Maryland, MD, USA
b Interdisciplinary Mathematics Institute, University of South Carolina, Columbia, SC, USA
c Environmental Modeling Center, National Centers for Environmental Prediction, National Oceanic and Atmospheric Administration, MD, USA
d Department of Mathematics, Texas A&M University, College Station, TX, USA
ARTICLE INFO
Keywords: Climate and weather prediction; Numerical modeling; Non-parametric regression; High-dimensional approximation; Neural network; Sparse occupancy tree

ABSTRACT
The computation of Global Climate Models (GCMs) presents significant numerical challenges. This paper presents new algorithms based on sparse occupancy trees for learning and emulating the long wave radiation parameterization in the NCAR CAM climate model. This emulation occupies by far the most significant portion of the computational time in the implementation of the model. From the mathematical point of view this parameterization can be considered as a mapping R^{220} → R^{33} which is to be learned from scattered data samples (x^i, y^i), i = 1,...,N. Hence, the problem represents a typical application of high-dimensional statistical learning. The goal is to develop learning schemes that are not only accurate and reliable but also computationally efficient and capable of adapting to time-varying environmental states. The algorithms developed in this paper are compared with other approaches such as neural networks, nearest neighbor methods, and regression trees as to how these various goals are met.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
The main computational burden in general circulation models (GCMs) for high-quality weather prediction and climate simulation is caused by the complexity of the physical processes that include, in particular, atmospheric radiation, turbulence, convection, clouds, large scale precipitation, constituency transport and chemical reactions. These processes are modeled by parameterizations which are complex 1D subgrid-scale schemes formulated using relevant first principles and observational data. The evaluation of these parameterizations typically consumes 70%–90% of the overall CPU-time in the GCM we consider in this work: the National Center for Atmospheric Research (NCAR) Community Atmospheric Model (CAM).
To overcome this computational ‘‘bottleneck’’, it has been proposed to employ statistical learning methods like neural networks in order to accelerate the evaluation of the parameterizations. This idea originates from [1], where a battery of neural networks has been used to approximate certain partial fluxes within the long wave radiation (LWR) parameterization of the European Center for Medium-Range Weather Forecasts, explicitly exploiting the structure of that particular model. Another more direct and more general approach is pursued in [2] where a single neural network emulation is used to entirely replace the LWR parameterization in the NCAR CAM as a whole.
* Corresponding author. Tel.: +1 803 576 6269.
E-mail addresses: [email protected] (A. Belochitski), [email protected] (P. Binev), [email protected] (R. DeVore), [email protected] (M. Fox-Rabinovitz), [email protected] (V. Krasnopolsky), [email protected] (P. Lamby).
0377-0427/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.cam.2011.07.013
This parameterization takes a surface characteristic and the discretizations of ten vertical profiles of local physical properties and gas concentrations as input and returns a vertical profile of heating rates and seven heat fluxes as output. Mathematically this parameterization can be considered as a mapping f : R^{220} → R^{33}, see Section 2.1. The basic idea of an emulation is, for any given input, to get an approximation of the output variables by evaluating a learned approximation (a fast process) instead of evaluating the original parameterization (which is a rather complex numerical simulation in its own right). The approximating function is learned from a set of training data points (x^i, y^i) for which the outputs y^i = f(x^i) have been computed by solving the original physical parameterization. The approximation should be both accurate and easy to evaluate in order to provide a significant speed-up of the GCM without significantly changing the prediction of a long-term climate simulation.
While artificial neural networks can be considered as the current state-of-the-art black box methodology for a wide range of high-dimensional approximation problems, and justifiably so, they may not necessarily be the best solution for this particular application. The accuracy of neural network emulations depends on the number of layers and hidden neurons employed. While it is known that neural networks are universal approximators, i.e., they can approximate any continuous function to any predetermined accuracy (see for instance [3,4]), this is achieved only by allowing the number of neurons to increase arbitrarily. However, the learning of the network parameters (weights) requires the solution of a large, non-linear optimization problem, which is very time-consuming, prone to deliver sub-optimal solutions and, thus, severely limits the complexity of the network that one can afford to train.
This becomes a practical issue considering the following task: the approximation is trained by a data set that consists of evaluations of the original parameterization gained during a reference run of the climate model. The inputs of this training data set, therefore, cover the physical states observed during a certain time period of climate history. However, the domain in which the parameterization is to be evaluated may change with time as in the case of climate change. In such situations the approximation may be forced to extrapolate beyond its generalization ability, which may lead to large errors. In this case it could become necessary to re-train the emulation in order to adapt it to the new environment.
It would therefore be advantageous to have an alternative to neural networks that would offer an easier training process and that would perhaps even be capable of incremental learning. Additionally, if one could estimate the error of the approximation for a certain input, one could use the original parameterization as a fall-back option during a run of the GCM and immediately incorporate the new data into the emulation. Actually, [5] addresses the problem of error estimation for a neural network emulation, but the question of how to dynamically adapt the approximation is left open.
In the present paper we search for an alternative to neural networks within the class of non-parametric approximation methods. We cannot offer a full realization of the program outlined above, but we restrict ourselves to basic design decisions and testing whether such a program has any chance to be successfully implemented. In particular, we discuss the features of two common statistical learning paradigms, (approximate) nearest neighbors and regression trees, and present a new algorithm based on what we call sparse occupancy trees, which aims to provide a very efficient nearest neighbor type algorithm capable of incremental learning. The development of the latter concept was originally motivated by the present application and is comprehensively described in [6].
In order to motivate why we were actually trying to design new algorithms instead of just using an off-the-shelf
algorithm, we briefly describe the aforementioned techniques. More information can be found in Section 3 and in standard textbooks like [7]. Non-parametric learning methods typically try to partition the input space and then use simple local models like piecewise constants to approximate the data. In the case of nearest neighbor methods, the input space is implicitly partitioned by the way the training data is distributed: the approximation is constant for query points that have the same set of nearest neighbors. Unfortunately in high dimensions there are no fast algorithms which could answer the question ‘‘what are the nearest neighbors to a given query point x?’’. Therefore one must be content with approximate answers to this question that can be realized using so-called kd- or bd-trees. Here, assuming that all the training data is available beforehand, the input domain is recursively partitioned depending on the distribution of the input points.
Regression trees follow a more adaptive approach and also use the y-values in order to define the domain partition. Here, starting with the entire input domain, the cells in the partition are recursively subdivided such that the residual of the resulting approximation is minimized in each step. Obviously, due to their recursive definition, none of these techniques is available for incremental learning without modification: a new data point could theoretically change the decision how to perform the first split in the tree, which would require relearning the tree from the very beginning. Sparse occupancy trees, on the other hand, encode the data in a format that is independent of the distribution of the incoming data.
It must be noted here that no partitioning scheme can be expected to be successful for arbitrary high-dimensional data. (The same could be said about neural networks, although for other reasons.) For instance, if the data points were uniformly distributed in a very high-dimensional space, the attempt to generate local approximations like those described above would be doomed, because the average distance between a query point and the best fitting data point might become large even for huge training data sets. This is often referred to as the curse of dimensionality. One usually makes the assumption that the data is actually distributed over some lower-dimensional submanifold or is concentrated in a subset of small measure within the whole input space. In our special case this assumption is justified because the input data is group-wise strongly correlated. One purpose of this work is to quantify this effect, and in Section 4.3 we give an estimate of the intrinsic dimensions of the data set which shows that non-parametric approximation of the long wave radiation data should indeed be feasible.
The main obstacle for the application of tree-based approximation schemes seems to be implementing it in a highly parallel computer system, which is unavoidable in the simulation of huge, complex systems like global climate. Non-parametric approximation methods are memory based, i.e., they need to store all the training data permanently.
This limits their practical use to shared memory systems, because on distributed memory systems each core or processor can address only a limited amount of data. The current implementation of the NCAR-CAM system, however, seems to be optimized for such distributed memory architectures and requires the emulation to be stored on each individual processor. Nevertheless, as proof of concept, we performed a 10-year climate simulation where the original parameterization was replaced by a regression tree. We report on this numerical experiment in Section 5. The tree was chosen as a compromise between accuracy and storage considerations and trained with a rather limited amount of data. That means that one of the main advantages of non-parametric learning, namely its ability to cope with large amounts of data, was not really exploited. Despite that, we were able to achieve a rather promising result, which indicates the potential usefulness of this approach in future projects.
2. General considerations
In this section we provide the mathematical formulation of the approximation problem and introduce some basic
notation used throughout the paper. Furthermore, we make some very general remarks concerning the design of
approximation schemes for heterogeneous vector-valued functions.
2.1. Structure of the LWR parameterization
The input vectors for the LWR parameterization include one surface characteristic and ten profiles: atmospheric temperature; humidity; the concentrations of ozone, CO2, N2O, and CH4; two chlorofluorocarbon mixing ratios; pressure; and cloudiness. The profiles are discretized and presented by the values in 26 vertical layers of the atmosphere. Some of the quantities are constant in certain layers and have therefore been filtered out of the input vectors, effectively leaving 220 input parameters. The output vectors consist of a profile of heating rates and seven radiation fluxes, altogether 33 values. Hence, the parameterization can be considered as a function

f : R^{220} → R^{33}, \qquad x = (x_j)_{j=1,\dots,220} \mapsto y = (y_l)_{l=1,\dots,33}.

In what follows, we denote the components of a vector with subscripts, while we use superscripts to indicate different data points in the set.
2.2. Error statistics
The goal is to find an approximation that closely matches the original parameterization when used in the GCM. Of course, one cannot afford to test this in the development stage of an approximation, but one can test the quality of the approximation using statistical means. The procedure is usually as follows: we generate two data sets. The training data set X = {(x^i, y^i), i = 1,...,N} is used to train the parameters of the approximation, for instance the weights of the neural net or the underlying partition of the tree algorithm. Then the test data set is used to measure the bias and residual mean square errors to achieve a statistical evaluation of the approximation accuracy. We denote the test data set with X̂ = {(x̂^i, ŷ^i), i = 1,...,M}. Furthermore we indicate an emulation of the original parameterization with f̃ and its output f̃(x̂^i) by ỹ^i. Using these conventions, the bias or systematic error of the approximation for the l-th layer is defined by

B_l = \frac{1}{M} \sum_{i=1}^{M} \left( \hat{y}^i_l - \tilde{y}^i_l \right)    (1)

and the total bias is

B = \frac{1}{D} \sum_{l=1}^{D} B_l    (2)
where D is the number of output components. Since the heating rates and the heat fluxes in the output vector are measured in different units, it hardly makes sense to compute error measures for all 33 output components simultaneously; instead we restrict ourselves to the heating rates, i.e., the first 26 components. In this case D can be considered as the number of layers. The bias is actually not a measurement of accuracy (it could be zero even for an arbitrarily inaccurate approximation), but a good approximation with a large bias would be unacceptable, because one has to fear that systematic deviations could accumulate in the GCM.
We measure accuracy using the root mean square error (RMSE). The RMSE of the l-th output component is defined as

\mathrm{RMSE}_l = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left( \hat{y}^i_l - \tilde{y}^i_l \right)^2 }    (3)
Fig. 1. Layer bias. Scatterplots of the neural network approximation for the layers l = 26 and l = 9. In each diagram the values of the original parameterization y_l are plotted along the horizontal axis, the approximated values ỹ_l along the vertical axis. The green line shows the diagonal y_l = ỹ_l.
and the total RMSE is

\mathrm{RMSE} = \sqrt{ \frac{1}{D} \sum_{l=1}^{D} \mathrm{RMSE}_l^2 }.    (4)
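For concreteness, the layer-wise and total statistics (1)–(4) could be computed along the following lines. This is a minimal NumPy sketch, not the code used for the experiments reported here; the array names and shapes are assumptions.

```python
import numpy as np

def error_statistics(y_true, y_pred):
    """Layer-wise bias and RMSE as in Eqs. (1)-(4).

    y_true, y_pred : arrays of shape (M, D) holding the original
    parameterization outputs and the emulated outputs on the test set,
    restricted to the D components of interest (e.g. the 26 heating rates).
    """
    diff = y_true - y_pred                      # per-sample, per-layer error
    bias_l = diff.mean(axis=0)                  # Eq. (1): bias of layer l
    bias = bias_l.mean()                        # Eq. (2): total bias
    rmse_l = np.sqrt((diff ** 2).mean(axis=0))  # Eq. (3): RMSE of layer l
    rmse = np.sqrt((rmse_l ** 2).mean())        # Eq. (4): total RMSE
    return bias_l, bias, rmse_l, rmse
```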
2.3. Neural network approximations
A natural benchmark for comparison with the present methods is the neural network emulation described in [2]. This is a standard feed-forward neural net with one hidden layer of neurons of the form

\tilde{f}(x) = \alpha_0 + \sum_{i=1}^{m} \alpha_i \, \sigma(\beta_i \cdot x + \gamma_i)    (5)

where σ is the sigmoidal function σ(x) = tanh(x), and the directions β_i ∈ R^d, weights α_i ∈ R^{d'} and intercepts γ_i ∈ R are learned from the training data by least squares fitting. In particular, we use as reference a network with m = 80 hidden neurons which has been trained with 196,608 selected evaluations of the original parameterization from a reference run of the NCAR-CAM simulating the climate for the years 1961–1962.
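Once its parameters are known, an emulation of the form (5) is cheap to evaluate. The following sketch shows one way this evaluation could be written; the weight names mirror the symbols above, the shapes are assumptions, and the least squares training itself is not shown.

```python
import numpy as np

def evaluate_network(x, alpha0, alpha, beta, gamma):
    """Evaluate the single-hidden-layer network of Eq. (5).

    x      : input vector of length d (here d = 220)
    alpha0 : output-layer offset, length d' (here d' = 33)
    alpha  : output weights, shape (m, d')
    beta   : hidden-layer directions, shape (m, d)
    gamma  : hidden-layer intercepts, length m
    """
    hidden = np.tanh(beta @ x + gamma)   # sigma(beta_i . x + gamma_i) for all i
    return alpha0 + hidden @ alpha       # alpha_0 + sum_i alpha_i * hidden_i
```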
2.4. Monolithic schemes vs. batteries
Note that the above representation can be described as a monolithic, vector-valued approach. Another possible network design could have been

\tilde{y}_j = \alpha_{0,j} + \sum_{i=1}^{m} \alpha_{i,j} \, \sigma(\beta_{i,j} \cdot x + \gamma_{i,j})    (6)

with β_{i,j} ∈ R^d and γ_{i,j} ∈ R as above but α_{i,j} ∈ R. That is, one could have used a battery of neural networks approximating each output component individually. This would, of course, increase the training and evaluation costs, but would also be more accurate. Actually, the vector-valued design potentially can become the source of bias. In the training of the neural network, the total RMSE is used as the optimization criterion. However, the variance of the output data is higher for the layers nearer to the surface of the earth, hence the error is dominated by these layers and the neural net is adapted mainly to them. In other layers, the errors might be relatively larger. This is demonstrated in Fig. 1 which shows scatterplots of the reference neural net for the layers 9 and 26. One can clearly see that the approximation in layer 9 is biased. One has to note however, that the scales in the two diagrams differ by almost a factor of 30, i.e., the absolute error introduced by this bias is still very small. A possible remedy for this is to normalize the outputs in the learning process according to their variances. Then, all output components will have the same relative accuracy, possibly at the cost of the total accuracy. This technique could also be applied to the tree-based methods considered in the current paper.
Since we are interested in examining the potential of possible alternatives to the current emulation design, we consider mainly, but not exclusively, component-wise approximation in the following.
The disadvantages of component-wise approximations are that they might become computationally more expensive and can potentially return unrealistic profiles, because they may not properly represent the correlations between components of the profile vector. These correlations are naturally represented by vector-wise approximation.
3. Description of algorithms
In order to keep this paper self-contained we give a concise description of the non-parametric algorithms that we will consider in the following numerical experiments. Thereby, we discuss the nearest neighbor and regression trees only very briefly, because they are well established and comprehensively discussed in the literature. We give a more comprehensive account of the sparse occupancy trees, because, as explained in the introduction, they are new and have been developed specifically for this application.
3.1. (Approximate) Nearest neighbors
3.1.1. Basic concepts
The k-nearest neighbor method works as follows: one defines a metric ‖·‖ on the input space and, given a query point x, finds a permutation n ↦ i_n of the training data such that

\|x^{i_1} - x\| \le \|x^{i_2} - x\| \le \cdots \le \|x^{i_k} - x\| \le \|x^{i_p} - x\|

for all p > k. Then, one averages the function values corresponding to these nearest neighbors,

\tilde{f}(x) = \frac{1}{k} \sum_{n=1}^{k} y^{i_n},

to define an approximation of f(x). Unfortunately, it is well known that in very high dimensions it is not possible to design fast algorithms that provide the permutation sought. Instead one relaxes the search and is content with an algorithm that returns a permutation n ↦ j_n such that

\|x^{j_n} - x\| \le (1 + \varepsilon) \|x^{i_n} - x\|

for all n. There are algorithms based on kd-trees or bd-trees that provide a fast answer to this relaxed problem, if ε is chosen large enough. An introduction to this topic can be found in [8].
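To illustrate the relaxed search, the sketch below uses a kd-tree (here SciPy's cKDTree rather than the bd-tree codes referenced in [8]) to answer (1+ε)-approximate k-nearest-neighbor queries and to average the corresponding outputs. The class and variable names are illustrative and the inputs are assumed to be scaled as described in the next subsection.

```python
import numpy as np
from scipy.spatial import cKDTree

class ApproximateKNN:
    """(1+eps)-approximate k-nearest-neighbor regression on scaled inputs."""

    def __init__(self, X_train, Y_train, k=5, eps=1.0):
        self.tree = cKDTree(X_train)      # kd-tree over the (scaled) inputs
        self.Y = np.asarray(Y_train)
        self.k, self.eps = k, eps

    def query(self, x):
        # indices of k approximate nearest neighbors within factor (1 + eps)
        _, idx = self.tree.query(x, k=self.k, eps=self.eps)
        return self.Y[idx].mean(axis=0)   # average the corresponding outputs
```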
3.1.2. Data scaling
The central point in the above description is, of course, how to define the metric ‖·‖ on the input space. This is a non-trivial task because the input vectors include several physical quantities measured in different units and varying over several orders of magnitude. A trivial method to equilibrate the various input quantities is to compute the maximum and minimum of each parameter in the training data set and to then scale each component of the input vector individually to the interval [0, 1]. Then, one uses the standard Euclidean norm on [0, 1]^d to measure the distances. Another self-evident idea is to scale the variables belonging to the same profile with the same factors. Numerical experiments showed that the second type of scaling yields better results. Therefore, we use this scaling in the following experiments. Adaptive nearest neighbor methods try to learn a problem dependent metric from the data, but we have not pursued this approach any further, because the data seems to be too sparse to define local metrics reliably for this application.
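A minimal sketch of the second type of scaling is given below: all components belonging to the same profile share one pair of min/max factors. The slice layout of the profiles inside the 220-dimensional input vector is an assumption, not taken from the paper.

```python
import numpy as np

def profile_scaling_factors(X_train, profile_slices):
    """One common [0, 1] scaling per profile, as preferred in Section 3.1.2.

    X_train        : training inputs, shape (N, 220)
    profile_slices : list of slices, one per physical profile (assumed layout)
    Returns offsets and scales so that components of the same profile are
    scaled with the same factors; use as X_scaled = (X - lo) / scale.
    """
    lo = np.empty(X_train.shape[1])
    hi = np.empty(X_train.shape[1])
    for sl in profile_slices:
        lo[sl] = X_train[:, sl].min()   # one minimum for the whole profile
        hi[sl] = X_train[:, sl].max()   # one maximum for the whole profile
    return lo, np.where(hi > lo, hi - lo, 1.0)
```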
3.2. Regression trees
3.2.1. CART
The most basic algorithm for the generation of regression trees is the Classification and Regression Tree (CART) algorithm intensively analyzed in [9]. It can be summarized as follows: we initialize the partition P = {Ω}, where Ω = \prod_{i=1}^{d} [a_i, b_i] is a hyper-rectangle that contains all the training points and d is the dimension of the input space. Then, each hyper-rectangle in the partition that contains more than a given number m of data points is recursively subdivided along a hyperplane x_i = c, where i ∈ {1,...,d} and c ∈ [a_i, b_i] is chosen such that the RMSE of the best piecewise constant approximation on the refined partition is minimized. That is, the regression function assumes the average value of all the points in a given hyper-rectangle of the partition. This is a reasonably efficient algorithm that can be used for a wide range of classification and regression problems. It also has the advantage that it is independent of the scaling of the data.
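The core step of the CART algorithm, the choice of the splitting hyperplane x_i = c, could look as follows for a single output component. This brute-force sketch only makes the criterion explicit (minimizing the residual sum of squares of the best piecewise constant fit); it is not meant to be efficient and is not the implementation used in the experiments.

```python
import numpy as np

def best_split(X, y):
    """Find the split x_i = c minimizing the residual sum of squares of the
    best piecewise-constant fit on the two resulting cells (CART step)."""
    best = (np.inf, None, None)
    for i in range(X.shape[1]):
        for c in np.unique(X[:, i])[:-1]:            # candidate thresholds
            left, right = y[X[:, i] <= c], y[X[:, i] > c]
            rss = ((left - left.mean()) ** 2).sum() + \
                  ((right - right.mean()) ** 2).sum()
            if rss < best[0]:
                best = (rss, i, c)
    return best[1], best[2]   # splitting dimension i and threshold c
```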
3.2.2. Random Forests
The Random Forests algorithm in [10] averages the response of a given number T of CART trees. Hereby, before each subdivision step in the generation of the CART trees, the algorithm chooses a random subset of size P of the input parameters
(typically about one third of all input parameters) along which the cell is allowed to be subdivided. This ensures that each single CART tree generates different partitions of the domain. Usually the single CART trees in a Random Forest are not pruned, i.e., one takes m = 1 in the CART-algorithm and deliberately overfits the data. Nevertheless, Random Forest approximations are relatively smooth due to the averaging process. Random Forests are generally considered to be one of the best available black-box non-parametric regression methods.
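For reference, a forest with the settings described above (T averaged, unpruned trees, roughly one third of the parameters considered per split) could be obtained from an off-the-shelf library such as scikit-learn; this is only a usage sketch with placeholder data arrays, not the implementation used in this paper.

```python
from sklearn.ensemble import RandomForestRegressor

# T unpruned CART trees; each split is restricted to a random subset of
# roughly one third of the input parameters, and the responses are averaged.
forest = RandomForestRegressor(n_estimators=100,        # T
                               max_features=1.0 / 3.0,  # P, about d/3
                               min_samples_leaf=1)      # no pruning, m = 1
forest.fit(X_train, y_train)   # X_train: (N, 220), y_train: (N,) per component
y_query = forest.predict(X_query)
```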
3.3. Sparse occupancy trees
Sparse occupancy trees have been developed as an alternative for approximate nearest neighbor methods (see [6]). They are designed to scale well in high dimensions and to be readily available for online-learning.
3.3.1. Setting
The sparse occupancy algorithms try to exploit basic ideas from multiresolution analysis for high spatial dimension. The main underlying data structure of the multilevel construction is the subdivision tree which is linked to a hierarchy of nested partitions. That is, we assume that for each level l ≥ 0 the sets P_l = {Ω_{l,k}, k ∈ I_l} are partitions of the input space Ω and each cell Ω_{l,k} ∈ P_l is the disjoint union of cells on the next finer level l+1:

\Omega_{l,k} = \bigcup_{r \in I_{l,k}} \Omega_{l+1,r}.

The hierarchy of partitions induces an infinite master tree T*, whose root is Ω and whose other nodes are the cells Ω_{l,k}. Each node Ω_{l,k} of this tree is connected by an edge to its children Ω_{l+1,r}, r ∈ I_{l,k}. Now let us assume that a training data set X = {(x^i, y^i), i = 1,...,N} is given. An occupancy tree T(X) is a finite subtree of T* that contains only the nodes that are occupied by at least one sample x^i from the data set X. In high dimensions occupancy trees are typically sparse, because there are many more cells than data points. Furthermore, the points can be stored in a compact way. In the following two subsections, we present two algorithms that make use of this data structure.
3.3.2. Piecewise constant approximation on cube subdivisions
A piecewise constant approximation based on an occupancy tree can be defined as follows: given a query point x we average the values of points in the finest cell of the occupancy tree containing the query point. This general idea, which works for any subdivision geometry, is most easily realized for dyadic cube subdivision, because it can be very easily encoded. Assuming that all data are scaled such that they fit into the unit hypercube [0, 1]^d, the partitions are given by

P_l = \left\{ \prod_{i=1}^{d} [k_i 2^{-l}, (k_i + 1) 2^{-l}] : k_i \in \{0, \dots, 2^l - 1\} \right\}.
For any point x = (x_1, ..., x_d) ∈ [0, 1]^d we can write its components in binary representation as

x_i = \sum_{k=1}^{\infty} b_{ik} 2^{-k}.
Note that the finite sequence b_{i1}, b_{i2}, ..., b_{il} is a binary representation of the integer k_i of the level-l cell to which the data point belongs. Assuming that a maximum refinement level L is chosen we compute the bitstream

(b_{11}, b_{21}, \dots, b_{d1}, b_{12}, b_{22}, \dots, b_{d2}, \dots, b_{1L}, b_{2L}, \dots, b_{dL})

for each training and query point. The subsequences of d bits b_{1l}, ..., b_{dl} are called characters. The bitstreams of the training points are sorted lexicographically. Then, in order to determine which points one has to average to answer a query, one finds the points whose bitstreams have the largest number of leading characters in common with the bitstream corresponding to the query point. This is essentially a binary search.
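The following sketch illustrates the bitstream encoding and the prefix-based query on a lexicographically sorted list. It handles a single tree without random shifts, assumes the inputs have already been scaled into [0, 1)^d and that at least one training point has been inserted; the class and variable names are illustrative, not taken from [6].

```python
import bisect

def bitstream(x, L):
    """Interleaved bit encoding of a point x in [0, 1)^d up to level L.

    Character l is the tuple of the l-th binary digits of all coordinates;
    it identifies the level-l dyadic cube containing x."""
    return tuple(tuple(int(xi * 2 ** l) % 2 for xi in x) for l in range(1, L + 1))

class CubeOccupancyTree:
    """Sparse occupancy tree on dyadic cubes, stored as a lexicographically
    sorted list of bitstreams (a sketch of the scheme in Section 3.3.2)."""

    def __init__(self, L):
        self.L, self.keys, self.values = L, [], []

    def insert(self, x, y):
        """Incremental learning: sort the new bitstream into the list."""
        key = bitstream(x, self.L)
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.values.insert(pos, y)

    def query(self, x):
        """Average the outputs stored in the finest occupied cell containing x."""
        key = bitstream(x, self.L)
        pos = bisect.bisect_left(self.keys, key)

        def common(i):                       # number of leading characters
            n = 0                            # shared with the query bitstream
            for a, b in zip(self.keys[i], key):
                if a != b:
                    break
                n += 1
            return n

        # the longest common prefix is attained at a lexicographic neighbor ...
        neighbors = [i for i in (pos - 1, pos) if 0 <= i < len(self.keys)]
        best = max(common(i) for i in neighbors)
        # ... and all points sharing it form a contiguous block around pos
        lo, hi = pos, pos
        while lo > 0 and common(lo - 1) >= best:
            lo -= 1
        while hi < len(self.keys) and common(hi) >= best:
            hi += 1
        block = self.values[lo:hi]
        return sum(block) / len(block)
```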
This algorithm is very fast and storage efficient, because it replaces the input data with bitstreams. Furthermore, it is inherently suited to incremental learning: if one gets a new data point, one computes its bitstream, sorts it into the list of already-existing bitstreams, and then the next query can immediately be accepted. Conceptually this code is related to nearest neighbor approximation because it only considers proximity of the points in the input space, using the similarity of the bitstreams as metric. Unfortunately, this recovery scheme is by far not as accurate as the nearest neighbor scheme because of the tree structure. That is, for any two points x, x′ ∈ X the tree distance dist_T(x, x′) is the shortest path in the tree T(X) connecting the nodes Ω_{L,k} ∋ x and Ω_{L,k′} ∋ x′. Of course, whereas ‖x − x′‖ may be arbitrarily small for any fixed norm ‖·‖ on R^d, the tree distance dist_T(x, x′) could be 2L. The above recovery scheme takes local averages of function values whose tree distance is small, possibly omitting values for arguments that are geometrically very close. There are several possible remedies. Since the recovery scheme is very fast, perhaps the simplest one is to perform several different recoveries with respect to different shifts of the coordinate system (chosen randomly) and then take the average of the outputs. For example, in our implementation we scale the data to the interval [0.3, 0.7], and then shift the data with random vectors in
[−0.3, 0.3]^d. Let f̃_ρ(x) denote the result of a query at x with the data shifted by the vector ρ and X_ρ(x) be the corresponding set of training points in the leaf of the sparse occupancy tree containing x. Furthermore, let R(x) be the set of shifts ρ for which the level of the evaluation is maximal. Then

\tilde{f}(x) = \frac{1}{\#R(x)} \sum_{\rho \in R(x)} \tilde{f}_\rho(x).    (7)

With a moderate number of random shifts, one can usually achieve an accuracy similar to that of the nearest neighbor method.
3.3.3. Sparse occupancy trees using simplices
Overcoming the tree-distance problem motivates another approach that constructs piecewise linear approximations on simplex subdivisions. In the following description we leave out the technical details, which can be found in [6]. We base the approximation on a hierarchy of partitions generated by dyadic simplex subdivision. That is, each simplex S in the partition P_l on level l is subdivided into 2^d simplices (its children) on level l+1.
For this purpose we use a scheme described in [11] which is based on recursive binary subdivision. In each binary subdivision step one edge of the current simplex is chosen and subdivided at its midpoint. To be more precise let us denote with (x_0, x_1, ..., x_d)_g a simplex that arises from a simplex in the dyadic tree by g binary subdivisions. Then this simplex is divided into the two subsimplices

\left( x_0, \frac{x_0 + x_d}{2}, x_1, \dots, x_g, x_{g+1}, \dots, x_{d-1} \right)_{g+1}

and

\left( x_d, \frac{x_0 + x_d}{2}, x_1, \dots, x_g, x_{d-1}, \dots, x_{g+1} \right)_{g+1}

where the sequences (x_{g+1}, ..., x_{d-1}) and (x_1, ..., x_g) should be read as void for g = d−1 and g = 0, respectively, and the sequence x_{d-1}, ..., x_{g+1} features decreasing indices.
Given x, we denote by S_l(x) the simplex at level l which contains x and, given any simplex S, we denote its set of vertices by V(S). If v is a vertex of a level-l simplex in the master tree, then S_l(v) denotes the set of all level-l simplices that share v as a corner point. We also define V_l to be the set of all vertices at level l of occupied cells.
In the training stage we compute the values

y_l(v) = \mathrm{Average}\{ y^i : x^i \in S_l(v) \}

for each vertex of a simplex on level l in the occupancy tree. In the evaluation stage, if we are given an input x, we first determine the maximum level l such that V(S_l(x)) ∩ V_l ≠ ∅ and then we compute

\tilde{f}(x) = \frac{ \sum_{v \in V(S_l(x)) \cap V_l} \tau(S_l(x), v, x) \, y_l(v) }{ \sum_{v \in V(S_l(x)) \cap V_l} \tau(S_l(x), v, x) }    (8)

where τ(S, v, x) is the barycentric weight of the point x with respect to the vertex v of the simplex S.
To summarize: the value of the query point is found by interpolating vertex values of the simplex containing it. Hence, the query response becomes an average of all training points in the neighborhood of the query coordinates, even including the points in simplices that are far away in tree distance. Again, note that this algorithm is suited for incremental learning: if one gets a new sample, one just computes its place in the occupancy tree and adds its value to all adjacent vertices. Any query after that can immediately use the new information.
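To make the evaluation step (8) concrete, the barycentric weights τ(S, v, x) can be obtained from a small linear system and the vertex averages y_l(v) of the occupied vertices blended as sketched below. This sketch treats a single simplex only; locating S_l(x) in the simplex tree is not shown, and the data layout (a None entry marking an unoccupied vertex) is an assumption.

```python
import numpy as np

def barycentric_weights(vertices, x):
    """Barycentric coordinates of x with respect to a d-simplex.

    vertices : array of shape (d+1, d) holding the simplex vertices
    x        : query point of length d
    The weights sum to one and reproduce x as a weighted vertex average."""
    d = len(x)
    A = np.vstack([vertices.T, np.ones(d + 1)])   # (d+1) x (d+1) system
    b = np.append(x, 1.0)
    return np.linalg.solve(A, b)

def interpolate(vertices, vertex_values, x):
    """Piecewise linear recovery, Eq. (8), restricted to occupied vertices.

    vertex_values[v] is the average y_l(v) over the training outputs in the
    simplices adjacent to vertex v, or None if the vertex carries no data;
    at least one vertex is assumed to be occupied."""
    tau = barycentric_weights(vertices, x)
    pairs = [(t, y) for t, y in zip(tau, vertex_values) if y is not None]
    num = sum(t * y for t, y in pairs)
    den = sum(t for t, _ in pairs)
    return num / den
```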
4. Numerical experiments
In this section we present the results of three groups of numerical experiments. In the first two subsections, the sets of training data and test data each contain 196,608 data samples collected during a reference climate simulation for the years 1961–1962 using the original parameterization. In the first subsection we compare the regression trees with the benchmark neural network approximation. Note that this comparison tends to overestimate the advantages of the neural network. First of all, it does not reflect the training time, which is about a week for the neural network, but only between a few seconds and less than a few hours for the tree-based methods. Second, whereas the neural network would profit only slightly from taking more training data (its accuracy is basically limited by the number of neurons), the non-parametric methods benefit significantly from allowing more data, and limiting the training data size is artificial and unnecessary. Nevertheless, we perform the comparison in this form, because it is the same training data we will use for the experiment in Section 5 where we have to comply with memory limitations. As it turns out, nearest neighbor methods and sparse occupancy trees do not deliver competitive accuracy if applied naïvely, but their performance can be enhanced by dimension reduction. We demonstrate this in some experiments presented in Section 4.2. Finally, the last experiment presented in Section 4.3 gives some insight into the internal structure of the data. Here, we try to obtain an estimate of the intrinsic dimension of the input data.