Table Of ContentAppearedin7thUSENIXSymposiumonNetwork
DesignandImplementation(NSDI’10)
Experiences with CoralCDN: A Five-Year Operational View
MichaelJ.Freedman
PrincetonUniversity
Abstract
convenience and availability. CoralCDN has since re-
mainedpubliclyavailableformorethanfiveyearsathun-
CoralCDN is a self-organizing web content distribution dredsofPlanetLabsitesworld-wide.Accountingforama-
network(CDN).PublishingthroughCoralCDNisassim- jorityofpublicPlanetLabtrafficandusers,CoralCDNtyp-
ple as making a small change to a URL’s hostname; a icallyservesseveralterabytesofdataperday,inresponse
decentralizedDNSlayertransparentlydirectsbrowsersto totensofmillionsofHTTPrequestsfromaroundtwomil-
nearbyparticipatingcachenodes,whichinturncooperate lionusers(uniqueclientIPaddresses).
tominimizeloadontheoriginwebserver. CoralCDNhas Over the course of its deployment, we have come to
been publicly available on PlanetLab since March 2004, acknowledge several realities. On a positive note, Coral-
accounting for the majority of its bandwidth and serving CDN’snotablysimpleinterfaceledtowidespreadandin-
requestsforseveralmillionusers(clientIPs)perday. This novative uses. Sites began using CoralCDN as an elas-
paper describes CoralCDN’s usage scenarios and a num- ticinfrastructure,dynamicallyredirectingtraffictoCoral-
berofexperiencesdrawnfromitsmulti-yeardeployment. CDNattimesofhighresourcecontentionandpullingback
Theselessonsrangefromthespecifictothegeneral,touch- astrafficlevelsabated. Ontheflipside,fundamentalparts
ing on the Web (APIs, naming, and security), CDNs (ro- of CoralCDN’s design were ill-suited for its deployment
bustnessandresourcemanagement),andvirtualizedhost- andthemajorityofitsuse. Ifoneweretoconsiderthevar-
ing(visibilityandcontrol). Weidentifydesignaspectsand iousreasonsforitsuse—forresurrectinglong-unavailable
changesthathelpedCoralCDNsucceed,yetalsothosethat sites,supportingrandomsurfing,distributingpopularcon-
provedwrongforitscurrentenvironment. tent, and mitigating flash crowds—CoralCDN’s design is
insufficient for the first, unnecessary for the second, and
1 Introduction overkillforthethird,atleastgivenitscurrentdeployment.
But diverse and unanticipated use is unavoidable for an
The goal of CoralCDN was to make desired web content opensystem,yetopennessisanecessarydesignchoicefor
available to everybody, regardless of the publisher’s own handlingthefinalflash-crowdscenario.
resources or dedicated hosting services. To do so, Coral- This paper provides a retrospective of our experience
CDNprovidesanopen,self-organizingwebcontentdistri- buildingandoperatingCoralCDNoverthepastfiveyears.
bution network (CDN) that any publisher is free to use, Our purpose is threefold. First, after summarizing Coral-
without any prior registration, authorization, or special CDN’s published design [14] in Section §2, we present
configuration. PublishingthroughCoralCDNisassimple data collected over the system’s production deployment
asappendingasuffixtoaURL’shostname,e.g.,http:/ andconsideritsimplications. Second,wediscussvarious
/example.com.nyud.net/. ThisURLmodification deployment challenges we encountered and describe our
maybedonebyclients,originservers,orthirdpartiesthat preferred solutions. Some of these changes we have im-
link to these domains. Clients accessing such Coralized plementedandincorporatedintoCoralCDN;othersrequire
URLs are transparently directed by CoralCDN’s network adoption by third-parties. Third, given these insights, we
of DNS servers to nearby participating proxies. These revisittheproblemofbuildingasecure,open,andscalable
proxies,inturn,coordinatetoservecontentandthusmin- contentdistributionnetwork. Morespecifically,thispaper
imizeloadonoriginservers. addressesthefollowingtopics:
CoralCDN was designed to automatically and scalably
• ThesuccessofCoralCDN’sdesigngivenobservedus-
handle sudden spikes in traffic for new content [14]. It
agepatterns(§3). Ourverdictismixed: Alargema-
canefficientlydiscovercachedcontentanywhereinitsnet-
jority of its traffic does not require any cooperative
work, and it dynamically replicates content in proportion
caching at all, yet its handling of flash crowds relies
toitspopularity. Bothtechniqueshelpminimizeoriginre-
onsuchcooperation.
questsandsatisfychangingtrafficdemands.
Whileoriginallydesignedfordecentralizedandunman- • Web security implications of CoralCDN’s open API
aged settings, CoralCDN was deployed on the PlanetLab (§4). Through its open API, sites began leveraging
research network [27] in March 2004, given PlanetLab’s CoralCDN as an elastic resource for content distri-
1
bution. Yet this very openness exposed a number of Client
Resolver
web security challenges. Many can be attributed to 22
1 5
alackofexplicitnessforspecifyingappropriatepro-
tection domains, and they arise due to violations of CoralCDN CoralCDN
CoralCDN CoralCDN
HTTP Proxy DNS Server
traditionalsecurityprinciples(suchasleastprivilege, HTTP Proxy DNS Server
completemediation,andfail-safedefaults[33]). Coral index node Coral index node
33 66
• Resource management in CDNs (§5). CoralCDN 4
commonly faced the challenge of interacting with CoralCDN HCToTrPa lPCrDoNxy DCNoSra SleCrDvNer
CoralCDN
oversubscribed and ill-behaved resources, both re- Coral index node HTTP Proxy
Coral index node
moteoriginserversanditsowndeploymentplatform.
Various aspects of its design react conservatively to Figure1: ThestepsinvolvedinservingaCoralizedURL.
changeandperformadmissioncontrolforresources.
2.1 Systemoverview
• Desired properties for deployment platforms (§6).
Application deployments could benefit from greater At a high level, the following steps occur when a client
visibility into and control over lower layers of their issuesarequesttoCoralCDN,asshowninFigure1.
platforms. Some challenges are again confounded 1. Resolving DNS. A client resolves a “Coralized”
when information and policies cannot be expressed domain name (e.g., of the form example.com.
explicitlybetweenlayers. nyud.net)usingCoralCDNnameservers.ACoral-
CDN nameserver probes the client to determine its
• Directions for building large-scale, cooperative
round-trip-time and uses this information to deter-
CDNs (§7). While using decentralized algo-
mineappropriatenameserversandproxiestoreturn.
rithms, CoralCDN currently operates on a centrally-
administered,smaller-scaletestbedoftrustedservers. 2. Processing HTTP client requests. The client sends
Werevisitthechallengeofescapingthissetting. an HTTP request for a Coralized URL to one of the
returnedproxies. Iftheproxyiscachingthewebob-
RatherthanfocusonCoralCDN’sself-organizingalgo- jectlocally, itreturnstheobjectandtheclientisfin-
rithms,themajorityofthispaperanalyzesCoralCDNasan ished. Otherwise, the proxy attempts to find the ob-
exampleofanopenwebserviceonavirtualizedplatform. jectonanotherCoralCDNproxy.
Assuch,theexperienceswedetailmayhaveimplications
3. Discoveringcooperative-cachedcontent. Theproxy
toawideraudience,includingthosedevelopingdistributed
looksuptheobject’sURLintheCoralindexinglayer.
hash tables (DHTs) for key-value storage, CDNs or web
4. Retrievingcontent. IfCoralreturnstheaddressofa
services for elastic provisioning, virtualized network fa-
nodecachingtheobject,theproxyfetchestheobject
cilities for programmable networks, or cloud computing
fromthisnode. Otherwise, theproxydownloadsthe
platforms for virtualized hosting. While many of the ob-
objectfromtheoriginserverexample.com.
servationswereportareneithernewnorsurprisinginhind-
sight,manyrelatetomistakes,oversights,orlimitationsof 5. Servingcontenttoclients. Theproxystorestheweb
CoralCDN’soriginaldesignthatonlybecameapparentto objecttodiskandreturnsittotheclientbrowser.
usfromitsdeployment. 6. Announcingcachedcontent. Theproxystoresaref-
WenextreviewCoralCDN’sarchitectureandprotocols; erencetoitselfinCoral,recordingthefactthatisnow
amorecompletedescriptioncanbefoundin[14]. Allsys- cachingtheURL.
temdetailspresentedafter§2weredevelopedsubsequent
ThissectionreviewsthedesignoftheCoralindexinglayer
to that publication. We discuss related work throughout
andtheCDN’sproxies,asproposedin[14].
thepaperaswetouchondifferentaspectsofCoralCDN.
2.2 Coralindexinglayer
2 Original CoralCDN Design
TheCoralindexinglayeriscloselyrelatedtothestructure
andorganizationofdistributedhashtableslikeChord[34]
The Coral Content Distribution Network is composed of
andKademlia[23],withthelatterservingasthebasisfor
threemainparts:(1)anetworkofcooperativeHTTPprox-
its underlying algorithm. The system maps opaque keys
ies that handle client requests from users, (2) a network
ontonodesbyhashingtheirvalueontoaflat,semantic-free
of DNS nameservers for nyud.net that map clients to
identifier (ID) space; nodes are assigned identifiers in the
nearby CoralCDN HTTP proxies, and (3) the underlying
sameIDspace.Itallowsscalablekeylookup(inO(log(n))
Coralindexinginfrastructureandclusteringmachineryon
overlay hops for n-node systems), reorganizes itself upon
whichthefirsttwoapplicationsarebuilt. Thispapercon-
networkmembershipchanges,andprovidesrobustbehav-
sistentlyreferstothesystem’sindexinglayerasCoral,and
ioragainstfailure.
theentirecontentdistributionsystemasCoralCDN.
2
Compared to “traditional” DHTs, Coral introduced a
22 33
few novel techniques that were well-suited for its partic- 11
ular application [13]. Its key-value indexing layer was
4
designed with weaker consistency requirements in mind,
and its lookup structure self-organized into a locality-
optimizedhierarchyofclustersofpeers. Afterall,aclient 000… Distance to key 111…
need not discover all proxies caching a particular file, it 4 Thresholds
only needs to find several such proxies, preferably ones
None
nearby. LikemostDHTs,Coralexposesputandgetoper-
3
ations,toannounceone’saddressascachingawebobject,
< 80 ms
andtodiscoverotherproxiescachingtheobjectassociated
2
withaparticularURL,respectively. Insertedaddressesare 1
< 30 ms
soft-statemappingswithatime-to-live(TTL)value.
Figure 2: Coral’sthree-levelhierarchicaloverlaystructure. Anode
Coral’s put and get operations are designed to spread
first queries others in its level-2 cluster (the dotted rings), where
load,bothwithintheDHTandacrossCoralCDNproxies.
pointersreferenceothercachingproxieswithinthesamecluster.Ifa
Togettheproxyaddressesassociatedwithakeyk,anode
nodefindsamappinginitslocalcluster(afterstep2),itsgetfinishes.
traverses the ID space with iterative RPCs, and it stops
Otherwise,itcontinuesamongitslevel-1cluster(thesolidrings),and
upon finding any remote peer storing values for k. This
finally,ifneeded,toanynodewithinthegloballevel-0system.
peer need not be the one closest to k (in terms of DHT
identifier space distance). To put a key/value pair, Coral
originserveraftertheCoralindexinglayerprovidesnore-
routes to nodes successively closer to k and stops when
ferralsornoneofitsreferralsreturnthedata.
finding either (1) the nodes closest to k or (2) one that is
CoralCDN’s inter-proxy transfers are optimized for lo-
experiencinghighrequestratesforkandalreadyiscaching
cality,bothfromtheiruseofparallelconnectionstoother
severalcorrespondingvalues(withlonger-livedTTLs). It
proxiesandbytheorderinwhichneighboringproxiesare
stores the pair at the node closest to k that it managed to
contacted. The properties of Coral’s hierarchical index-
reach. TheseprocessespreventtreesaturationintheDHT.
ingensuresthatthelistofproxiesreturnedbygetwillbe
To improve locality, these routing operations are not
sortedbasedontheirclusterdistancetotherequestinitia-
initially performed across the entire global overlay. In-
tor.Thus,proxieswillattempttocontactlevel-2neighbors
stead, each Coral node belongs to several distinct routing
beforelevel-1andlevel-0proxies,respectively.
structurescalledclusters. Eachclusterischaracterizedby
a maximum desired network round-trip-time (RTT). The 2.3.2 Rapidadaptationtoflashcrowds
system is parameterized by a fixed hierarchy of clusters
with different expected RTT thresholds. Coral’s deploy- Unlike many web proxies, CoralCDN is explicitly de-
mentusesathree-levelhierarchy,withlevel-0denotingthe signedforflash-crowdscenarios.Ifaflashcrowdsuddenly
global cluster and level-2 the most local one. Coral em- arrivesforawebobject,proxiesself-organizeintoaform
ploysdistributedalgorithmstoformlocalized,stableclus- of multicast tree for retrieving the object. Data streams
ters,whichwebrieflyreturntoin§5.3. from the proxies that started to fetch the object from the
Every node belongs to one cluster at each level, as in originservertothosearrivinglater. Thislimitsconcurrent
Figure2. Coralqueriesnodesinfastclustersbeforethose objectrequeststotheoriginserveruponaflashcrowd.
in slower clusters. This both reduces lookup latency and CoralCDNprovidessuchbehaviorbycut-throughrout-
increases the chance of returning values stored at nearby ing and optimistic references. First, CoralCDN’s use of
nodes,whichcorrespondtoaddressesofnearbyproxies. cut-through routing at each proxy helps reduce transmis-
siontimeforlargerfiles. Thatis,aproxywilluploadpor-
2.3 TheCoralCDNHTTPproxy tionsofaobjectassoonastheyaredownloaded,notwait-
inguntilitreceivestheentireobject. Second,proxiesopti-
CoralCDN seeks to aggressively minimize load on origin
misticallyannouncethemselvesassourcesofcontent. As
servers.ThissectionsummarizeshowitsproxiesuseCoral
soonasaCoralCDNproxybeginsreceivingthefirstbytes
forinter-proxycooperationandadaptationtoflashcrowds.
ofawebobject—eitherfromtheoriginoranotherproxy—
it inserts a reference to itself into Coral with a short TTL
2.3.1 Locality-optimizedinter-proxytransfers
(30 seconds). It continually renews this short-lived refer-
Each CoralCDN proxy keeps a local cache from which it
enceuntileitheritcompletesthedownload(atwhichtime
can immediately fulfill client requests. When a client re- itinsertsalonger-livedreference1)orthedownloadfails.
questsanon-residentURL,CoralCDNproxiesattemptto
1Thedeployedsystemuses2-hourTTLsforsuccessfulresults(status
fetchwebcontentfromeachother,usingtheCoralindex-
codesof200,301,302,etc.),and15-minuteTTLsfor403,404,andother
ing layer for discovery. A proxy only contacts a URL’s unsuccessful,non-transientresults.
3
ns) 100 FTroo Ump Cstlrieenatms Proxy/Origin Year dUomniaqiunes UUniRqLues w%ithU1RrLeqs pRoepquslatorUmRosLt
o
Milli 2005 7881 577K 54% 697K
y ( 10 2007 21555 588K 59% 410K
a
er D 2009 20680 1787K 77% 1578K
p
s
est 1 Figure5: CoralCDNtrafficstatisticsforanarbitraryday(Aug9).
u
q
e
R
0.1 depth. Figure 3 plots the total number of HTTP requests
Jan’05 Jan’06 Jan’07 Jan’08 Jan’09 Jan’10
thatthesystemreceivedeachdayfrommid-2004through
Figure3:TotalHTTPrequestsperdayduringCoralCDN’sdeploy-
early 2010, showing both the number of HTTP requests
ment.Grayedregionscorrespondtomissingorincompletedata.
from clients, as well as the number of requests issued to
#IPs per Day (Millions) 01 ..01255 2005 2007 2009 Data Sent per Day (GB) 11223 505050000000 0000000 2005 2007 2009 ucmwdopeiempstWnhttrmtheem,baoeeoemxnartacwemrCheeroeiqecnrcunoaeeenldn5sCstttiDhsarrtarnaNietndteeegpss2teoio0meffforeHms4r0poTiml–erTlr5uioioP0cornihdtgmrsHiaonifflfTfirlsioTcCoimtnPeoosvrtd.raheaelTerCiqslhtuyeDheeelrNstoerstgq’saassucmpeedienessertpsnmshd.li2onooayerwye--,
Figure 4: CoralCDNusage: numberofuniqueclients(left)and dayperiod(August9–18)in2005,2007,and2009. Coral-
uploadvolume(right)foreachdayduringAugust9–18. CDNreceived15–25Mrequestsduringeachdayofthese
periods.Figure4plotsthetotalnumberofuniqueclientIP
addresses from which these requests originated (left) and
2.4 Implementationanddeployment
theaggregateamountofbandwidthuploaded(right). The
CoralCDNiscomposedofthreestand-aloneapplications. traces showed 1–2 million clients per day, resulting in a
TheCoraldaemonprovidesthedistributedindexinglayer, fewterabytesofcontenttransferred. Wewillprimarilyuse
accessedoverUNIXdomainsocketsfromasimpleclient the2009trace,consistingof209Mrequests,inlateranal-
librarylinkedintoapplicationssuchasCoralCDN’sHTTP ysis. Figure5providesmoreinformationaboutthetraffic
proxyandDNSserver. Allthreearewrittenfromscratch. patterns,focusingonthefirstdayofeachtrace.
Coral network communication uses Sun RPC over UDP, Figure 6 plots the distribution of requests per unique
while CoralCDN proxies transfer content via standard URL.WeseethatthenumberofrequestsperURLfollows
HTTP connections. At initial publication [14], the Coral a Zipf-like distribution, as common among web caching
daemon was about 14,000 lines of C++, the DNS server andproxynetworks[5]. CertainURLsareverypopular—
2,000 LOC, and the proxy 4,000 LOC. CoralCDN’s im- theso-called“head”ofthedistribution—suchasthemost
plementationhassincegrowntoaround50,000LOC.The popular one in the Aug-9-2009 trace, which received al-
changeswelaterdiscusshelpaccountforthisincrease. most1.6Mrequestsitself. AlargenumberofURLs—the
CoralCDNtypicallyrunson300–400PlanetLabservers distribution’s“heavytail”—receiveonlyasinglerequest.
(about 70–100 of which run its DNS server), spread over
The datasets also show stability in the most popular
100-200 sites worldwide. It avoids Internet2-only and
URLs and domains over time. In all three datasets, the
commercialsites,thelatterduetopolicydecisionsthatre-
most popular URL retained that ranking across all nine
stricttheiruseforopenservices. CoralCDNusesnospe-
days. In fact, this URL in the 2007 and 2009 traces be-
cialknowledgeofthesemachines’locationsorconnectiv-
longedtothesamedomain: asitethatusesCoralCDNto
ity(e.g.,GPScoordinates,routinginformation,etc.).Even
distributerule-setupdatesforthepopularFirefoxAdBlock
though CoralCDN runs on a centrally-managed testbed,
browser extension. Exploring this further, Figure 7 uses
its mechanisms remain decentralized and self-organizing.
the2009tracetoplottherequestrateperdayforthemost
The only use of centralization is for managing software
populardomains(takingtheunionofeachday’smostpop-
andconfigurationupdatesandforcontrollingrunstatus.
ularfivedomainsresultedinnineuniquedomains).Wesee
that six of the nine domains had stable traffic patterns—
3 Analyzing CoralCDN’s Usage
theywerelong-termCoralCDN“customers”—whilethree
varied between two and six orders of magnitude per day.
This section presents some HTTP-level data from Coral-
The traffic patterns that we see in these two figures have
CDN’sdeploymentandconsidersitsimplications.
designimplications,whichwediscussnext.
3.1 Systemtracesandtrafficpatterns
2Thepeakof120MrequestsonAugust21, 2008correspondstoa
To understand some of the HTTP traffic patterns that
short-livedexperimentofanacademicresearchprojectusingCoralCDN
CoralCDNsees,weanalyzedseveraldatasetsinincreasing asakey-valuestore[15].
4
1e+06 TopURLs TotalSize(MB) %ofTotalReqs
URL 100000 AAuugg--99--22000057 0.01% 14 49.1%
er 10000 Aug-9-2009 0.1% 157 71.8%
p
ests 1 100000 1% 3744 84.8%
equ 10 10% 28734 92.2%
R
1 1 10 100 1000 10000 100000 1e+06 Figure8: CoralCDN’sworkingsetsizeforitsmostpopularURLs
Unique URLs by Popularity on Aug 9, 2009: A small percentage of URLs account for a large
Figure6: TotalrequestsperuniqueURL. fractionofrequests,yettheyrequirerelativelylittlestoragetocache.
n 1e+07
mai 1e+06 Insufficientforresurrectingoldcontent. CoralCDNis
er do 1 1000000000 not designed for archival storage. Proxies do not proac-
s p 1000 tively replicate content for durability, and unpopular con-
quest 1 1000 tent is evicted from proxy caches over time. Further, if
Re 1 content has an expiry time (default is 12 hours), a proxy
1 2 3 4 5 6 7 8 9
will serve expired content for at most 24 hours after the
Time (Days)
origin fails. Still, some clients attempt to use Coral-
Figure7: Requestspertop-5domainovertime(Aug9-18,2009).
CDN for this purpose. This underscores a design trade-
3.2 Implicationsofusagescenarios off: Instressingsupportforflashcrowdsratherthanlong-
termdurability,CoralCDNdevotesitsresourcestoprovide
ForCoralCDNtohelpunder-provisionedwebsitessurvive
availability for content being actively requested. On the
unexpectedtrafficspikes,itdoesnotrequireanypriorreg-
otherhand,byservingexpiredcontentforalimiteddura-
istrationorauthorization. Yetwhilesuchopennessisnec-
tion,CoralCDNcanmaskthetemporaryunavailabilityof
essarytoenableevenunmanagedwebsitestosurviveflash
anorigin,atleastforcontentalreadycachedinitsnetwork.
crowds,itcomesatacost: CoralCDNisusedinavariety
ofwaysthatdifferfromthismorenarrowgoal. Thissec- Unnecessary for unpopular content. While proxies
tion considers how well CoralCDN’s design is suited for candiscoverevenrarecachedcontent,CoralCDNdoesnot
itsfourmainusagescenarios: provideanybenefitbyservingsuchunpopularcontent: It
does not reduce servers’ load meaningfully, and it often
1. Resurrectingoldcontent: Anecdotally,someclients
results in higher client latency. As such, clients that use
attempt to use CoralCDN for long-term durability.
CoralCDNtoavoidlocalfiltering,circumventgeographic
One can download browser plugins that link to both
restrictions,orprovide(minimal)anonymitymaybebetter
CoralCDN and archive.org as potential sources
servedbystandardopenproxies(thatvanillabrowserscan
ofcontentwhenoriginserversareunavailable.
beconfiguredtouse)orthroughspecializedtoolssuchas
2. Accessing unpopular content: CoralCDN’s request Tor[12]. Yet,thistypeofusagepersists—thelongtailof
distribution shows a heavy tail of unpopular URLs. Figure6—andCoralCDNmightthenbebetterservedwith
ServersmayCoralizeURLsthatfewvisit. Andsome adifferentdesignforsuchtraffic,i.e.,onethatdoesn’tre-
clients use CoralCDN as a more traditional proxy, quireamulti-hop,wide-areaDHTlookuptocompletebe-
for(presumed)anonymity,censorshiporfilteringcir- forefetchingcontentfromtheorigin. Forexample,forits
cumvention[32],orautomatedcrawling. modest deployment on PlanetLab, each Coral node could
maintainconnectivitytoallothersandsimplyuseconsis-
3. Serving long-term popular content: Most requests
tent hashing for a global, one-hop DHT [17, 37]. Alter-
areforasmallsetofpopularobjects. Theseobjects,
natively, Coral could only maintain connections with re-
alreadywidelycachedacrossthenetwork, belongto
gional peers and eschew global lookups, a design which
thestablesetofcustomerdomainsthateffectivelyuse
weevaluatefurtherin§7.
CoralCDNasafree,long-termCDNprovider.
4. Surviving flash crowds to content: Finally, Coral- Overkill for stably popular content, so far. For most
CDNisusedforitsstatedgoalofenablingunderpro- ofCoralCDN’straffic,cooperationisnotneeded: Figure6
visioned websites to withstand transient load spikes. shows that a small number of URLs accounts for a large
PopularportalsregularlylinktoCoralizedURLs,and fraction of requests. We now measure their working set
userspostlinksincomments. Somesitesevenadopt sizeinFigure8,inordertodeterminehowmuchstorageis
dynamic and programmatic mechanisms to redirect requiredtohandlethistraffic. Wefindthatthemostpopu-
requests to CoralCDN, based on observed load and lar0.01%ofURLsaccountformorethan49%ofthetotal
requestreferrers. Wediscussthisfurtherin§4.1. requeststoCoralCDN,yetrequireonly14MBofstorage.
Each proxy has a 3.0 GB disk cache, managed using an
Unfortunately, CoralCDN’s design is not well-suited for LRU eviction policy. This is sufficient for serving nearly
thefirstthreeusecases. 85%ofallrequestsfromlocalcache.
5
18000
70.4%hitinlocalcache slashdot.org referer
16000 10-May-05
12.6%returned4xxor5xxerrorcode
14000 09:15 EST 11:56
97..91%%ffeettcchheeddffrroommoorthigeirnCsoitrealCDNproxy minute 12000
|→1.7% fromlevel-0cluster(global) er 10000
p
|→1.9% fromlevel-1cluster(regional) sts 8000
e
|→3.6% fromlevel-2cluster(local) qu 6000
e
R 4000
Figure9: CoralCDNaccessratiosforcontentduringAug9,2009. 2000
0
These workload distributions support one aspect of -30 0 30 60 90 120 150 180
CoralCDN’s design: Content should be locally cached Time (minutes)
by the “forward” CoralCDN proxy directly serving end- Figure10: FlashcrowdtoaCoralizedURLlinkedtobySlashdot.
clients, given that small to moderate size caches in these
proxies can serve a very large fraction of requests. This 600 moonbuggy.org
redditmirror.cc moonbuggy reddit
differs from the traditional DHT approach of just storing 500 1000
e
data on a small number of globally-selected proxies, so- ut
called“serversurrogates”[8,37]. er min 400 100
IfCoralCDN’sworkingsetcanbefullycachedbyeach s p 300 10
st
node,weshouldunderstandhowmuchcooperationisac- que 200 1
e
tually needed. Figure 9 summarizes the extent to which R
proxies cooperate when handling requests. 70% of re- 100
queststoproxiesaresatisfiedlocally,whileonly7%result
0
incooperativetransfers. (Thehighrateoferrormessages 0 24 48 72 96 120 144 168
Time (Hours)
isduetoadmissioncontrolasameansofbandwidthman-
Figure11: Mini-flashcrowdsduringAugust2009trace.Eachdat-
agement,whichwediscussin§5.2.)Inshort,atleastforits
apointrepresentsaone-minuteduration;embeddedsubfiguresshow
currentworkloadandenvironment,onlyasmallfractionof
requestratesforthetensofminutesaroundtheonsetofflashcrowds.
CoralCDN’strafficusesitscooperationmechanisms.
Arelatedresultaboutthelimitsofcooperativecaching
long leading edge before traffic rises by several orders of
hadbeenobservedearlier[38],butfromtheperspectiveof
magnitude,whichhasinterestingimplications.
limitedimprovementsinclient-sidehitrates. Thisisasig-
Figures 10 and 11 show the request patterns of several
nificantlydifferentgoalfromreducingserver-siderequest
flashcrowdsthatCoralCDNexperienced. Theformerwas
rates, however: Non-cooperating groups of nodes would
toasitelinkedtoinaSlashdotarticleinMay2005. After
eachindividuallyrequestcontentfromtheorigin.
rising,theSlashdotflashcrowdlastedlessthanthreehours
This design trade-off comes down to the question of
in duration and came to an abrupt conclusion (perhaps as
how much traffic is too much for origin servers. For
thestorydroppedoffthewebsite’smainpage). Thelatter,
moderately-provisioned origins, such as the customers of
covering our August 2009 trace, shows spikes to the im-
commercial CDNs, a caching system might only rely on
agecacheofalesspopularportal(moonbuggy.org),as
local or regional cooperation. In fact, Akamai’s network
wellastoawell-publicizedmirrorforthecollaboratively-
is designed precisely so: Nodes within each of its ap-
filtered reddit.com, with another attenuated spike 24
proximately1000clusterscooperate,buteachclustertypi-
hourslater. TheembeddedgraphsinFigure11depictthe
callyfetchescontentindependentlyfromoriginsites[22].
requestratesaroundtheonsetofthetrafficspikeforanar-
To replicate such scenarios, Coral’s clustering algorithms
rowerrangeoftime. Allthreeflashcrowdsshowthatthe
could be used to self-organize a network into local or re-
initialrisetookminutes.
gionalclusters. Itcouldthusavoidthemanualconfigura-
Foramorequantitativeanalysisofthefrequencyofflash
tionofHarvest[7]orcolocateddeploymentsofAkamai.
crowds, we examined the prevalence of domains that ex-
Ontheotherhand, whilecooperationisnotneededfor
perience a large increase in their request rates from one
most traffic, CoralCDN’s ability to react quickly to flash
time period to the next. In particular, Figure 12 consid-
crowds—tooffloadtrafficfromafailingoroversubscribed
ers all five-second periods across the August 2009 ten-
origin—ispreciselythescenarioforwhichitwasdesigned
day trace. The left graph plots a complementary cumu-
(andcommercialCDNsarenot). Weconsiderthesenext.
lative distribution function (CCDF) of the percentage of
Usefulformitigatingflashcrowds. CoralCDN’straces domainsrequestedineachperiodthatexperiencea10-or
regularly show spikes in requests to different URLs. We 100-fold rate increase. The right graph plots the percent-
find, however, that these flash crowds grow in popularity age of requests accounted for by these domains that ex-
ontheorderofminutes,notseconds.Thereisasufficiently perience orders-of-magnitude (OOM) increases. Sudden
6
Percentage of Epochs 0 .01 0 0.1000 .100111 ≥≥ 512s OO eOOpMMochs Percentage of Epochs 0 .01 0 0.1000 .100111 ≥≥ 512s OO eOOpMMochs owcraofitutIeChnniiotnnssrchafsroolehCrraotD,sarettsNshmti’ivssmaeldredlyaofptrrmaaeacrrasietihoinlooydsnwso,eoscxf(ctp2tuhhe)rarettiohetn(ono1tcst)aheeloerdnleoaolqrrymdgueeeaarsistrnosamsf,t’eaasnletlricdnafofc(rfin3aredcc)astaisa.onecyns-
1 10 1 10 100
% Domains that changed by OOM % Requests that changed by OOM Thismoderateadoptionrateavoidstheneedtointroduce
Figure 12: CCDF of extent of flash-crowd dynamics in August evenmoreaggressivecontentdiscoveryalgorithms. Sim-
2009trace.Leftgraphshowspercentageofdomainsexperiencingor- ulated workloads in early experiments (Figure 4 of [14])
dersofmagnitude(OOM)changesinrequestratesacrossfive-second showedthatunderhighconcurrency,CoralCDNmightis-
epochs.Rightshows%requestsforwhichthesedomainsaccount. sue several redundant fetches to an origin server due to
a race-like condition in its lookup protocol. If multiple
Percentage of Epochs 1 24680 000000 ≥≥3 210 sOO eOOpMMochs Percentage of Epochs 1 24680 000000 ≥≥≥1 3210 mOOO OOOepMMMochs ninbsooytddimneesstohccseoatnniancpcduopernlxriet,canaactltltliyotchngoesentcowturhihrgerieicsnnha.tmTuloeshoekikseauyrpadwcsiehcstaicrcnoihbnfudadtioieltediaosnnhndaoimssthysuheltttaairebpexlldee-
0.01 0.1 1 10 100 0.01 0.1 1 10 100 (both peer-to-peer and datacenter services). But because
% Requests that changed by OOM % Requests that changed by OOM
thesetracesshowthatthearrivalofuserrequestshappens
Epochs 1 8000 Epochs 1 8000 over a much longer time-scale than a DHT lookup, this
Percentage of 246 0000 ≥≥≥≥ 4321 6OOOOhOOOO eMMMMpochs Percentage of 246 0000 ≥≥≥≥1 4321d OOOOeOOOOpoMMMMchs rdaecsNeigocntoeinntdghiataitoinnteidstwopeoosrsknsiobfitllepeotsosyemsateitsmiigganftoiefirtchPainlsatcnpoerntoLdbailtbeiomtnh..aWt shuiple-
0.01 0.1 1 10 100 0.01 0.1 1 10 100
ported cooperative caching [2]—meant to quickly dis-
% Requests that changed by OOM % Requests that changed by OOM
Figure 13: CDFsofpercentageofrequestsaccountedforbydo- tribute a file in preparation for a new experiment—we
soughttominimizeredundantfetchestothefileserver.We
mains experiencing order(s)-of-magnitude rate increases. Rate in-
extended Coral’s insert operation to provide return status
creasescomputedacrossepochsof30seconds(topleft),10minutes
information, like test-and-set in shared-memory systems.
(topright),sixhours(bottomleft),andoneday(bottomright). Plots
A single put+get both returns the first values it encoun-
startonthey-axiswithzerodomainshavingsuchanincrease,e.g.,
tered in the DHT, as well as inserts its own values at an
28%of30sepochshavenodomainswitha≥1OOMrateincrease.
appropriate location (for a new key, this would be at its
closest node). This optimization comes at a subtle cost,
increasesdoexist,buttheyarerare. In76%of5sepochs,
however, asitnowoptimisticallyinsertsanode’sidentity
nodomainsexperiencedany10-foldincrease,whilein1%
evenbeforethatproxybeginsdownloadingthefile! Ifthe
ofepochs, 1.7%ofdomains(accountingfor12.9%ofre-
originfetchfails—agreaterpossibilityinCoralCDN’sen-
quests) increased by one order-of-magnitude. Larger dy-
vironmentthanwithamanagedfileserver—thentheuseof
namismwasevenmorerare:onlyin0.006%ofepochsdid
theseindexentriesdegradesperformance. Thus, afterus-
there exist a domain that experienced a 100-fold increase
ingthisput+getprotocolinCoralCDNforseveralmonths
inrequestrate. NothreeOOMincreaseoccurred.
during2005,wediscontinueditsuse.
To further understand the precipitousness of “flash”
crowds, Figure 13 extends this analysis across longer du-
CoralCDN’sopennesspermitsuserstoquicklyleverage
rations.3 Among30sepochs,50%ofepochshaveatmost
its resources under load, and its more complex coordina-
0.4% of domains experience a 10-fold increase in their
tionhelpsmitigatetheseflashcrowdsandmasktemporary
rates (not shown), which account for a total of 1.0% of
serverunavailability. Yetthisveryopennessledtovaried
requests (top left). Only 0.29% of 30s epochs have any
usage,themajorityofwhichdoesnotrequireCoralCDN’s
domains with more than a 100-fold rate increase. At 10-
morecomplexdesign. Aswewillsee, thisopennessalso
minute epochs, 28% of epochs have at least one domain
introducesotherproblems.
that experiences a two OOM rate increase, while 0.21%
have a domain with a three OOM increase. Still, these
4 Lessons for the Web
flashcrowdsaccountforasmallfractionoftotalrequests:
Domainsexperiencing100-foldincreasesaccountedforat
CoralCDN’s naming technique provides an open API for
least1%ofallrequestsinonly3.8%of10mepochs, and
CDN services that can transparently work for almost any
10%ofrequestsin0.05%ofepochs.
website. Over the course of its deployment, clients and
servershaveusedthisAPItoadoptCoralCDNasanelas-
3Toavoidovercountingunpopulardomains,wedonotcountchanges ticresourceforcontentdistribution. Throughcompletely
whentheabsolutenumberofrequeststoadomaininagiventimeperiod
automatedmeans,workcanbedynamicallyexpandedout
islessthansomeminimumamount,i.e.,10requestsfor5s,30s,and10m
periods,and100requestsfor6hand1dperiods. to use CoralCDN when websites require additional band-
7
widthresources,anditcanbecontractedbackwhenflash An API for dynamic adoption. CoralCDN was envi-
crowds abate. In doing so, its use presaged the notion of sioned with manual URL manipulation in mind, whether
“surgecomputing”withpubliccloudplatforms. Butthese by publishers editing HTML, users typing Coralized
namingtechniquesandCoralCDN’sopendesignintroduce URLs, or third-parties posting links. After deployment,
anumberofwebsecurityproblems,manyofwhichareen- however, users soon began treating CoralCDN’s interface
genderedbyalackofexplicitnessforspecifyingprotection asanAPIforaccessingCDNservices.
domains. Wediscusstheseissueshere. On the client side, these techniques included simple
browserextensionsthatoffer“right-click”optionstoCor-
4.1 AnAPIforelasticCDNservices
alizelinksorthatprovidealinkwhenapageappearsun-
We believe that the central reason for CoralCDN’s adop- available. They ranged to more complex integration into
tionhasbeenitssimpleuserinterfaceandopendesign. frameworks like Firefox’s Greasemonkey [21]. Grease-
monkeyallowsthird-partydeveloperstowritesite-specific
Interface design. While superficially obvious, Coral-
javascriptcodethat,onceinstalledbyusers,manipulatesa
CDN’sinterfacedesignachievesseveralimportantgoals:
site’sHTMLcontent(usuallythroughtheDOMinterface)
• Transparency: Workwithunmodified,unconfigured, whenever the user accesses it. Greasemonkey scripts for
andunawarewebclientsandwebservers. CoralCDN include those that automatically rewrite links
• Deep caching: Retrieve embedded images or links onpopularportals,ormodifyarticlestoincludetooltipsor
automaticallythroughCoralCDNwhenappropriate. additional links to Coralized URLs. CoralCDN also has
been integrated directly into a number of client-side soft-
• Servercontrol:Notinterferewithsites’abilitytoper-
warepackagesforpodcasting.
form usage logging or otherwise control how their
ThemoreinterestingcasesofCoralCDNintegrationare
contentisserved(e.g.,viaCoralCDNordirectly).
ontheserver-side. Onecommonstrategyisfortheorigin
• Ad-friendly: Not interfere with third-party advertis- toreceivetheinitialrequest, butrespondwitha302redi-
ing,analytics,orothertoolsincorporatedintoasite. recttoaCoralizedURL.Thiscanworkwellevenforflash
• Forwardcompatible: Beamenabletofutureend-to- crowds, astheoverheadofgeneratingredirectsismodest
endsecuritymechanismsforcontentintegrityorother comparedtothatofactuallyservingthecontent.
end-hostdeployedmechanisms. Generating such redirects can be done by installing a
serverpluginandwritingafewlinesofconfigurationcode.
Consider an alternative and even simpler interface de-
Forexample,thecompletedynamicredirectionruleusing
sign [11, 25, 29], in which one embeds origin URLs into
Apache’smod_rewritepluginisasfollows.
the HTTP path, e.g., http://nyud.net/example.
com/.NotonlyisHTTPparsingsimpler,butnameservers RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
wouldnotneedtosynthesizeDNSrecordsonthefly(un-
RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$
like our DNS servers for *.nyud.net). Unfortunately, RewriteRule ^(.*)$ http://%{HTTP_HOST}.nyud.net
%{REQUEST_URI} [R,L]
whilethisinterfacecanbeusedtodistributeindividualob-
jects,itfailsonentirewebpages. Anyrelativelinkswould Still,redirectionrulesmustbecraftedcarefully. Inthis
lacktheexample.comprefixthataproxyneedstoiden- example, the second line checks whether the client is a
tify its origin. One alternative might be to try to rewrite CoralCDNproxyandthusshouldbeserveddirectly. Oth-
pages to add such links, although active content such as erwise,aredirectionlooppotentiallycouldbeformed(al-
javascript makes this notoriously difficult. Further, such though proxies prevent this from happening by checking
active rewriting impedes a site’s control over its content, forpotentialloopsandreturningerrorsifoneisfound).
anditcaninterferewithanalyticsandadvertisements. Amusingly, some early users during CoralCDN’s de-
CoralCDN’sapproach,however,interpretsrelativelinks ploymentcausedrecursioninadifferentway—andaform
with respect to a page’s Coralized hostname, and thus ofamplification attack—bysubmitting URLswitha long
transparently requests these objects through it as well. string of nyud.net’s appended to a domain. Before
But all absolute URLs continue to point to their origin proxies checked for such conditions, this single request
sites, andthird-partyadvertisementsandanalyticsremain caused a proxy to issue a number of requests, stripping
largely unaffected. Further, as CoralCDN does not mod- thelastinstanceofnyud.netoffineachiteration.
ify content, content also may be amenable to verification
While the above rewriting rule applies for all requests,
throughend-to-endcontentsignatures[30,35].
othersitesincorporateredirectioninmoreinventiveways,
In short, it was important for adoption that site owners
such as only redirecting clients arriving from particular
retainsufficientcontroloverhowtheircontentisdisplayed
high-trafficreferrers:
andaccessed. Infact,ourpredictedusagescenarioofsites
publishingCoralizedURLsprovedtobelesspopularthan RewriteCond %{HTTP_REFERER} slashdot\.org [NC,OR]
RewriteCond %{HTTP_REFERER} digg\.com [NC,OR]
thatofdynamicredirection(whichwedidnotforesee). RewriteCond %{HTTP_REFERER} blogspot\.com [NC]
8
And most interestingly, some sites have even combined whichwediscussfurtherin§5.2.Inadditiontomonitoring
suchtoolswithserverpluginsthatmonitorserverloadand server-sideconsumption,proxieskeepaslidingwindowof
bandwidthuse,sothattheirserversonlystartrewritingre- client-side usage. Not only do we seek to prevent exces-
questsunderhighloadconditions. sivebandwidthconsumptionbyclients,butalsoanexces-
WebsitesthereforeusedCoralCDN’snamingtechnique sive number of (even small) requests. These are caused
toleverageitsCDNresourcesinanelasticfashion. Based typically by server misconfigurations that result in HTTP
on feedback from users, we expanded this “API” to give redirection loops (per §4.1) or by “bot” misuse as part of
sitessomesimplecontroloverhowCoralCDNshouldhan- abrute-forceattack. WhileCoralCDN’slimitedfunction-
dle their requests. For example, webservers can include alitymitigatessuchattacks, onenotablebrute-forcelogin
X-Coral-Control response headers, which are saved attempttookadvantageofpoorsecurityatatop-5website,
ascachemeta-data,tospecifywhetherCoralCDNproxies whichusedcleartextpasswordsoverGETrequests.
should “redirect home” domains that exceed their band- Givenbothitsstorageandbandwidthlimitations,Coral-
widthlimits(per§5.2)orjustreturnanerrorasisstandard. CDN enforces a maximum file size of 50 MB. This
hasgenerallypreventedclientsfromusingCoralCDNfor
4.2 Securityandresourceprotection
videodistribution,apragmaticgoalwhendeployingprox-
Anumberofsecuritymechanismscurtailedthemisuseof ies on university-hosted PlanetLab servers. We found
CoralCDN.Wehighlightthedesignprincipleforeach. that sites attempted to circumvent these limits by omit-
tingContent-Lengthheaders(onconnectionsmarked
4.2.1 Limitingfunctionality as persistent and without chunked encoding). To ensure
compliance, proxies now monitor ongoing transfers and
CoralCDN proxies have only ever supported GET and
halt(andblacklist)anyonesthatexceedtheirlimits. This
HEAD requests. Many of the attacks for which “open”
skepticism is needed as proxies interact with potentially
proxiesareinfamous[24]aresimplynotfeasible. Forex-
untrustedservers,andthusmustenforcecompletemedia-
ample, clients cannot use CoralCDN to POST passwords
tion[33]totheirresources(inthiscase,bandwidth).
for brute-force cracking. Proxies do not support CON-
NECTrequests,andthustheycannotbeusedtosendspam
4.2.3 Blacklistingdomainsandoffloadingsecurity
asSMTPrelaysortoforge“From”addressesinwebmail.
Proxies do not support HTTPS and they delete all HTTP Wemaintainaglobalblacklistforblockingaccesstospec-
cookiessentinheaders. Theseproxiesthusprovidemini- ified origin domain names. Each proxy regularly fetches
malapplicationfunctionalityneededtoachievetheirgoals, and reloads the blacklist. This is a practical, but not fun-
whichiscooperativelyservingcacheablecontent. damental,necessity,employedtopreventCoralCDN’sde-
CoralCDN’s design had several unexpected conse- ploymentsitesfromrestrictingitsuse. Partiesthatrequest
quences. Perhaps most interestingly, given CoralCDN’s blacklistingtypicallyciteoneofthefollowingreasons.
multi-layer caching architecture, attempting to crawl or
Suspectedphishing. Websiteshavebeenconcernedthat
brute-force attack a website via CoralCDN is quite slow.
CoralCDNis—orwillbeconfusedwith—aphishingsite.
New or randomly-selected URLs first require a DHT
Afterall,bothappeartobe“scraping”contentandpublish
lookup to fail, which serves to delay requests against an
a simulacrum under an alternate domain. The difference,
originwebsite,inmuchthesamewaythatssh“tarpits”de-
of course, is that CoralCDN is serving the site’s content
layresponsestofailedloginattempts. Inaddition,because
unmodified,yettheweblacksanyprotocoltoauthenticate
CoralCDNonlyhandlesexplicitCoralizedURLs,itcannot
theintegrityofcontent(asinS-HTTP[30])inordertover-
be used by simply configuring a vanilla browser’s proxy
ifythis. AsSSLonlyauthenticatesidentity,websitesmust
settings. Further, CoralCDN cannot be used to anony-
typicallyincludeCDNsintheirtrustedcomputingbase.
mously launch attacks, as it eschews anonymity. Proxies
useuniqueUser-Agentstrings(“CoralWebPrx”)and Potential copyright violation. Typically following a
include their identity in Via headers, and they report an DMCA take-down notice, third-parties report that copy-
instigating client’s IP address to the origin server (in an rightedmaterialmaybefoundonaCoralizeddomainand
X-Forwarded-For request header). We can only sur- wantitblocked.ThisscenarioismitigatedbyCoralCDN’s
mise whether the combination of these properties played explicit naming—which preserves the name of the actual
some role, but CoralCDN has seen little abuse as a plat- origininquestion—andbyitscachingdesign. Oncecon-
formforproxyingserverattacks. tent is removed from an origin server, it is evicted auto-
matically from CoralCDN in at most 24 hours. This is a
4.2.2 Curtailingexcessiveresourceuse natural implication of its goal of handling flash crowds,
ratherthanprovidinglong-termavailability.
CoralCDN’s major limiting resource is aggregate band-
width. The system employs fair-sharing mechanisms to Circumventing access-control restrictions. Some do-
balancebandwidthconsumptionbetweenorigindomains, mainsmediateaccesstotheirwebsiteviaIP-basedauthen-
9
tication, whereby requests from particular IP prefixes are browserstate.Thispolicyappliestomanipulatingcookies,
grantedaccess.Thispracticeisespeciallycommonforon- browser windows, frames, and documents, as well as to
lineacademicjournals,inordertoprovideeasyaccessfor accessingotherURLsviaanXmlHttpRequest. Atitssim-
universitysubscribers.Butopenproxieswithinwhitelisted plestlevel,allofthesebehaviorsareonlyallowedbetween
prefixeswouldenableexternalclientstocircumventthese resources that belong to an identical origin domain. This
access-controlrestrictions. provides security against sites accessing each others’ pri-
By offloading policing to their customers, sites unnec- vateinformationkeptincookies,forexample. Italsopre-
essarily enlarge their security perimeter to include their vents websites that run advertisements (such as Google’s
customer’snetworks. Thisscenarioiscommonyetunnec- AdSense)fromeasilyperformingclickfraudtopaythem-
essary. Recall that CoralCDN proxies do not hide their selvesadvertisingdollarsbyprogrammatically“clicking”
identities, and they include the originating client’s IP ad- ontheirsite’sadvertisements.6
dressinstandardrequestheaders.Thus,originsitescanre- One caveat to the strict definition of an identical ori-
tainIP-basedauthenticationwhileverifyingthatarequest gin [18] is that it provides an exception for domains
does not originate from outside allowed prefixes.4 Sites that share the same domain.tld suffix, in that www.
arejustnotmakinguseofthisinformation,andthusfailto example.comcanreadandsetcookiesforexample.
properlymediateaccesstotheirprotectedresources.5 com. This has bad implications for CoralCDN’s naming
We did encounter some interesting attacks on our strategy. When example.com is accessed via Coral-
domain-based blacklists, akin to fast-flux networks. An CDN, it can manipulate all nyud.net cookies, not just
adversarycreateddynamicDNSrecordsforarandomdo- those restricted to example.com.nyud.net.7 Con-
mainthatpointedtotheIPaddressofatargetdomain(an cernedwiththepotentialprivacyviolationsfromthissce-
online academic journal). The random domain naturally nario,CoralCDNdeletesallcookiesfromheaders.
was not blacklisted by CoralCDN, and the content was Unfortunately,manywebsitesnowmanagecookiesvia
successfully downloaded from the origin target. Such a javascript, so cookie information can still “leak” between
circumventiontechniquewouldnothaveworkediftheori- Coralized domains on the browser. This happens of-
gin site checked either proxy headers (as above) or even ten without a site’s knowledge, as sites commonly use a
just the Host field of the HTTP request. The Host cor- URL’s domain.tld without verifying its name. Thus,
responded to the fast-flux attack domain, not that of the iftheCoralizedexample.comwritesnyud.netcook-
journal. Again, this security hole demonstrates a lack of ies, these will be sent to evil.com.nyud.net if the
explicitverificationandfail-safedefaults[33]. clientvisitsthatwebpage. HonestCoralCDNproxieswill
delete these cookies in transit, but attackers can still cir-
4.3 Securityandnamingconflation
cumvent this problem. For example, when a client vis-
We argued that CoralCDN’s naming provided a powerful itsevil.com.nyud.net,javascriptfromthatpagecan
API for accessing CDN services. Unfortunately, its tech- accessnyud.netcookies,thenissueaXmlHttpRequest
nique has serious implications as the Web’s Same Origin backtoevil.com.nyud.netwithcookieinformation
Policy(SOP)conflatesnamingwithsecurity. embeddedintheURL.Similarattacksarepossibleagainst
Browsersusedomainnamesforthreepurposes. (1)Do- other uses of the SOP, especially as it relates to the abil-
mains specify where to retrieve content after they are re- itytoaccessandmanipulatetheDOM.Notethattheseat-
solved to IP addresses, precisely how CoralCDN enacts tackvectorsexistevenwhileCoralCDNoperatesonfully-
its layer of indirection. (2) Domains provide a human- trustednodes,letalonemorepeer-to-peerenvironments!
readable name for what administrative entity a client is RatherthanconcludethatCoralCDN’sdomainmanipu-
interacting with (e.g., the “common name” identified in lationisfundamentallyflawed,wearguethatbetteradher-
SSLservercertificates).(3)Domainsspecifywhatsecurity encetosecurityprinciplesisneeded.Websitesarepartially
policiestoenforceonwebobjectsandtheirinteractions. atfaultbecausetheydefaultaccesstodomain.tldsuf-
The Same Origin Policy specifies how scripts and in- fixestooreadily,asopposedtostrippingtheminimalnum-
structions from an origin domain can access and modify berofdomainprefixes: aviolationoftheprincipleofleast
information. An alternative solution that embraces least
4Thisdoesnotaddressthecornercaseinwhichtheoriginalrequest
comesfromanIPaddresswithinthatprefix,whilesubsequentonesthat 6ThisispreventedbecauseadvertisementslikeAdSenseloadinan
accessthethen-cachedcontentdonot. Thiscanbehandledtypicallyby iframethattheparentdocument—thethird-partywebsitethatstandsto
markingcontentasnotcacheable,orbyhavingaproxyincludeheaders gainrevenue—cannotaccess,astheframebelongstoadifferentdomain.
thatexplicitlyspecifyitsclientpopulation(i.e.,as“open”orbyIPprefix). 7CommercialCDNslikeAkamaiaretypicallynotsusceptibletosuch
5One might argue that sites use a pure IP-based filtering approach attacks,astheygenerallyuseaseparatetop-leveldomainsforeachcus-
givenitsabilitytobeimplementedinlayer-3front-endloadbalancers. tomer,asopposedtoCoralCDN’ssuffix-basedapproach. UnlikeCoral-
Butthisisnotasimplefirewallproblem,assitesalsopermitaccessfor CDN’s zero configuration, however, such designs require that origins
individualusersthatloginwiththeappropriatecredentials.Thesiteswith preestablish an operational relationship with their CDN provider and
whichwecommunicatedimplementedsuchauthorizationlogiceitherdi- point their domain to the CDN service (e.g., by aliasing their domain
rectlyinwebserversorincomplex,layer-7front-endappliances. totheCDNthroughCNAMErecordsinDNS).
10
Description:Otherwise, it continues among its level-1 cluster (the solid rings), and finally, if needed, to any .. plots the percent- age of requests accounted for by these domains that ex- .. AdSense) from easily performing click fraud to pay them- . report human-readable errors in HTML body content, but with