Table Of Content1
Resilient Multicast using Overlays
SumanBanerjee, SeungjoonLee,BobbyBhattacharjee,AravindSrinivasan
DepartmentofComputerScience, UniversityofMaryland,CollegePark,MD20742,USA
Emails: (cid:0) suman,slee,bobby,srin @cs.umd.edu
(cid:1)
Abstract
We introducePRM (Probabilistic Resilient Multicast): a multicast data recoveryscheme that improvesdata de-
liveryratioswhilemaintaininglowend-to-endlatencies. PRMhasbothaproactiveandareactivecomponent;inthis
paperwedescribehowPRMcan beusedtoimprovetheperformanceofapplication-layermulticastprotocols,espe-
ciallywhentherearehighpacketlossesandhostfailures. Further,usinganalytictechniques,weshowthatPRMcan
guaranteearbitrarilyhighdatadeliveryratiosandlowlatencybounds.
Asadetailedcasestudy,weshowhowPRMcanbeappliedtotheNICEapplication-layermulticastprotocol.We
presentdetailedsimulationsofthePRM-enhancedNICEprotocolfor10,000nodeInternet-liketopologies.Simulations
showthatPRMachievesahighdeliveryratio( 97%)withalowlatencybound(600ms)forenvironmentswithhigh
(cid:2)
end-to-endnetwork losses (1-5%) and high topologychangerates (5 changes per second)while incurringvery low
overheads( 5%).
(cid:3)
I. INTRODUCTION
We present a fast multicast data recovery scheme that achieves high delivery ratios with low overheads. Our tech-
nique, called Probabilistic Resilient Multicast (PRM), is especially useful for applications that can bene(cid:2)t from low
datalosseswithoutrequiringperfectreliability. Examplesofsuchapplicationsarereal-timeaudioandvideostreaming
applications where the playback quality at the receivers improve if the delivery ratios can be increased within spe-
ci(cid:2)c latency bounds. Using terminology de(cid:2)ned in prior literature [20] we call this model of data delivery resilient
multicast.
InthispaperwedescribePRMinthecontextofoverlay-based multicast. Thebasicideaofmulticastusingoverlays
(also known as application-layer multicast) [6], [8], [2], [24], [5], [19], [11] is shown in Figure 1. Unlike native
multicast where data packets are replicated at routers inside the network, in application-layer multicast data packets
arereplicatedatendhosts. Logically,theend-hostsformanoverlaynetwork,andthegoalofapplication-layermulticast
istoconstruct and maintain anef(cid:2)cient overlay for data transmission. Theeventual data delivery path in application-
layermulticastisanoverlaytree. Whilenetwork-layermulticastmakesthemostef(cid:2)cientuseofnetworkresources,its
limited deployment in the Internet makes application-layer multicast a more viable choice for group communication
overthewide-areaInternet.
Akeychallengeinconstructing aresilientapplication-layer multicastprotocolistoprovidefastdatarecoverywhen
overlay node failures partition the data delivery paths. Overlay nodes are processes on regular end-hosts which are
potentially more susceptible to failures than the routers. Each such failure of a non-leaf overlay node causes data
outagefornodesdownstreamuntilthetimethedatadeliverytreeisreconstructed. Lossesduetooverlaynodefailures
2
1 2 1 3
A B A B
3 4 2 4
Network Layer Multicast Application Layer Multicast
Fig.1. Network-layer and application layermulticast. Squarenodes arerouters, and circularnodes areend-hosts. Thesolidlinesare the
physicallinksanddottedlinesrepresentpeersontheoverlay.Thearrowsindicatethedatadeliverypathforthetwocases.
aremoresigni(cid:2)cantthanregularpacketlossesinthenetworkandmaycausedataoutageintheorderoftensofseconds
(e.g. theNaradaapplication-layer multicastprotocol [6]setsdefaulttimeoutsbetween30-60seconds).
A. OurApproach
PRMusestwosimpletechniques:
AproactivecomponentcalledRandomizedforwarding inwhicheachoverlaynodechoosesaconstantnumberof
(cid:4)
other overlay nodes uniformly at random and forwards data to each of them with a low probability (e.g. 0.01-
0.03). Thisrandomizedforwardingtechniqueoperatesinconjunctionwiththeusualdataforwardingmechanisms
alongthetreeedges,andmayleadtoasmallnumberofduplicate packetdeliveries. Suchduplicates aredetected
and suppressed using sequence numbers. The randomized component incurs very low additional overheads and
canguarantee highdeliveryratiosevenunderhighratesofoverlaynodefailures.
AreactivemechanismcalledTriggered NAKstohandledatalossesduetolinkerrorsandnetwork congestion.
(cid:4)
Through analysis and detailed simulations we show that these relatively simple techniques provide high resilience
guarantees, both in theory and practice. PRM can be used to signi(cid:2)cantly augment the data delivery ratios of any
application-layer multicastprotocol(e.g. Narada[6],Yoid[8],NICE[2],HMTP[24],Scribe[5],CAN-multicast[19],
DelaunayTriangulation-based [11])whilemaintaining lowlatencybounds.
B. Contributions
Thecontributions ofthispaperarethree-fold:
We propose a simple, low-overhead scheme for resilient multicast. To the best of our knowledge, this work is
(cid:4)
the (cid:2)rst proposed resilient multicast scheme that can be used to augment the performance of application-layer
multicastprotocols.
Wepresentrigorousanalysistoshowthatasimplerandomizedapproachissuf(cid:2)cienttoachievehighdatadelivery
(cid:4)
rateswithinlowlatencybounds.
We demonstrate how our proposed scheme can be used with an existing application-layer multicast protocol
(cid:4)
(NICE[2])toprovide alowoverhead, lowlatency and highdelivery ratiomulticast technique forrealistic appli-
cationsandscenarios.
3
M
A
0 1
B B
A A
X
C E F C E F
D T D T N Y
Q Q Z P
G H J K L M N P G H J K L M N P
Overlay subtree with large
number of node failures
Fig.2. ThebasicideaofthePRMscheme. Thecirclesrepresenttheoverlaynodes. Thecrosses
Fig. 3. Successful delivery
indicate link and node failures. The arrows indicate the direction of data (cid:3)ow. The curved edges
withhighprobabilityevenun-
indicatethechosencrossoverlaylinksforrandomizedforwardingofdata.
derhighfailurerateofoverlay
nodes.
C. Roadmap
The rest of this paper is structured as follows. In the next section we present the details of the PRM scheme and
analyze itsperformance inSectionIII. InSectionIVwepresent detailed simulation studiesofthe thePRM-enhanced
NICEprotocol. InSectionVwedescriberelatedworkandconclude inSectionVI.
II. PROBABILISTIC RESILIENT MULTICAST (PRM)
ThePRMschemeemploystwomechanismstoprovideresilience. Wedescribeeachoftheminturn.
A. Randomizedforwarding
Inrandomized forwarding, each overlaynode,withasmallprobability, proactively sendsafewextratransmissions
along randomly chosen overlayedges. Suchaconstruction interconnects thedatadelivery treewithsomecross edges
and is responsible for fast data recovery in PRM under high failure rates of overlay nodes. Existing approaches for
resilient and reliable multicast use either reactive retransmissions (e.g. RMTP [18], STORM [20] Lorax [13]) or
proactive error correction codes (e.g. Digital Fountain [3]) and can only recover from packet losses on the overlay
links. Thereforetheproactive randomizedforwarding isakeydifferencebetweenourapproach andotherwell-known
existingapproaches.
We explain the speci(cid:2)c details of proactive randomized forwarding using the example shown in Figure 2. In the
originaldatadeliverytree(Panel0),eachoverlaynodeforwardsdatatoitschildrenalongitstreeedges. However,due
tonetwork losses onoverlay links (e.g. and ) orfailure ofoverlay nodes (e.g. , and )asubset of
(cid:5)(cid:7)(cid:6)(cid:9)(cid:8)(cid:11)(cid:10)(cid:13)(cid:12) (cid:5)(cid:7)(cid:14)(cid:15)(cid:8)(cid:11)(cid:16)(cid:17)(cid:12) (cid:18) (cid:19) (cid:20)
existingoverlaynodesdonotreceivethepacket(e.g. and ). Weremedythisasfollows. Whenany
(cid:10)(cid:21)(cid:8)(cid:11)(cid:16)(cid:22)(cid:8)(cid:24)(cid:23)(cid:25)(cid:8)(cid:11)(cid:26)(cid:21)(cid:8)(cid:28)(cid:27)(cid:29)(cid:8)(cid:11)(cid:30) (cid:31)
overlay node receives the (cid:2)rst copy of a data packet, it forwards the data along all other tree edges (Panel 1). It also
chooses asmallnumber ( )ofother overlay nodes andforwards datatoeach ofthemwithasmall probability, . For
!
example node chooses toforward datato twoother nodes using cross edges and . Note that as aconsequence
" (cid:16) (cid:31)
oftheseadditional edgessomenodesmayreceivemultiple copiesofthesamepacket (e.g. node inPanel1receives
#
thedataalong thetreeedge andcross edge ). Therefore each overlaynode needsto detect and suppress
(cid:5)(cid:7)(cid:14)(cid:15)(cid:8)$#%(cid:12) (cid:5)(cid:7)&’(cid:8)$#%(cid:12)
suchduplicatepackets. Eachoverlaynodemaintainsasmallduplicatesuppressioncache,whichtemporarilystoresthe
4
setofdatapacketsreceivedoverasmalltimewindow. Datapacketsthatmissthelatencydeadlinearedropped. Hence
thesizeofthecacheislimitedbythelatencydeadlinedesiredbytheapplication. Inpractice,theduplicatesuppression
cachecanbeimplementedusingtheplaybackbufferalreadymaintainedbystreamingmediaapplications. Itiseasyto
see that each node on average sends or receives upto copies of the same packet. The overhead of this scheme
(*)+!,
is ,wherewechoose tobeasmallvalue (e.g. 0.01)and tobebetween and . In ouranalysis weshow thatif
!- ! ( .
the destinations of these cross edges are chosen uniformly at random, it is possible to guarantee successful reception
ofpackets ateachoverlaynodewithahighprobability.
Each overlay node periodically discovers a set of random other nodes on the overlay and evaluates the number of
lossesthatitshareswiththeserandomnodes. InanoverlayconstructionprotocollikeNarada[6],eachnodemaintains
state information about all other nodes. Therefore, no additional discovery of nodes is necessary in this case. For
some other protocols like Yoid [8] and NICE[2] overlay nodes maintain information of only a small subset of other
nodes in the topology. Therefore we implement a node discovery mechanism, using a random-walk on the overlay
tree. A similar technique has been used in Yoid [8] to discover random overlay group members. The discovering
node transmits aDiscover message witha time-to-live (TTL)(cid:2)eld toits parent on the tree. The message israndomly
forwarded from neighbor to neighbor, without re-tracing its path along the tree and the TTL (cid:2)eld is decremented at
each hop. The node at which the TTL reaches zero is chosen as the random node. Why is Randomized Forwarding
effective? It is interesting to observe why such a simple, low-overhead randomized forwarding technique is able to
increasepacketdeliveryratioswithahighprobability,especiallywhenmanyoverlaynodesfail. Considertheexample
shown in Figure 3, where a large fraction of the nodes have failed in the shaded region. In particular, the root of the
sub-tree, node , has also failed. So if no forwarding is performed along cross edges, the entire shaded sub-tree is
(cid:6)
partitioned fromthedatadelivery tree. Nooverlay nodeinthisentire sub-tree wouldgetdatapackets tillthepartition
is repaired. However using randomized forwarding along cross edges a number of nodes from the unshaded region
willhaverandomedgesintotheshadedregionasshown( and ). Theoverlaynodesthatreceive
(cid:5)/(cid:31)0(cid:8)213(cid:12)(cid:24)(cid:8)4(cid:5)(cid:7)56(cid:8)(cid:11)78(cid:12) (cid:5)(cid:7)&’(cid:8)(cid:24)9%(cid:12)
dataalongsuch randomly chosencross edgeswillsubsequently forwarddataalongregular treeedgesandanychosen
randomedges. Sincethecrossedgesarechosen uniformlyatrandom,alargesubtreewillhaveahigherprobability of
crossedgesbeingincidentonit. Thusasthesizeofapartitionincreases,sodoesitschanceofrepairusingcrossedges.
B. TriggeredNAKs
This is the reactive component of PRM. We assume that the application source identi(cid:2)es each data unit using
monotonically increasing sequence numbers. An overlay node can detect missing data using gaps in the sequence
numbers. This information is used to trigger NAK-based retransmissions. This technique has been applied for loss
repairinRMTP[18].
In our implementation each overlay node, , piggybacks a bit-mask with each forwarded data packet indicating
:
which of the prior sequence numbers it has correctly received. The recipient of the data packet, , detects missing
;
packetsusingthegapsinthereceivedsequence andsendsNAKsto torequesttheappropriate retransmissions. Note
:
that caneitherbeaparentof inthedatadeliverytree,orarandomnodeforwardingalongacrossedge. Weillustrate
: ;
theuseoftriggered NAKsinFigures4and5.
5
18 17 16 15 14
Z W : 1 0 1 1 1
Z
Seq. no: 18 Seq. no: 31
NAK: 16
17 16 15 14 30 29 28 27
v Z : 0 1 1 1 vP : 0 0 1 1
18 17 16 15 14
Y W : 1 0 0 1 1
Y
Seq. no: 18
NAK: 28, 27
17 16 15 14 NAK: 15, 14 Y P
vY : 0 0 1 1 31 30 29 28 27 EGF_Request 31 30 29 28 27
18 17 16 15 14 WY : 1 0 0 0 0 WP : 1 0 0 1 1
X WX : 1 0 0 0 0
Fig. 5. Triggered NAKs to source of random forwarding for data
Fig.4. TriggeredNAKstoparentontheoverlay
withsequencenumber31. Thevalueof forEphemeralGuaranteed
<
tree when data unit with sequence number 18 is
Forwardingissetto3.
propagated along the overlay. The length of the
bit-maskis4.
C. Extensions: Losscorrelation andEphemeralGuaranteed Forwarding
Wedescribe twoextensions tothePRMschemethatfurtherimprovetheresilience ofthedatadelivery.
Loss Correlation: This is a technique that can be used to improve the randomized forwarding component of
PRM. As described in Section II-A each overlay node chooses a small number of cross edges completely at random
for probabilistic data forwarding on the overlay. In practice, it is possible to increase the utility of these cross edges
by choosing them more carefully. In particular if is the root (and source) of the overlay tree, we want to choose a
=
cross edge between two overlay nodes and , if and only if the correlation between the packets lost on the overlay
: ;
paths and is low. Clearly if these two overlay paths share no underlying physical links, then
(cid:5)/=6>?A@ :B(cid:12) (cid:5)/=6>?C@ ;D(cid:12)
we expect the losses experienced by and to be uncorrelated. However, such a condition is dif(cid:2)cult to guarantee
: ;
foranyoverlay protocol. Therefore under the losscorrelation extension, weleteach node tochoose arandom edge
:
destination, ,withwhichhastheminimumnumberofcommonlossesoveralimitedtimewindow.
;
Ephemeral Guaranteed Forwarding: This is an extension to the triggered NAKcomponent and is also useful in
increasingthedatadeliveryratio. Considerthecasewhennode receivesadatapacketwithsequencenumber along
: E
a random edge from node . If on receiving the data packet with sequence number , detects a large gap (greater
; E :
thanathreshold )inthesequencenumberspaceitislikelythattheparentof hasfailed. Inthiscase, canrequest
F : :
to increase the random forwarding probability, , for the edge to one for ashort duration of time. Todo this
; ! (cid:5)G;H(cid:8)2:I(cid:12)
sends a EGF Request message to . Note that the EGF state is soft and expires within a time period . This
: ; #KJ,LIM
is shown with an example in Figure 5. Node receives data with sequence number 31 along a random edge from
7
. immediately requests retransmissions for data with sequence numbers 28 and 27. Since , also sends
& 7 FONP. 7
the EGF Request message to . If the tree path of is repaired before the EGF period expires, can also send a
& : :
EGF Cancelmessageto toterminatethisstate.
;
The EGF mechanism is useful for providing uninterrupted data service when the overlay construction protocol is
detecting and repairing a partition in the data delivery tree. In fact, putting such a mechanism in place allows the
overlay nodes to use larger timeouts to detect failure of overlay peers. This, in turn, reduces the control overheads
of the application-layer multicast protocol. In practice, the EGFmechanism can sometimes be overly aggressive and
6
cause false positives leading to a higher amount of data duplication on the data delivery tree. Thus, the improvement
inperformanceisatthecostofadditionalstate,complexityandpacketoverheadatnodes. Dependingonthereliability
requirements, applications maychoose toenableordisable EGF.
III. EVALUATION OF PRM
Akey component ofthe PRMscheme is therandomized forwarding technique which achieves high delivery ratios
inspite of a large number of overlay node failures. In this section we (cid:2)rst analytically prove that even a (cid:147)simpli(cid:2)ed(cid:148)
randomized forwarding scheme can provide high probability delivery guarantees when only a constant number of
randomedgesareusedbyeachoverlaynode. Wealsoshowthatthissimpli(cid:2)edschemeachievesgoodlatencybounds.
SubsequentlywepresentsimulationresultsofthefullversionofthePRMscheme. Thesimulationresultspresentedin
thissection describetheperformance ofthePRMschemewithoutmodelinganunderlying application-layer multicast
protocol. Therefore these simulations only model random failures of overlay nodes and random packet losses on
overlaylinksforanoverlaytree. Wehavealsoimplemented PRMoveraspeci(cid:2)capplication-layer multicastprotocol,
NICE [2]. We present detailed performance studies which includes the consequent interactions between PRM and
NICEinSectionIV.
A. AnalysisofSimpli(cid:2)edPRM
For the tractability of the analysis we consider a simpli(cid:2)ed version of the PRM scheme (See Figure 14, located in
the Appendix along with the detailed proofs). A parent node forwards data to all of its children along the overlay
tree edges. Each node chooses such distinct random edges, and data is forwarded independently along each such
random edge withprobability . However,data packets received along random edges can besubsequently forwarded
!
only along other random edges. For example, in Figure 14, node receives data along the random edge . It
(cid:30) (cid:5)(cid:7)5(cid:21)(cid:8)(cid:11)(cid:30)Q(cid:12)
isallowedtoforward this data packet along another random edge (e.g. , butnot toits parent (orchildren, if
(cid:5)(cid:7)(cid:30)6(cid:8)(cid:24)(cid:23)R(cid:12) (cid:10)
it had any). Weanalyze twodifferent versions of this simpli(cid:2)ed scheme: (i) the random edges are restricted between
nodesonthesamelevelofthetree,and(ii)therandomedgescangotoarbitraryothernodes. Thesesimpli(cid:2)edversions
imposestricter data-forwarding rulesonthedatadelivery treethan thecomplete version described inSection II; thus,
the analytic bounds for the probability of successful data-delivery that we obtain for these schemes serve as lower
bounds for our actual proposed scheme. The additional data overhead of the PRM scheme is given by , which is
S!
a constant independent of the size of the overlay. We now show that even with such a constant data overhead, the
simpli(cid:2)ed PRMscheme scales well; that is, even as the number of nodes in the overlay ( )increases, the guaranteed
T
boundsonreliability andlatencyholdwitharbitrarily highprobability.
Background and Main Results: The overlay data delivery path is a rooted tree with the source node as the root,
U
at level . does not fail. Every other overlay node does not fail with some probability and each overlay link has a
V U
packetlossprobability; allthesefailureeventsareindependent. Thetotalnumberofnodesis .
T
Wewillmakethefollowingassumptions abouttheoverlaytreeandthefailure probabilities.
(A1) Thetreehassomedepth ; for ,allnodes atlevel have some children. Wewilllater
W XYNZV[(cid:8)\(](cid:8)_^_^_^S(cid:8)(cid:11)W‘?a( X (cid:10)cb
take to be a suf(cid:2)ciently large constant (which depends on the success probability we need). The only
(cid:10)ed
7
otherrequirement isthat for .
(cid:10)ebgfah Xifj(
(A2) Callanode(cid:147)good(cid:148)ifitdidnotfail,andifitreceivedthedatafromitsparent(i.e.,withoutalinkloss). Then,
foranynode ,let betheexpectednumberofgoodchildrenof ,conditional on beinggood. Thereare
k lBm k k
constants and suchthatforanynode atanylevel , .
npoqV r6oZ( k XYsqW(cid:25)?q( l,m8f+(cid:10)‘b/npftr
(A3) Let bethe number ofoverlay nodes inlevel that areexpected tosurvive. Thereis someconstant
(cid:6)ub X vwoxV
such that for all , ; in other words, some constant fraction (which is here) of the
X (cid:6)yb*fzv(cid:13){](cid:10)|d}(cid:10)(cid:15)~-{_{_{(cid:24)(cid:10)‘b(cid:128)(cid:127)B~ v
overlaynodesatanylevel areexpected tosurvive.
X
In the Appendix we discuss how some of these assumptions can be relaxed, and how the remaining are necessary.
Our two main results are as follows. Both of them show that all surviving overlay nodes get the data with high
probability, even if the overhead is only some constant that is independent of ; thus, the protocols scale with
S! T
system-size. Furthermore, the (cid:2)rst result shows that when the random forwarding is done within the same level, we
achievedatadeliverywithhighprobabilitywithinalowlatencybound. Everysurvivingoverlaynodeatanylevel gets
X
thedata using overlay hops withhigh probability. Thus, everysurviving overlay node receives the data within a
(cid:129)(cid:131)(cid:130)GX(cid:133)(cid:132)
constant factorboundofitsend-to-end latencyoverlayfromtheroot.
TheoremIII.1: Considerthecasewheretherandomforwardingisonlydonewithinthesamelevel. Supposeweare
given an arbitrary constant (con(cid:2)dence parameter) . Then, there is a constant threshold that is
(cid:134)(cid:136)(cid:135)x(cid:130)(cid:7)V[(cid:8)\(4(cid:132) (cid:137)\d (cid:129)e(cid:130)$(4(cid:138)](cid:134)](cid:132)
and another constant such that if and , then the following holds with probability at least
(cid:10) d (cid:130)/(cid:134)S(cid:132) ]!zf0(cid:137) d (cid:10)(cid:139)f(cid:140)(cid:10) d (cid:130)/(cid:134)](cid:132)
: Foreverylevel ,atleastan (cid:150)fraction oftheoverlay nodes inthat levelwhichsurvive, receive thedata
(u?w(cid:134) X (cid:130)$((cid:141)?p(cid:134)S(cid:132)
fromtheroot;also,theydosowithin steps.
(cid:129)e(cid:130)GX(cid:133)(cid:132)
Thenextresultdealswiththecasewhererandomforwardingisdonetoarbitraryoverlaynodesinthetree;here,the
worst-casewaitingtimeis ,withhighprobability.
(cid:129)(cid:131)(cid:130)(cid:7)(cid:142)(cid:144)(cid:143)(cid:146)(cid:145)YT(cid:147)(cid:132)
TheoremIII.2: Considerthecasewheretherandomforwardingisdonetoarbitraryotheroverlaynodesinthetree.
Suppose we are given an arbitrary constant (con(cid:2)dence parameter) . Then, there is a constant threshold
(cid:134)(cid:148)(cid:135)(cid:149)(cid:130)(cid:7)V[(cid:8)\(4(cid:132)
that is and another constant such that if and , then the following holds with
(cid:137)$d (cid:129)e(cid:130)$(4(cid:138)](cid:134)](cid:132) (cid:10)8d(cid:146)(cid:130)/(cid:134)S(cid:132) ]!(cid:149)f(cid:150)(cid:137)2d (cid:10)(cid:151)f(cid:152)(cid:10)|d(cid:153)(cid:130)/(cid:134)](cid:132)
probability at least : At least an (cid:150)fraction of the overlay nodes which survive, receive the data from the
(%?(cid:154)(cid:134) (cid:130)$(y?(cid:154)(cid:134)](cid:132)
root;also,theydosowithin steps.
(cid:129)e(cid:130)(cid:7)(cid:142)(cid:144)(cid:143)(cid:146)(cid:145)KT(cid:147)(cid:132)
Note that the two theorems involve (slightly) different schemes and latency bounds; their proofs are sketched in
the Appendix, and require a careful probabilistic analysis. One of the main delicate issues related to the randomized
forwarding crops up when nearly all the required overlay nodes have received the data: from this point on, much of
the random forwarding is likely to go to the nodes that have already received the data. In particular, as sketched in
theAppendix, if doesnot growlinearly with , withhigh probability wewillnot reach the number ofsurviving
]! (4(cid:138)](cid:134)
overlaynodesthatweaimto. Thus,thecommunicationoverheadsofourschemeareoptimal. Inessence,weshowthat
therandomizedforwardingpercolatesthroughthenetworkintwodistinctepochs: the(cid:2)rst,whenthenumberofoverlay
nodes reached yetis(cid:147)nottoolarge(cid:148) (cid:150)therateofgrowthofthis numberisfastenough here; andthesecond, whenthe
rate of growth is(much) smaller as indicated afew sentences above. Weare able to show that with asuitably-chosen
valueof andacarefulanalysis, thesecondepochsucceedswithhighprobability.
S!
Thelineargrowthoftheoverheads, with holdsonlyforthesimpli(cid:2)ed,restrictedversionofthePRMscheme.
]! (4(cid:138)](cid:134)
8
22K overlay nodes, Link loss = 0.05 22K overlay nodes, Node failure = 0.02, Link loss = 0.05
1 1
PRM-3,0.02
PRM-3,x
HHR 0.95
0.9 FEC-0.06
BE 0.9
0.8 HHR
0.85
Delivery Ratio 00..67 Delivery ratio 0.007..578
0.5
0.65 FEC-y
0.4
0.6
BE
0.3 0.55
0 0.02 0.04 0.06 0.08 0.1 0 0.01 0.02 0.03 0.04 0.05 0.06
Instantaneous node failure fraction Additional data overhead factor
Fig.6. Deliveryratioofdifferentschemesasthefractionofin- Fig.7. Deliveryratioofdifferentschemesasthedataoverheadis
stantaneouslyfailedoverlaynodesisvaried. varied.
Notethatintherestrictedversion,datareceivedalongcrossedgesarenotforwardedonanytreeedge. Thefullversion
ofPRMdoesnothave sucharestriction and therefore incurs signi(cid:2)cantly lesser overheads. Weexamine the different
aspectsofthefullversionofPRMnext.
B. Simulations ofPRMinanIdealized Environment
The above theorems show that even the simpli(cid:2)ed version of PRM can maintain good reliability properties with
highprobability. Nowweshowthatthefullversion ofPRMisalsoperforms quitewellinpractice. Inthese idealized
simulations, we assume that there exists some application-layer multicast protocol which constructs and maintains a
datadeliverytree. Whendatapacketsaresentonthistree,asubsetoftheoverlaynodes,chosen uniformlyatrandom,
failsimultaneously. Thefailedsubsetischosenindependentlyforeachdatapacketsent. Additionallydatapacketsalso
experiencenetworklayerlossesonoverlaylinks. WepresentamorerealisticsimulationstudyinSectionIVwherewe
explorethevagariesofimplementing thePRMschemeoveraspeci(cid:2)capplication-layer multicastprotocol.
Inthisstudy,weperformedsimulation studiescomparingthreedifferentschemes:
BE:Thisisthebest-effort multicast schemeandhasnoreliability mechanisms andtherefore serves asabaseline
(cid:4)
forcomparison.
HHR:This is areliable multicast scheme, where each overlay link employs a hop-by-hop reliability mechanism
(cid:4)
using negative acknowledgments (NAKs). A downstream overlay node sends NAKs to its immediate upstream
overlaynodeindicatingpacketlosses. OnreceivingtheseNAKstheupstreamoverlaynoderetransmitsthepacket.
Thisscheme,therefore, hidesallpacketlossesontheoverlaylinks.
FEC- : This is an idealized equivalent of an FEC-based reliability scheme with a stretch factor of . The
(cid:4)
; (u)t;
overhead of this scheme, therefore, is . In this idealized scenario we assume that if fraction of packets are
; :
lost onthe end-to-end overlay path, areceiver recovers fraction ofthe packets. In practice the
(cid:155)e(cid:156)(cid:158)(cid:157)(cid:159)(cid:130)2(](cid:8)2:(cid:160)(cid:130)$(K)p;D(cid:132)2(cid:132)
delivery ratio of an FEC-based scheme depends on the size of the encoding. However, for low overhead ranges
the above approximation provides a reasonable upper-bound. We perform more detailed comparisons of PRM
withFEC-basedschemesinSectionIV.
9
PRM- : This isthe PRMscheme. indicates thenumber ofrandom nodes chosen by each node, and is the
(cid:4)
¡(cid:8)2! !
forwarding probability for each of these random edges. This scheme as de(cid:2)ned in Section II employs the hop-
by-hop reliability ofoverlay linksusing the triggered NAKtechnique. PRMincurs additional dataoverheads
]!
abovethebest-effort scheme.
We compare the worst-case data delivery ratio achieved by the different schemes for varying instantaneous node
failure rates and the additional overheads incurred. The data delivery ratio is de(cid:2)ned as the ratio of the number of
data packets successfully received by the node to the number of data packets sent by the source. Since these are not
packet-level simulations, wedefertheresultsofdeliverylatencytothenextsection.
a) Resilience against node failures: In Figure 6 we plot the worst case delivery ratio of the different schemes
asthenumber ofsimultaneous node failures ontheoverlay tree ofsize22,000 isvaried. Theloss probability oneach
overlaylinkwas5%forthisexperiment. WecanobservethatthedeliveryratioofthePRMschemedegradesgracefully
withincreaseinthenumberofsimultaneousnodefailuresintheoverlaytree. Foraveryhighfailureratescenariowhen
5%of theoverlay nodes simultaneously fail(i.e. 1100 nodes) for each packet sent, PRM-3,0.02incurs 6%additional
dataoverheadsandachievesabout90%successful datadeliverytonodes,whiletheHHRschemeachievesabout70%
successful delivery. The best-effort scheme with no reliability mechanism performs poorly. Note that the FEC-based
schemedoesnotprovideanysigni(cid:2)cant bene(cid:2)tswhenthestretchfactorofthecodeislow.
b) AdditionalDataOverheads: InFigure7weshowhowthedatadeliveryratioofthePRMschemedependson
theadditionaldataoverheadsincurred. Inthisexperimentwesimulatedthesituationwhenthelinklossprobabilitywas
0.05 and 2%of the overlay nodes (i.e. 440 overlay nodes) fail simultaneously for each data packet sent. While usual
multicast groups typically willnotsee such alarge number ofsimultaneous failures, agoal of this experimentation is
tounderstandtheresiliencethatcanbeachievedbythePRMschemedespiteitslowoverheads. Forthisexperimentwe
varied the parameter of PRMto control the additional overheads of the scheme. Even will less than 1% overheads
!
thePRMschemeisabletoachieveadeliveryratiogreaterthan95%. FEC- achieves verylittleimprovementover
V[^¢V(cid:153)£
thebest-effortschemesinceitsperformanceismarginalwhenonlylowdataoverheadsarepermissible. Byintroducing
signi(cid:2)cantly higheroverheads thedatadeliveryratiooftheFECschemecanbeimproved.
IV. SIMULATION EXPERIMENTS
The PRMscheme can be applied to any overlay-based multicast data delivery protocol. In our detailed simulation
study weimplemented PRMover the NICEapplication-layer multicast protocol [2]. Weused NICEbecause ofthree
reasons: (1)theauthorsin [2]showthattheNICEapplication-layermulticastprotocolachievesgooddeliveryratiosfor
abest-effortscheme;(2)NICEisascalableprotocolandthereforewecouldperformdetailedpacket-level simulations
forlargeoverlaytopologies; (3)thesource-code forNICEispublicly available.
We have studied the performance of the PRM-enhanced NICEprotocol using detailed simulations. In our simula-
tionswehavecompared theperformance ofthefollowingschemes:
BE:Thisistheoriginal best-effort multicast version ofNICEasreported in[2]andwithnoadditional reliability
(cid:4)
mechanisms.
10
HHR:InthisschemethebasicNICEprotocol isenhanced toprovide hop-by-hop reliability oneachoverlaylink
(cid:4)
usingNAKs.
FEC-( ): This is another enhanced version of the NICE protocol in which the source uses a forward error
(cid:4)
WA(cid:8)2
correction mechanism torecover frompacket losses. Inthis scheme, thesource takes asetof data packets and
W
encodes them into aset of packets and sends this encoded data stream to the multicast group. Areceiving
W(cid:159))+
member can recover the data packets if it receives any of the encoded packets 1. Additional data
W W ‘)⁄W
overheadsofthisschemeare .
¥(cid:138)SW
PRM-( ): ThisisourproposedPRMenhancementsimplementedonthebasicNICEapplication-layermulticast
(cid:4)
S(cid:8)2!
protocol,where isthenumberofrandomedgeschosenbyeachoverlaynodeand istheforwardingprobability
!
on each of these edges. The additional data overheads of this scheme is . For all these experiments we
]!
implementedthelosscorrelationextensionstoPRM.WeenableEGFforonlyonespeci(cid:2)cexperiment(described
later)duetoitshigheroverheads. OurresultswillshowthatEGFisusefulforverydynamicscenarios,atthecost
ofhigherdataoverheads.
Choosing FEC parameters: Since the FEC-based schemes need to send packets instead of packets we
W(cid:17))t W
use a higher data rate at the source (i.e. a data rate of times the data rate used by the other schemes). The
(cid:130)(cid:7)W(cid:9))a ¥(cid:132)2(cid:138)SW
resilienceofanFEC-basedschemecanbeincreasedbyincreasingtheoverheadsparameter, . Theperformanceofthe
FEC-basedschemescan,infact,beimprovedsigni(cid:2)cantly byallowing thesource tocyclically sendtheencoded data
packetsmultipletimes. However,thiswouldincreasethedataoverheads. Forexample,ifthesourcecyclesthroughthe
encodeddata times,theadditionaloverheadswouldbe . Alsoforthesameamountofadditionaldata
T (cid:130)GTB ƒ)(cid:131)(cid:130)GTi?(cid:17)(4(cid:132)(cid:133)W[(cid:132)2(cid:138)SW
overheads, resilience against network losses of the FEC-based schemes improve if we choose higher values of and
W
. Forexample, FEC-(128,128) has better data recovery performance than FEC-(16,16) even though both have 100%
overhead. This improved reliability comes at the cost of increased delivery latencies. Therefore, the maximum value
of and depends on the latency deadline. We have experimented with a range of such choices upto the maximum
W
possible value that will allow correct reception at receivers within the latency deadline. However, we observed that
in presence of failures of overlay nodes increasing and does not always improve the resilience properties. This
W
is because choosing higher values of and leads to increased latencies in data recovery. However when the group
W
change rateishigh the datadelivery paths break before theFEC-based recovery can complete, therefore theachieved
datadeliveryratioislow.
A. SimulationScenarios
In all these experiments we model the scenario of a source node multicasting streaming media to a group. The
sourcesendsCBRtraf(cid:2)cattherateof16packetspersecond. Foralltheseexperimentswechosealatencydeadlineof
upto8seconds. Asaconsequence thesizeofthethepacketbufferforNAK-basedretransmissions is128. Thepacket
bufferwillbelargerforlongerdeadlines.
§ TheDigitalFountaintechnique[3],usesTornadocodesthatrequirethereceivertocorrectlyreceive packetstorecoverthe data
¤(cid:7)'I“8«›‹Gfi fi
packets,where isasmallconstant.
«
Description:Seungjoon Lee, Bobby Bhattacharjee, Aravind Srinivasan. Department of Computer Science, University of Maryland, College Park, MD 20742, USA.