Parallel Program Archetypes

Berna L. Massingill and K. Mani Chandy
California Institute of Technology 256-80
Pasadena, California 91125
{berna,mani}@cs.caltech.edu

January 30, 1996
Abstract

A parallel program archetype is an abstraction that captures the common features of a class of problems with similar computational structure and combines them with a parallelization strategy to produce a pattern of dataflow and communication. Such abstractions are useful in application development, both as a conceptual framework and as a basis for tools and techniques. This paper describes an approach to parallel application development based on archetypes and presents two example archetypes with applications.
1 Introduction

This paper proposes a specific method of exploiting computational and dataflow patterns to help in developing reliable parallel programs. A great deal of work has been done on methods of exploiting design patterns in program development. This paper restricts attention to one kind of pattern that is relevant in parallel programming: the pattern of the parallel computation and communication structure.

Methods of exploiting design patterns in program development begin by identifying classes of problems with similar computational structures and creating abstractions that capture the commonality. Combining a problem class's computational structure with a parallelization strategy gives rise to a dataflow pattern and hence a communication structure. It is this combination of computational structure, parallelization strategy, and the implied pattern of dataflow and communication that we capture as a parallel programming archetype, or just an archetype.
* This work was supported in part by the AFOSR under grant number AFOSR-91-0070 and in part by the NSF under Cooperative Agreement No. CCR-9120008. The government has certain rights in this material.
Much previous work also addresses the identification and exploitation of patterns. The idea of design patterns, especially for object-oriented design, has received a great deal of attention (e.g., [24, 33]). Libraries of program skeletons for functional and other programs have been developed [7, 14, 18]. Algorithm templates of the more common linear algebra programs have been developed and then used in designing programs for parallel machines [4]. Parallel structures have been investigated by many other researchers [8, 22]. Structuring parallel programs by means of examining dataflow patterns has also been investigated [21]. Our contribution is to show that combining consideration of broadly-defined computational patterns with dataflow considerations is useful in the systematic development of efficient parallel programs in a variety of widely-used languages, including Fortran and C.
1.1 Archetype-based assistance for application development
Although the dataflow pattern is the most significant aspect of an archetype in terms of its usefulness in easing the task of developing parallel programs, including computational structure as part of the archetype abstraction helps in identifying the dataflow pattern and also provides some of the other benefits associated with patterns. Such archetypes are useful in many ways:
• A program skeleton and code library can be created for each archetype, where the skeleton deals with process creation and interaction between processes, and the code library encapsulates details of the interprocess interaction. If a sequential program fits an archetype, then a parallel program can be developed by fleshing out the skeleton, making use of the code library. The fleshing-out steps deal with defining the sequential structure of the processes. Thus, programmers can focus their attention primarily on sequential programming issues.

• One way to achieve portability and performance is to implement common patterns of parallel structures (those for a particular archetype or archetypes) on different target architectures (e.g., multicomputers, symmetric multiprocessors, and non-uniform-memory-access multiprocessors), tuning the implementation to obtain good performance. The cost of this performance optimization effort is amortized over all programs that fit the pattern. Indeed, since an archetype captures computational structure as well as dataflow pattern, it is possible for an implementation of a dataflow pattern to support more than one archetype.
• Programmers often transform sequential programs to execute efficiently on parallel machines. The process of transformation can be laborious and error-prone. However, this transformation process can be systematized for sequential programs that fit specific computational patterns; then, if a sequential program fits one of these patterns (archetypes), the transformation steps appropriate to that pattern can be used. Exploitation of the pattern can make the transformation more systematic, more mechanical, and better suited to automation.
• Just as the identification of computational patterns in object-oriented design is useful in teaching systematic sequential program design, identification of computational and dataflow patterns (archetypes) is helpful in teaching parallel programming.

• Similarly, just as the use of computational patterns can make reasoning about sequential programs easier by providing a framework for proofs of algorithmic correctness, archetypes can provide a framework for reasoning about the correctness of parallel programs. Archetypes can also provide frameworks for testing and documentation.

• In some cases, parallelizing compilers can generate programs that execute more efficiently on parallel machines if programmers provide information about their programs in addition to the program text itself. Although the focus of this paper is on active stepwise refinement by programmers and not on compilation tools, we postulate that the dataflow pattern is information that can be exploited by a compiler.

• Archetypes may also be helpful in developing performance models for classes of programs with common structure, as discussed in [32].

• Archetypes can be useful in structuring programs that combine task and data parallelism, as described in [12].
1.2 An archetype-based program development strategy

Our general strategy for writing programs using archetypes is as follows:
1. Start with a sequential algorithm (or possibly a problem description).

2. Identify an appropriate archetype.

3. Develop an initial archetype-based version of the algorithm. This initial version is structured according to the archetype's pattern and gives an indication of the concurrency to be exploited by the archetype. Essentially, this step consists of structuring the original algorithm to fit the archetype pattern and "filling in the blanks" of the archetype with application-specific details. Transforming the original algorithm into this archetype-based equivalent can be done in one stage or via a sequence of smaller transformations; in either case, it is guided by the archetype pattern.

An important feature of this initial archetype-based version of the algorithm is that it can be executed sequentially (by converting any exploitable-concurrency constructs to sequential equivalents, as described in the examples). For deterministic programs, this sequential execution gives the same results as parallel execution; this allows debugging in the sequential domain using familiar tools and techniques.
4. Transform the initial archetype-based version of the algorithm into an equivalent algorithm suitable for efficient execution on the target architecture. The archetype assists in this transformation, either via guidelines to be applied manually or via automated tools. Again, the transformation can optionally be broken down into a sequence of smaller stages, and in some cases intermediate stages can be executed (and debugged) sequentially. A key aspect of this transformation process is that the transformations defined by the archetype preserve semantics and hence correctness.

5. Implement the efficient archetype-based version of the algorithm using a language or library suitable for the target architecture. Here again the archetype assists in this process, not only by providing suitable transformations (either manual or automatic), but also by providing program skeletons and/or libraries that encapsulate some of the details of the parallel code (process creation, message-passing, and so forth).

A significant aspect of this step is that it is only here that the application developer must choose a particular language or library; the algorithm versions produced in the preceding steps can be expressed in any convenient notation, since the ideas are essentially language-independent.
This paper presents two example archetypes and shows how they and this strategy can be used to develop applications. Our work to date has concentrated on target architectures with distributed memory and message-passing, and the discussion reflects this focus, but we believe that the work has applicability for shared-memory architectures as well.
2 Example: The one-deep divide-and-conquer archetype

2.1 Computational pattern

2.1.1 Traditional divide and conquer
Divide and conquer is a useful paradigm for solving a number of problems, including such diverse applications as sorting, searching, geometric algorithms, and matrix multiplication. Briefly, the divide-and-conquer approach works as follows: The original problem P is split into N subproblems P_1, ..., P_N. Each subproblem P_i is solved (directly if it is sufficiently small, otherwise recursively) to produce subsolution S_i. The N subsolutions are combined to produce solution S to problem P.
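As a concrete (if simplified) illustration of this split/solve/merge structure, the following C sketch applies the pattern to the problem of finding the maximum of an array; the example and the threshold value are ours, not the paper's, and the two recursive calls are exactly the points at which a parallel version could create concurrent processes.

```c
/* A minimal sketch of traditional divide and conquer: the problem
 * (finding the maximum of an array segment) is split into N = 2
 * subproblems, each solved recursively, and the two subsolutions are
 * merged.  THRESHOLD is an arbitrary size below which we solve
 * directly rather than splitting further. */
#define THRESHOLD 4

static int solve_directly(const int *a, int n) {
    int best = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > best) best = a[i];
    return best;
}

int dc_max(const int *a, int n) {
    if (n <= THRESHOLD)                   /* base case: solve directly */
        return solve_directly(a, n);
    int mid = n / 2;                      /* split into two subproblems */
    int left  = dc_max(a, mid);           /* solve P_1 (parallelizable) */
    int right = dc_max(a + mid, n - mid); /* solve P_2 (parallelizable) */
    return left > right ? left : right;   /* merge the subsolutions */
}
```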
In this paradigm, the subproblems into which the original problem is split can be solved independently, and a parallel program can be produced by exploiting this potential concurrency, as shown in figure 1: Every time the problem is split into concurrently-executable subproblems, a new process is created, until some threshold size is reached, whereupon the subproblem is solved sequentially, possibly using a sequential divide-and-conquer algorithm. Such programs can be inefficient, however, for two reasons.

Figure 1: Parallelization of traditional divide and conquer.
First, problem sizes for parallel programs are usually large, so data is often distributed among processes. Splitting the input data into small parts can require inspection of all the input data, which can be expensive in terms of data transfer and the amount of memory required for a single process. Similar considerations apply to the output data.
Second, as can be seen in figure 1, the amount of actual concurrency varies over the lifetime of the algorithm; only during the "solve" phase of the algorithm is the maximum number N of processes actually used, which can lead to inefficiency if N is large and the tree shown in figure 1 is deep.
2.1.2 "One-deep" divide and conquer

Both of these sources of inefficiency can be addressed by a modification of the original divide-and-conquer paradigm called "one-deep divide and conquer" [36].
First, we assume that the input data is originally distributed among processes, and that after execution the output data is to be distributed among processes as well.

Second, rather than splitting the problem into a small number of subproblems, which are then recursively split into smaller subproblems, and so forth, this version of divide and conquer performs only a single level of split/solve/merge (hence the name "one-deep"), splitting the original problem directly into N subproblems, solving them independently, and then merging the N subsolutions to obtain a solution to the original problem. That is, the algorithm is structured as follows:
1. Split problem P into N subproblems P_1, ..., P_N. Parameters for the split are computed using a small sample of the problem data; once these parameters are computed, collecting the data for the subproblems can be done independently (that is, data for P_i can be extracted from P independently of data for P_j).

2. Solve the subproblems (independently) using a sequential algorithm to produce subsolutions S_1, ..., S_N.

3. Merge the subsolutions into a solution S for P. This merge is accomplished by first repartitioning the subsolutions (based again on parameters computed using a small sample of data from all subsolutions), i.e., rearranging subsolutions S_1, ..., S_N into S'_1, ..., S'_N, and then performing a local merge operation on each of the repartitioned subsolutions S'_i. Combining the results of these local merge operations (typically through concatenation) gives the total solution S.
For many problems, either the split or the merge step is degenerate; for example, no split is needed if the data is initially distributed in an acceptable way among processes.

Figure 2 illustrates this pattern. The details of how the parameters for the split or merge are computed can vary among algorithms and implementations; for example, it can be done in a single process, with the results then made known to all processes, or it can be done via identical computations in all processes concurrently.
2.2 Parallelization strategy and dataflow
Parallelization of the one-deep divide-and-conquer archetype is a straightforward exploitation of the obvious concurrency: During the split phase, once the parameters are computed, the actual split can be accomplished by N concurrent processes, each collecting data for one subproblem P_i. During the solve phase, all subproblems can be solved concurrently. During the merge phase, again once the parameters are computed, the merge can be accomplished by N concurrent processes, each collecting and locally merging data for one subsolution S'_i. Depending on the details of the algorithm, the computation of parameters for the split and the merge can either be done entirely sequentially (either by having one master process perform the computation and make its results available to the other processes, or by having all processes perform the same computation concurrently), or it can be done by first independently computing N sets of local parameters and then merging them sequentially.

Figure 2: Parallelization of one-deep divide and conquer.
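One common way to compute such parameters from a small sample of the data can be sketched in C as follows. This is illustrative rather than the archetype's prescribed method (the regular-sampling rule and the helper names are ours): each of the N processes contributes s regularly spaced samples of its sorted local data, the pooled samples are sorted, and every N-th pooled sample becomes a splitter, in the spirit of sample sort.

```c
#include <stdlib.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Compute N-1 splitters for the merge phase from a small sample.
 * locals[p] points to process p's sorted data of length len[p];
 * splitters receives N-1 values in nondecreasing order. */
void compute_splitters(int N, int *const locals[], const int len[],
                       int s, int *splitters) {
    int *sample = malloc((size_t)N * s * sizeof *sample);
    int k = 0;
    for (int p = 0; p < N; p++)
        for (int j = 0; j < s; j++)      /* regular sampling per process */
            sample[k++] = locals[p][(size_t)j * len[p] / s];
    qsort(sample, (size_t)k, sizeof *sample, cmp_int);
    for (int i = 1; i < N; i++)          /* evenly spaced pooled samples */
        splitters[i - 1] = sample[i * s];
    free(sample);
}
```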
The archetype's dataflow pattern arises from combining the above computational pattern with the desired distribution of problem data (input data at the start of the computation and output data at the end of the computation). Typically, the input and/or output data is structured as a one-dimensional array, in which the array elements can be scalars (as in mergesort) or a more complicated data structure (as in the skyline problem), with elements distributed among processes and the details of the distribution depending on the problem. Figure 3 illustrates dataflow in the case in which the split phase is degenerate (i.e., the initial distribution of data among processes is used as the result of the split) and the merge phase consists of concatenation (i.e., the complete solution is the concatenation of the subsolutions). Analogous dataflow patterns arise for algorithms in which the split phase is nontrivial and the merge phase is degenerate.
2.3 Communication patterns

It is straightforward to infer the interprocess communication required for one-deep divide and conquer from dataflow patterns like the one of figure 3:

• All-to-all communication is needed to redistribute data during the split and merge phases, with every process p sending to every other process q a distinct portion of its data.

• Either (i) a combination of "gather" and broadcast or (ii) all-to-all communication is needed before the sequential part of the computation of split or merge parameters; each process that is to perform the computation must obtain data from every other process.

• Broadcast communication is needed after the computation of split or merge parameters if not all processes have performed the computation.
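The three patterns above can be pictured with a small sequential sketch of our own, in which message-passing among N logical processes is simulated by array copies (one int per message for brevity); on an actual message-passing machine these would correspond to collectives such as MPI_Alltoallv, MPI_Gather, and MPI_Bcast.

```c
/* All-to-all: out[p][q] is the block process p sends to process q;
 * after the exchange, in[q][p] holds what q received from p. */
void all_to_all(int N, int out[N][N], int in[N][N]) {
    for (int p = 0; p < N; p++)
        for (int q = 0; q < N; q++)
            in[q][p] = out[p][q];
}

/* Gather: one designated process collects a block from every process
 * (e.g., local samples for the parameter computation). */
void gather(int N, const int contrib[N], int at_root[N]) {
    for (int p = 0; p < N; p++)
        at_root[p] = contrib[p];
}

/* Broadcast: the computed parameters are delivered to every process. */
void broadcast(int N, int value, int at_proc[N]) {
    for (int q = 0; q < N; q++)
        at_proc[q] = value;
}
```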
2.4 Applying the archetype

The one-deep divide-and-conquer archetype is most readily applied to algorithms that are already in the form of traditional recursive divide and conquer. The key to applying the archetype to such an algorithm lies in determining how to transform the recursive (often two-way) split and merge of the original algorithm into an N-way one-deep split and merge. It is difficult to automate or provide detailed guidelines for this step, but the transformation is aided to some extent by consideration of the overall archetype pattern, as is illustrated by the examples in the remainder of this section.

Figure 3: Dataflow in one-deep divide and conquer with degenerate split and concatenation merge.
2.5 Application example: Mergesort

This section presents in some detail the development of a one-deep version of the mergesort algorithm based on the general one-deep divide-and-conquer archetype and its transformation into a program suitable for execution on a distributed-memory message-passing computer.

2.5.1 Problem description

The problem addressed by the mergesort algorithm is to sort an array into ascending order. In sequential mergesort, the split phase of the algorithm simply splits the array into left and right halves, the base-case solve trivially sorts an array of a single element, and the merge phase of the algorithm merges two sorted arrays.
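The merge of two sorted arrays is the standard two-way merge; for reference, a C sketch (ours, not the paper's code):

```c
/* Merge ascending arrays a[0..na-1] and b[0..nb-1] into out, which
 * must have room for na + nb elements; this is the merge step of
 * sequential mergesort. */
void merge2(const int *a, int na, const int *b, int nb, int *out) {
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb)            /* take the smaller head element */
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];   /* copy any leftover of a */
    while (j < nb) out[k++] = b[j++];   /* copy any leftover of b */
}
```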
2.5.2 Archetype-based algorithm, version 1

The one-deep version of mergesort proceeds as follows:

• The split phase is degenerate; the initial distribution of data among processes is taken to be the split.

• The local solve phase consists of sorting the data in each process locally, using any efficient sequential algorithm.

• The merge phase proceeds as follows:

  1. Using information about some (or all) local lists, compute N-1 splitters s_1, s_2, ..., s_{N-1}, where s_i < s_{i+1}. (These are the "parameters" of the merge phase. There are several approaches to computing these points; we do not give details.)

  2. Use these splitters to split the local sorted lists into N sorted sublists, such that elements with values at most s_i belong to the i-th list.

  3. Redistribute these sublists in such a manner that sublists with elements with values at most s_i from all processes are located at process i.

  4. In each process, merge these sorted sublists.

After the algorithm terminates, process i has a sorted list whose elements are larger than the elements of process i-1's list but smaller than the elements of process i+1's list.
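The whole algorithm can also be emulated sequentially, in the spirit of the sequentially executable archetype-based version described in section 1.2. The following C sketch is illustrative rather than the paper's program: it simulates the N processes with equal-sized blocks of one array, uses block medians as splitter samples, and re-sorts each process's received data in place of a true multiway merge (which is equivalent in effect, though not in cost).

```c
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts a[0..n-1] using the one-deep structure with N logical
 * processes; assumes N divides n. */
void one_deep_mergesort(int *a, int n, int N) {
    int blk = n / N;
    /* Degenerate split: block p is a[p*blk .. (p+1)*blk - 1].
     * Local solve: each process sorts its own block. */
    for (int p = 0; p < N; p++)
        qsort(a + p * blk, (size_t)blk, sizeof *a, cmp_int);

    /* Merge parameters: pool one sample (the median) from each sorted
     * block, sort the pool, and take N-1 of them as splitters. */
    int *sample = malloc((size_t)N * sizeof *sample);
    for (int p = 0; p < N; p++)
        sample[p] = a[p * blk + blk / 2];
    qsort(sample, (size_t)N, sizeof *sample, cmp_int);
    int *splitters = sample + 1;         /* N-1 values */

    /* Repartition and local merge: element v belongs to process
     * i = number of splitters strictly less than v; since this rule is
     * monotone, concatenating the merged sublists in process order
     * yields the sorted result. */
    int *out = malloc((size_t)n * sizeof *out);
    int pos = 0;
    for (int i = 0; i < N; i++) {
        int start = pos;
        for (int j = 0; j < n; j++) {
            int b = 0;
            while (b < N - 1 && splitters[b] < a[j]) b++;
            if (b == i) out[pos++] = a[j];
        }
        /* local "merge" of the received sorted sublists */
        qsort(out + start, (size_t)(pos - start), sizeof *out, cmp_int);
    }
    memcpy(a, out, (size_t)n * sizeof *a);
    free(out);
    free(sample);
}
```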
It is then straightforward to write down this algorithm; figure 4 shows C-like pseudocode, using the CC++ [11] parfor construct to express exploitable