Table Of ContentAn Agglomeration Law for Sorting Networks and its
Application in Functional Programming
Lukas ImmanuelSchiller
Philipps-UniversitätMarburg
FachbereichMathematikundInformatik
[email protected]
In this paper we will present a generalagglomerationlaw for sorting networks. Agglomerationis
acommontechniquewhendesigningparallelprogrammestocontrolthegranularityofthecompu-
tation thereby finding a better fit between the algorithm and the machine on which the algorithm
runs.Usuallythisisdonebygroupingsmallertasksandcomputingthemenblocwithinoneparallel
process.Inthecaseofsortingnetworksthiscouldbedonebycomputingbiggerpartsofthenetwork
withoneprocess. Theagglomerationlawinthispaperpursuesadifferentstrategy:Theinputdatais
groupedandthealgorithmisgeneralisedtoworkontheagglomeratedinputwhiletheoriginalstruc-
tureofthealgorithmremains. Thiswillresultinanewaccessopportunitytosortingnetworkswell-
suited for efficient parallelization on modern multicore computers, computer networks or GPGPU
programming.Additionallythisenablesustousesortingnetworksas(parallelordistributed)merg-
ingstagesforarbitrarysortingalgorithms,therebycreatingnewhybridsortingalgorithmswithease.
Theexpressivenessoffunctionalprogramminglanguageshelpsustoapplythislawtosystematically
constructedsortingnetworks,leadingtoefficientandeasilyadaptablesortingalgorithms. Anappli-
cationexampleisgiven,usingtheEdenprogramminglanguagetoshowtheeffectivenessofthelaw.
Theimplementationiscomparedwithdifferentparallelsortingalgorithmsbyruntimebehaviour.
1 Introduction
With the increased presence of parallel hardware the demand for parallel algorithms increases accord-
ingly. Naturallythisdemandincludessortingalgorithmsasoneofthemostinterestingtasksofcomputer
science. Aparticularly interesting classofsorting algorithms forparallelization istheclass ofoblivious
algorithms. Wewillcall a parallel algorithm oblivious “iff its communication structure and its commu-
nication schemearethesameforallinputsthesamesize”[15].
Sorting networks are the most important representative of the class of oblivious algorithms. They
havebeen aninteresting fieldofresearch since theirintroduction byBatcher[1]in1968andareexperi-
encing a renaissance in GPGPU programming [18]. They are based on comparison elements, mapping
their inputs (a ,a )7→(a′,a′) with a′ = min(a ,a ) and a′ =max(a ,a ) and therefore a′ ≤a′. A
1 2 1 2 1 1 2 2 1 2 1 2
simple graphical representation is shown in Figure1. The arrowhead in the box indicates where the
minimumisoutput.
a1 a′1=min(a1,a2)
↑
a2 a′2=max(a1,a2)
Figure1: Comparisonelement(ascending).
S.SchwarzandJ.Voigtländer(Eds.):29thand30thWorkshops
on(Constraint)LogicProgrammingand24thInternational (cid:13)c LukasSchiller
WorkshoponFunctionaland(Constraint)LogicProgramming ThisworkislicensedundertheCreativeCommons
(WLP’15/’16/WFLP’16). Attribution-Noncommercial-ShareAlikeLicense.
EPTCS234,2017,pp.165–179,doi:10.4204/EPTCS.234.12
166 AgglomerationLawforSortingNetworks
Asimplefunctionaldescriptionofsortingnetworksresultsinarepeatedapplicationofthiscompari-
sonelementfunctionwithfixedindicesforeverystep. Forasequence(a ,...,a )oflengthnthespecific
1 n
stepsarefixed:
(a ,...,a )7→...7→(a˜ ,...,a,...,a ,...,a˜ )7→(a˜ ,...,a′,...,a′,...,a˜ )7→...7→(a′,...,a′)
1 n 1 i j n 1 i j n 1 n
with i6= j. In a specific step a and a are sorted with acomparison element. Ultimatly resulting in the
i j
sortedsequence (a′,...,a′).
1 n
Figure2 shows a simple sorting network for lists of length 4. For every permutation of the input
(a ,...,a )the output (a′,...,a′)is sorted – the comparisons are independent of the data base. Notice
1 4 1 4
the obvious inherent parallelism in the first two steps of the sorting network. The restriction to a fixed
structure ofcomparisons resultsinaneasilypredictable behaviour andeasilydetectable parallelism.
a1 a′1
↑ ↑
a2 a′2
↑
a3 a′3
↑ ↑
a4 a′4
Figure2: Simplesortingnetworkwithcomparison elements. Source: [11]
Some well-known sorting algorithms, for example Bubble Sort [11], can be described as sorting
networks. Especially in the case of recursively constructed sorting networks (e.g. Batcher’s Bitonic
Sort or Batcher’s Odd-Even-Mergesort), with their inherent functional structure, an obviously correct
descriptionofthealgorithmiseasilypossibleinafunctionalprogramminglanguagesuchasHaskell[16].
In practice straightforward implementations of these algorithms often struggle with too fine a gran-
ularity of computation and therefore do not scale well. Agglomerating parts of the algorithm is a com-
mon step in dealing with this problem when designing parallel programmes (compare Foster’s PCAM
method [7]). With recursive algorithms for example it is a common technique to agglomerate branches
of the recursive tree by parallelising only until a specific depth of recursion. With a coarser granularity
thecomputation tocommunication ratioimproves. Acommonagglomeration forsorting networks isto
placeblocksorrowsofcomparison elementsinoneparallel process.
Inthispaperwediscuss adifferent approach. Wewillagglomerate theinputdataandalterthecom-
parison element to work on blocks of data. This approach is not based upon the structure of a specific
sorting network and can therefore be applied to any sorting network. Atthe same time wewill see that
the limited nature of sorting networks is necessary for this modification to be correct. The application
of this transformation will open a different access to sorting networks, allowing easy combination with
othersortingalgorithms. Workingondatastructuresinsteadofsingleelementsleadstoasuitableimple-
mentationformodernmulti-corecomputers,GPGPUconceptsorcomputernetworks. Wewillobtainan
adequate granularity ofcomputation andthewidthofthesortingnetwork cancorrespond withthenum-
ber of processor units. A second layer of traditional agglomeration (e.g. blocks or rows of comparison
elements)isindependently possible.
In Section2 we will discuss which demands are necessary for altered comparison elements to pre-
serve an algorithm’s functionality and correctness. In Section3 an example isgiven showing situations
inwhich theapplication ofthisagglomeration isbeneficial and tests withdifferent approaches are eval-
uated. Section4discusses relatedworkandSection5concludes.
LukasSchiller 167
2 Agglomeration Law for Sorting Networks
In general, sorting networks work on sequences of elements A = (a ,...,a ). Our improvement will
1 n
workwithapartitionofagivensequence. Inthefollowing,wewilluseHaskellnotationandlistsinstead
ofsequences toimprovereadability, eventhough amoregeneral typewouldbepossible.
Theorem1(AgglomerationLawforSortingNetworks). LetA=[a ,...,a ]beasequencewhereatotal
1 n
order “≤” is defined on the elements, c::Orda⇒(a,a)→(a,a) a comparison element as described
beforeand
sN :: ((b,b) → (b,b)) → [b] → [b]
acorrectsortingnetwork,meaningsNcA=A′ withA′=[a′,...,a′]anda′ ≤...≤a′ wherea′,...,a′
1 n 1 n 1 n
is a permutation of a ,...,a and the only operation used by the sorting network is a repeated appli-
1 n
cation of the comparison element with a fixed, data independent structure for a given input size. Then
thereexistsacomparisonelementc′::Orda⇒([a],[a])→([a],[a])withwhichasequenceofsequences
A = [A ,...,A ] with A = [a ,...,a ] can be sorted with the same sorting network. Meaning that
1 n i i1 ini
sN c′A=A′ with A′=[A′,...,A′] and A′ (cid:22)...(cid:22)A′. Where concat A′ is a permutation of concat A
1 n 1 n
and A(cid:22)B means that for two sequences A=[a ,...,a ]and B=[b ,...,b ]every element ofA isless
1 p 1 q
thanorequaltoeveryelementfromB:
A(cid:22)B⇔∀a∈A,∀b∈B:a≤b
Withblocksofdatatheconcatenation oftheelementsofA′ needtobeapermutation oftheconcate-
nationofA,theelementsthemselves(A′,...,A′)donotneedtobeapermutation ofA ,...,A .
1 n 1 n
Notethattheorder relation forblocks ofdata“(cid:22)”defines only apartial order whereas theelements
inside the blocks are totally ordered. To this end we need to specialise the comparison element to deal
with the case of overlapping or encasing blocks and still fulfill all properties necessary for the sorting
networktoworkcorrectly (cf.Figure3).
Ai Ai
a
≤ini aini Aj
a≤i1... Aj ≤... a≤jnj Aia≤ini Ajajnj
a≤jnj a≤i1 ≤... ... ≤...
≤... aj1 ≤ a≤j1
aj1 ai1
(b)overlappingblocks:
(a)orderedblocks: maxA ≤maxA but (c)encasingblocks:
j i
maxAj≤minAior maxAj(cid:2)minAiand maxAj≤maxAiand
maxA ≤minA minA ≤minA orviceversa minA ≤minA orviceversa
i j j i i j
Figure 3: Cases for comparison elements for blocks of data: blocks can be ordered (with order relation
(cid:22)), overlapping or encasing, where overlapping and encasing means that the blocks are not in an order
relation betweenoneanother (meaningneither(cid:22)nor(cid:23)holds).
If for example the input lists overlap (e.g. c′([1,2,3,4],[3,4,5,6])) asimple swap would not fulfill
the requirements. Wewould prefer a result such as ([1,2,3,3],[4,4,5,6]) and therefore A′ can not bea
permutation of A but we expect that every element a from A ,...,A is in A′,...,A′. In the next step
ij 1 n 1 n
wewillinvestigate whichconditions acomparison elementforblocksofdatamustfulfill.
168 AgglomerationLawforSortingNetworks
2.1 Comparisonelement forpartiallyordered blocks oftotallyordered elements
If we want to alter the comparison element while preserving the functionality and correctness of the
sorting network we must understand which information is generated and preserved within a traditional
comparison element. We will therefore investigate the capabilities and limits of comparison elements
fortotallyorderedsequences: Leta ,a ,a1,a2,a1,a2beelementswhereinformationaboutthefollowing
1 2 1 1 2 2
relations havealready beengatheredbythesortingnetwork:
a1≤a ≤a2 and a1≤a ≤a2 (0)
1 1 1 2 2 2
If we do sort a and a with a comparison element (a ,a ) 7→ (a′,a′) we receive new relations (e.g.
1 2 1 2 1 2
a1 ≤ a ⇒ a1 ≤ a′). We will distinguish between direct relations and conditional relations. In this
1 1 1 2
casedirectrelationsrefertoallrelationsresultingdirectlyfromtherelationswhichexistsandareknown
before the application of the comparison element and which involve a ,a ,a′ ora′. We expect the
1 2 1 2
comparison element to be side-effect free and therefore we expect every relation between elements not
touchedbythecomparisonelementtobeunaffectedbyitsapplication. Herethedirectrelationsresulting
from(0)are:
a′ ≤ a′ (1)
1 2
a′ ≤ a2, i∈{1,2} (2)
1 i
a1 ≤ a′, i∈{1,2} (3)
i 2
Ifwehaveadditional information, wegetadditional directrelations. For{i,j}={1,2}:
a1≤a ⇒ a1≤a′ (4)
i j i 1
a ≤a2 ⇒ a′ ≤a2 (5)
j i 2 i
a ≤a ⇒ a1≤a′ ∧a′ ≤a2 (6)
i j i 1 2 j
Wecalltheserelations directrelationsonlyiftheleftsideisalreadyknown.
Definition 1(Valid comparison elements for blocks of data). LetA ,A be sequences withatotal order
1 2
“≤” defined on the elements and c′::Orda⇒([a],[a])→([a],[a]) a block comparison element with
c′(A ,A )=(A′,A′)andA′ (cid:22)A′.
1 2 1 2 1 2
c′ is called valid, iff all elements from A and A which are less than lb=max(min(A ), min(A ))
1 2 1 2
must be in A′, all elements which are greater than ub=min(max(A ), max(A )) must be in A′ and all
1 1 2 2
elements betweentheselimitscanbeeither inA′ orinA′ aslongaseveryelementinA′ issmallerthan
1 2 1
orequaltoeveryelementinA′ (cf.Figure4).
2
u
ub
m
lb
l
Figure 4: Sections of the comparison element for blocks of data. Elements from l must be in the lesser
result(A′),elementsfromumustbeinthegreaterresult(A′)andelementsfrommcanbeineitherresult
1 2
aslongasA′ (cid:22)A′ holds.
1 2
LukasSchiller 169
Lemma 1. Valid block comparison elements maintain the direct relations fulfilled by the elementary
comparison elements.
Proof. Weshowthevalidityoftheaboverelations (1)to(6)forblocksofdata:
1. A′ (cid:22)A′ isincluded inthedefinition.
1 2
2. maxA′ ≤ub≤maxA ≤minA2⇒A′ (cid:22)A2, i∈{1,2}
1 i i 1 i
3. maxA1≤minA ≤lb≤minA′ ⇒A1(cid:22)A′, i∈{1,2}
i i 2 i 2
4. A1(cid:22)A ∧A1(cid:22)A ⇒A1(cid:22)[min(minA , minA )](cid:22)A′ ⇒A1(cid:22)A′, i∈{1,2}
i 1 i 2 i 1 2 1 i 1
5. A (cid:22)A2∧A (cid:22)A2⇒A′ (cid:22)[max(maxA , maxA )](cid:22)A2⇒A′ (cid:22)A2, i∈{1,2}
1 i 2 i 2 1 2 i 2 i
6. A1(cid:22)A (cid:22)A ⇒A1(cid:22)A ∧A1(cid:22)A ⇒maxA1≤min(minA , minA )⇒A1(cid:22)A′
i i j i i i j i i j i 1
A (cid:22)A (cid:22)A2⇒A (cid:22)A2∧A (cid:22)A2⇒max(maxA , maxA )≤minA2⇒A′ (cid:22)A2
i j j i j j j i j j j j
Theproofshowsthattheselimitsarenotonlysufficientbutnecessarytoguaranteethedirectrelations
onwhichsortingnetworksareessentiallybased. Counterexampleswhereadifferentlimitselectionleads
tothefailureofthesortingnetworkcaneasilybefound.
All other producible information concerns conditional relations which depend on a yet unknown
condition resulting inadisjunction oraconditional withunknownantecedent. Forexample
(a ≤a ∨a ≤a )∧a1≤a ≤a2⇒a1≤a′ ∨a′ ≤a2
1 2 2 1 1 1 1 1 1 2 1
Fororderedoroverlapping blockswecaneasilyverifythatalltheserelations canbepreserved, asevery
inputelementhasadirectdescendant,analogoustotheoriginalcomparisonelement. Inthiscaseadirect
descendant A′ ofablockAisboundedbytheextremaoftheparentalblock,meaningthatminA≤minA′
andmaxA′≤maxA. A′ canbutneednotcontain elements fromAaswellaselements whicharenotinA
due to the fact that the property is defined through boundaries not elements. Therefore, when applying
the comparison element, the boundaries of each block can at the most approach each other, leaving all
relations preserved. AnexampleisgiveninFigure5.
A2 A2
1 1
A2 A2
2 2
A A′
1 2
max(A )
1 max(A )
2
(cid:22)
min(A1) min(A2)
A2 A′1
A11 A12 A11 A12
Figure 5: Splitof overlapping blocks. Inthis case the minimal(maximal) element ofA issmaller than
2
the minimal (maximal) element of A . Thereby A “shrinks” from above, meaning that the maximum
1 2
element of A′ is smaller than maxA . This does not yet disclose any information about the number of
1 2
elementsinA′. A “shrinks”frombelow. WecanseeA′ asthedescendantofA andA′ asthedescendant
1 1 1 2 2
ofA . Allrelations arepreserved.
1
170 AgglomerationLawforSortingNetworks
With encased blocks (cf. Figure3c) it is not necessarily possible to find a descendant for every
element. If,forexample, wehaveA1(cid:22)A (cid:22)A2 andA1(cid:22)A (cid:22)A2 there mightbenooutput element A′
1 1 1 2 2 2 i
withA1(cid:22)A′(cid:22)A2 (cf.Figure6).
1 i 1
A2 A2
1 1
A′
2
A
1
(cid:22)
A2 A′1
A1 A1
1 1
Figure 6: Splitofencased blocks. Therearenodirect descendants. A′ (cid:22)A2 andA1(cid:22)A′ butneither A′
1 1 1 2 1
norA′ isbetweenA1 andA2.
2 1 1
Unfortunately, a consequence of this is that this technique of merging and splitting blocks can not
necessarilybetransferredtoamoregeneralsortingalgorithm. Inparticularthisdoesnotworkwithpivot
based sorting algorithms. However it does with sorting networks because the comparison element does
not compare one fixed element with another element but rather returns two sorted elements for which
we do not know which input element is mapped to which output element. The information A1 ≺ A
1 1
is reduced to A1 (cid:22) A′ plus some conditional information. Some of this conditional information can
1 2
no longer be guaranteed to hold but can not be used in a sorting network at all because of the limited
operations ofsortingnetworks. Therelations ofconcernare
a ≤a ∨a ≤a ⇒a1≤a′ ∨a′ ≤a2, i∈{1,2} (7)
1 2 2 1 i 1 2 i
Sorting networks as described above can not produce the additional information needed for this
conditional information tobecomeuseful.
Lemma 2. Information about the conditional relations (7) that can not be preserved by the altered
comparison elementc′ cannotbeusedbyasortingnetwork.
SketchofProof. The condition of a conditional relation is unknown by definition – otherwise it would
be a direct relation. Therefore the implication can not be used to gather additional information. The
remaining disjunction can result in useful information in a non-trivial way only if one side of the dis-
junction isknown tobe false (modus tollendo ponens) orifboth sides ofthe disjunction areequal. Itis
notpossibletoequaliseanoutputelementofthecomparisonelementwithanotherelementandtherefore
it is not possible to test whether a ≤a or not. In particular the information that a (cid:2)a can not be
i j i j
produced for any i and j. We can not test whether one side of such an equation is false or if both sides
areequalandthereforecannotuseconditional relations. Non-trivial,productiveinformationfromthese
disjunctions canonlybeusedinnon-oblivious algorithms.
Lemma1 and Lemma2 imply that a comparison element c′ as demanded in Theorem1 exists with
thegivenlimitationsfromLemma1. Thereforeallusableinformation ispreservedandthistechnique of
mergingandsplitting twoblocksinacomparison elementcanbeusedwitheverysortingnetwork.
Iftheelementsinsidetheblocksaresorted,wecandefinealineartimecomparisonelementthatsplits
thetwoblocksintoblocksasequalinsizeaspossible. Animplementationofsuchacomparisonelement
LukasSchiller 171
can be found in Section3, Listing5. Balancing the blocks is advantageous in many cases because it
limits the maximal block size to the size of the largest block in the initial sequence. This is beneficial
especiallyinthesituationoflimitedmemoryfordifferentpartsoftheparallelisedalgorithm,forexample
if the parallelization is done with a computer cluster. By preserving the inner sorting of the blocks, the
resulting sequence of the sorting network can be easily combined to a completely sorted sequence by
concatenation.
Every suitable sorting algorithm can be used for the initial sorting inside the blocks. Consequently
the sorting network can beused as askeleton toparallelise arbitrary sorting algorithms and workas the
merging stage of the newly combined (parallel) algorithm. A concept that will prove it’s worth in the
followingexample.
3 Application of the Agglomeration Law on the Bitonic Sorter
We will now apply the agglomeration law to Batcher’s Bitonic Sorting Network. It is a recursively
constructed sorting network that works in two steps. In the first step an unsorted sequence (of length
2l with l ∈ N) is transformed into a bitonic sequence. A bitonic sequence is the juxtaposition of an
ascending andadescending monotonic sequence orthecyclicrotationofthefirstcase(Figure7).
8 8
7 7
6 6
5 5
4 4
3 3
2 2
1 1
Figure7: Examplesofbitonicsequences.
ThebitonicsequenceisthereaftersortedbyaBitonicMerger. Wewillcallthefunctionimplementing
thisBitonicMergerbMergeandthefunctiontransforming anunsorted sequence intoabitonic sequence
prodBList. The Bitonic Sorter works with the nested divide-and-conquer scheme of the sorting-by-
merging idea. This means that the repeated generation of shorter sorted lists is done by Bitonic Sorters
ofsmallersize. ABitonicSorterforeightinputelementsisdepicted inFigure8.
↑ ↑ ↑ ↑ ↑ ↑
e
c e
n c
e ↓ ↑ ↑ ↑ ↑ ↑ n
u e
q u
e q
s e
d s
e d
ort ↑ ↓ ↓ ↑ ↑ ↑ rte
s o
n s
u
↓ ↓ ↓ ↑ ↑ ↑
Figure 8: Bitonic Sorter of order 8. The function prodBList is represented by a red dashed rectangle,
thefunction bMergebyabluedottedone. BitonicSequencesarerepresented byshadedrectangles.
172 AgglomerationLawforSortingNetworks
Thebasiccomponent ofthesortingnetwork–theoriginal comparisonelement–canbedefinedas:
Listing1: Originalcomparison element
data Direction = Up | Down deriving Show
compElem :: Ord a ⇒ Direction → [a] → [a]
compElem Up [x,y] = if x ≤ y then [x,y] else [y,x]
compElem Down xs = reverse $ compElem Up xs
We will use a two-element-list variant instead of pairs for reasons of code elegance. We define the
actual algorithm using the Eden [14, 13] programming language which extends Haskell by the concept
of parallel processes with an implicit communication as well as a Remote Data [5] concept. We can
instantiate aprocessthatisdefinedbyagivenfunctionwith($#):
($#) :: (Trans b, Trans a) ⇒
(a → b) -- Process function
→ a → b -- Process input and output
The class Trans consists of transmissible values. The expression f $# expr with some function
f :: a → bwillcreatea(remote)childprocess. Theexpression exprwillbeevaluated(concurrently
by a new thread) in the parent process and the result val will be sent to the child process. The child
process willevaluatef $ val(cf.Figure9).
parentprocess resultof
(f $ val)
(evaluatesexprtoval)
creates
childprocess
val
release ◦ f
Figure9: Theschemeforprocessinstantiation. Source: [13]
Hereafterwewillessentially useEden’sparMapAt,aparallelvariantofmapwithexplicitplacement
ofprocesses onprocessor elements(PEs),alsocalled(logical) machines,whicharenumberedfrom1to
thenumberofprocessor elements.
parMapAt :: (Trans a, Trans b) ⇒
[Int] -- ^places for instantiation
→ (a → b) -- ^worker function
→ [a] → [b] -- ^task list and ^result list
Theexplicitplacementisdeterminedbythefirstargument,alistofPEnumbersspecifyingtheplaces
wheretheprocesseswillbedeployed. AdditionallywewillusetheconstantsnoPeandselfPeprovided
byEdentocalculate thecorrectplacements:
noPe :: Int -- Number of (logical) machines in the system
selfPe :: Int -- Local machine number (ranges from 1 to noPe)
Forourimplementation wewillplaceeachcomparison element ofthesamerowonthesamePE.In
Listing2aparalleldefinitionofthealgorithm isgiven.
LukasSchiller 173
Listing2: ParallelbSort
36 bSort :: Trans a
37 ⇒ (Direction → [a] → [a]) -- ^specialized comparison element
38 → Direction -- ^sorting direction
39 → [a] → [a] -- ^input and ^output
40 bSort _ _ [ ] = [ ]
41 bSort _ _ [x] = [x]
42 bSort sCompElem d xss = (bMerge sCompElem d) ◦ prodBList $ xss where
43 prodBList = unSplit ◦ pMap bSort’ ◦ zip [Up, Down] ◦ splitHalf
44 bSort’ = uncurry (bSort sCompElem)
45 pMap = parMapAt [selfPe, selfPe+hcc]
46 hcc = (length xss) ‘div‘ 4 {- half comparator count -}
The bSort function takes three arguments: an oriented comparison element, a Direction denoting
whethertheresultshouldbesortedinanascendingordescendingorderandaninputlist. Themainpartof
thealgorithmisacompositionoftheprodBListandthebMergefunction(cf.Line42inListing2). The
prodBListfunction splits theinput list andsorts both parts withthe Bitonic Sorter, one half ascending
and one half descending (cf. Line43). It uses twohelper functions splitHalfand unSplit. With the
helpofEden’ssplitIntoN,whichsplitstheinputlistblockwiseintoasmanypartsasthefirstparameter
determines, wedefine:
splitHalf :: [a] → [[a]]
splitHalf = splitIntoN 2
BothresultinglistsareofthesamesizebecausethewidthoftheBitonicSorterandthereforeitsinput
list’s length are powers of two(not to be confused with the size of the blocks which can be of arbitrary
size). Theneededreversefunction–unSplit–canbedefinedas:
unSplit :: [[a]] → [a]
unSplit = concat
The correct placement by line is calculated depending on the width of the sorting network (cf.
Line46). Two elements are needed for every comparison element, therefore hcc is half the size of
thesortingnetworkintheactualrecursion step. ThebMergefunctiondoeshavethesametypesignature
asthebSortfunction buttheinputlistmustbeabitonic listforthefunction toworkcorrectly:
Listing3: ParallelbMerge
49 bMerge :: Trans a
50 ⇒ (Direction → [a] → [a]) -- ^specialized comparison element
51 → Direction -- ^sorting direction
52 → [a] → [a] -- ^input and ^output
53 bMerge sCompElem d xss@[x,y] = sCompElem d xss
54 bMerge sCompElem d xss = unSplit ◦ pMap (bMerge sCompElem d) ◦ bSplit $ xss where
55 bSplit = splitHalf ◦ shuffle ◦ pMap’ (sCompElem d) ◦ perfectShuffle
56 pMap = parMapAt [selfPe, selfPe+hcc]
57 hcc = (length xss) ‘div‘ 4 {- half comparator count -}
58 pMap’ = parMapAt [selfPe..]
The main part of the bMerge function is the function bSplit which splits a bitonic sequence into
two bitonic sequences with an order between each other. This function uses a communication structure
referred toas aperfect shuffle1 byStone[21]. Withthis communication scheme the element iandi+ p
2
arecompared, resulting inasplitdepictedinFigure10.
1Thisstructurecanbefoundinvariousalgorithms,e.g.intheFastFouriertransformorinmatrixtranspositions.
174 AgglomerationLawforSortingNetworks
1 p p 1 p p 1 p p 1 p p
2 2 2 2
Figure10: Conceptofsplitting abitonicsequence.
InEdenthisperfect shuffleiseasilydefinedwiththehelpoftheofferedauxiliary functions:
-- Round robin distribution and inverse function called shuffle
unshuffle :: Int → [a] → [[a]]
shuffle :: [[a]] → [a]
Thefirstparameterofunshufflespecifiesthenumberofsublists inwhichthelistissplit,e.g.:
unshuffle 3 [1..10] = [[1,4,7,10],[2,5,8],[3,6,9]]
shuffle [[1,4,7,10],[2,5,8],[3,6,9]] = [1..10]
Theperfect shuffleisthendefinedas:
perfectShuffle :: [a] → [[a]]
perfectShuffle xs = unshuffle halfSize xs
where halfSize = (length xs) ‘div‘ 2
Adirect communication between consecutive comparison elements canberealised withEden’sRe-
mote Data concept in which a smaller handle is transmitted instead ofthe actual data. Thedata itself is
fetched directly when needed from the PE where the handle was created. The more intermediate steps
areinvolved,themoreeffectivethebenefitsofthisconceptbecome. Thiscanbedonebytheoperations:
type RD a -- remote data
-- converts local data into corresponding remote data and vice versa
release :: Trans a ⇒ a → RD a
fetch :: Trans a ⇒ RD a → a
releaseAll :: Trans a ⇒ [a] → [RD a] -- list variants
fetchAll :: Trans a ⇒ [RD a] → [a]
InFigure11thecommunication schemeofaRemoteDataconnection ispictured.
PE0 PE0
inp inp
PE1 f g PE2 PE1 PE2
release ◦ f release ◦ f release ◦ f g ◦ fetch
(a)Indirectconnection. (b)Directconnection.
(g $# (f $# inp)) (g ◦ fetch) $# ((release ◦ f) $# inp)
Figure 11: Remote Data scheme. Source: [13]. The processes computing the results of f and g are
placedontwodifferentPEs. WithoutRD,theresultoffistransferredviatheparentalprocess. WithRD
ahandleisgenerated onPE1andtransferred viaPE0toPE2,theactualresultistransferred directly.