Embedded Systems
Series Editors
Nikil D. Dutt
Peter Marwedel
Grant Martin
For further volumes:
http://www.springer.com/series/8563
Andreas Hansson Kees Goossens
On-Chip Interconnect
with Aelite
Composable and Predictable Systems
Andreas Hansson
Research & Development, ARM Ltd.
Cambridge, United Kingdom
[email protected]

Kees Goossens
Eindhoven University of Technology
Eindhoven, The Netherlands
[email protected]
ISBN 978-1-4419-6496-0 e-ISBN 978-1-4419-6865-4
DOI 10.1007/978-1-4419-6865-4
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2010937102
© Springer Science+Business Media, LLC 2011
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Contents
1 Introduction
1.1 Trends
1.1.1 Application Requirements
1.1.2 Implementation and Design
1.1.3 Time and Cost
1.1.4 Summary
1.1.5 Example System
1.2 Requirements
1.2.1 Scalability
1.2.2 Diversity
1.2.3 Composability
1.2.4 Predictability
1.2.5 Reconfigurability
1.2.6 Automation
1.3 Key Components
1.4 Organisation
2 Proposed Solution
2.1 Architecture Overview
2.1.1 Contention-Free Routing
2.2 Scalability
2.2.1 Physical Scalability
2.2.2 Architectural Scalability
2.3 Diversity
2.3.1 Network Stack
2.3.2 Streaming Stack
2.3.3 Memory-Mapped Stack
2.4 Composability
2.4.1 Resource Flow-Control Scheme
2.4.2 Flow Control and Arbitration Granularities
2.4.3 Arbitration Unit Size
2.4.4 Temporal Interference
2.4.5 Summary
2.5 Predictability
2.5.1 Architecture Behaviour
2.5.2 Modelling and Analysis
2.6 Reconfigurability
2.6.1 Spatial and Temporal Granularity
2.6.2 Architectural Support
2.7 Automation
2.7.1 Input and Output
2.7.2 Division into Tools
2.8 Conclusions
3 Dimensioning
3.1 Local Buses
3.1.1 Target Bus
3.1.2 Initiator Bus
3.2 Atomisers
3.2.1 Limitations
3.3 Protocol Shells
3.3.1 Limitations
3.4 Clock Domain Crossings
3.5 Network Interfaces
3.5.1 Architecture
3.5.2 Experimental Results
3.5.3 Limitations
3.6 Routers
3.6.1 Experimental Results
3.6.2 Limitations
3.7 Mesochronous Links
3.7.1 Experimental Results
3.7.2 Limitations
3.8 Control Infrastructure
3.8.1 Unified Control and Data
3.8.2 Architectural Components
3.8.3 Limitations
3.9 Conclusions
4 Allocation
4.1 Sharing Slots
4.2 Problem Formulation
4.2.1 Application Specification
4.2.2 Network Topology Specification
4.2.3 Allocation Specification
4.2.4 Residual Resource Specification
4.3 Allocation Algorithm
4.3.1 Channel Traversal Order
4.3.2 Speculative Reservation
4.3.3 Path Selection
4.3.4 Refinement of Mapping
4.3.5 Slot Allocation
4.3.6 Resource Reservation
4.3.7 Limitations
4.4 Experimental Results
4.5 Conclusions
5 Instantiation
5.1 Hardware
5.1.1 SystemC Model
5.1.2 RTL Implementation
5.2 Allocations
5.3 Run-Time Library
5.3.1 Initialisation
5.3.2 Opening a Connection
5.3.3 Closing a Connection
5.3.4 Temporal Bounds
5.4 Experimental Results
5.4.1 Setup Time
5.4.2 Memory Requirements
5.4.3 Tear-Down Time
5.5 Conclusions
6 Verification
6.1 Problem Formulation
6.1.1 Cyclo-static Dataflow (CSDF) Graphs
6.1.2 Buffer Capacity Computation
6.2 Network Requirements
6.3 Network Behaviour
6.3.1 Slot Table Injection
6.3.2 Header Insertion
6.3.3 Path Latency
6.3.4 Return of Credits
6.4 Channel Model
6.4.1 Fixed Latency
6.4.2 Split Latency and Rate
6.4.3 Split Data and Credits
6.4.4 Final Model
6.4.5 Shell Model
6.5 Buffer Sizing
6.5.1 Modelling the Application
6.5.2 Synthetic Benchmarks
6.5.3 Mobile Phone SoC
6.5.4 Set-Top Box SoC
6.6 Conclusions
7 FPGA Case Study
7.1 Hardware Platform
7.1.1 Host Tile
7.1.2 Processor Tiles
7.2 Software Platform
7.2.1 Application Middleware
7.2.2 Design Flow
7.3 Application Mapping
7.4 Performance Verification
7.4.1 Soft Real-Time
7.4.2 Firm Real-Time
7.5 Conclusions
8 ASIC Case Study
8.1 Digital TV
8.1.1 Experimental Results
8.1.2 Scalability Analysis
8.2 Automotive Radio
8.2.1 Experimental Results
8.2.2 Scalability Analysis
8.3 Conclusions
9 Related Work
9.1 Scalability
9.1.1 Physical Scalability
9.1.2 Architectural Scalability
9.2 Diversity
9.3 Composability
9.3.1 Level of Composability
9.3.2 Enforcement Mechanism
9.3.3 Interference
9.4 Predictability
9.4.1 Enforcement Mechanism
9.4.2 Resource Allocation
9.4.3 Analysis Method
9.5 Reconfigurability
9.6 Automation
10 Conclusions and Future Work
10.1 Conclusions
10.2 Future Work
A Example Specification
A.1 Architecture
A.2 Communication
References
Glossary
Index
Chapter 1
Introduction
Embedded systems are rapidly growing in numbers and importance as we crowd
our living rooms with digital televisions, game consoles and set-top boxes and our
pockets (or maybe handbags) with mobile phones, digital cameras and personal
digital assistants. Even traditional PC and IT companies are making an effort to
enter the consumer-electronics business [5] with a mobile phone market that is four
times larger than the PC market (1.12 billion compared to 271 million PCs and
laptops in 2007) [177]. Embedded systems routinely offer a rich set of features, do
so at a unit price of a few US dollars, and have an energy consumption low enough
to keep portable devices alive for days. To achieve these goals, all components of
the system are integrated on a single circuit, a System on Chip (SoC). As we shall
see, one of the critical parts in such a SoC, and the focus of this work, is the on-chip
interconnect that enables different components to communicate with each other.
In this chapter, we start by looking at trends in the design and implementation of
SoCs in Section 1.1. We also introduce our example system that serves to
demonstrate the trends and is the running example throughout this work. This is followed
by an overview of the key requirements in Section 1.2. Finally, Section 1.3 lists the
key components of our proposed solution and Section 1.4 provides an overview of
the remaining chapters.
1.1 Trends
SoCs grow in complexity as an increasing number of independent applications are
integrated on a single chip [9, 50, 55, 146, 177]. In the area of portable consumer
systems, such as mobile phones, the number of applications doubles roughly every
2 years, and the introduction of new technology solutions is increasingly driven by
applications [80, 88]. With increasing application heterogeneity, system-level
constraints become increasingly complex and application requirements, as discussed
next, become more multifaceted [152].
1.1.1 Application Requirements
Applications can be broadly classified into control-oriented and signal-processing
(streaming) applications. For the former, the reaction time is often critical [144].
Performance gains mainly come from higher clock rates, more deeply pipelined
architectures and instruction-level parallelism. Control-oriented applications fall
outside the scope of this work and are not discussed further. Signal-processing
applications often have real-time requirements related to user perception [144],
e.g. video and audio codecs, or requirements dictated by standards like DVB, DAB
and UMTS [87, 131]. For signal-processing applications, an increasing amount of
data must be processed due to growing data sets, i.e. higher video resolutions, and
increasing work for the data sets, i.e. more elaborate and computationally intensive
coding schemes [144]. As a result, the required processing power is expected to
increase by 1000 times in the next 10 years [21] and the gap between the
processing requirement and the available processing performance of a single processor is
growing super-linearly [88].
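To put the projected 1000-fold increase in perspective, the following back-of-the-envelope calculation converts it into a yearly growth factor. It is only a sketch: the constant compound growth rate g is an assumption made here for illustration and is not stated in the text.

```latex
% Assumption: the required processing power grows by a constant factor g per year.
\[
  g^{10} = 1000
  \quad\Longrightarrow\quad
  g = 1000^{1/10} = 10^{0.3} \approx 2.0
\]
% Under this assumption the requirement roughly doubles every year:
\[
  t_{\text{double}} = \frac{\ln 2}{\ln g} \approx 1\ \text{year}
\]
```

Under that assumption, the processing requirement doubles about once a year, which is faster than the doubling of the number of applications roughly every 2 years mentioned in Section 1.1.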
Delivering a certain performance is, however, not enough. Results must also be
produced in a timely manner. The individual applications have different
real-time requirements [33]. For firm real-time applications, e.g. a Software-Defined
Radio [131] or the audio post-processing filter, illustrated in Fig. 1.1, deadline
misses are highly undesirable. This is typically due to standardisation, e.g. upper
bounds on the response latency in the aforementioned wireless standards [87], or
perception, e.g. steep quality reduction in the case of misses. Note that firm real-time
only differs from hard real-time, a term widely used in the automotive and aerospace
domain, in that it does not involve safety aspects. Soft real-time applications, e.g. a
video decoder, can tolerate occasional deadline misses with only a modest quality
degradation. In addition, non-real-time applications have no requirements on their
temporal behaviour, and must only be functionally correct.
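The three classes described above can be illustrated with a small Python sketch that checks a trace of job completion times against its deadlines. All names (`RealTimeClass`, `deadline_misses`, `is_acceptable`) and the `soft_miss_ratio` threshold are hypothetical and chosen for illustration only. The text does not quantify how many misses count as "occasional", and the sketch simplifies "highly undesirable" for firm real-time into "no misses allowed".

```python
from enum import Enum


class RealTimeClass(Enum):
    """Application classes named in the text (the enum itself is illustrative)."""
    FIRM = "firm"
    SOFT = "soft"
    NON_REAL_TIME = "non-real-time"


def deadline_misses(completion_times, deadlines):
    """Count the jobs that finished after their deadline."""
    return sum(1 for done, due in zip(completion_times, deadlines) if done > due)


def is_acceptable(rt_class, completion_times, deadlines, soft_miss_ratio=0.05):
    """Check a trace against its class.

    soft_miss_ratio is an assumed tolerance for soft real-time applications;
    it is not a value taken from the text.
    """
    if rt_class is RealTimeClass.NON_REAL_TIME:
        # No temporal requirements: only functional correctness matters.
        return True
    misses = deadline_misses(completion_times, deadlines)
    if rt_class is RealTimeClass.FIRM:
        # Simplification: treat any deadline miss as a violation.
        return misses == 0
    # Soft real-time: tolerate an occasional miss, up to the assumed ratio.
    return misses <= soft_miss_ratio * len(deadlines)
```

For example, a trace with one late job out of four is rejected for a firm real-time application, accepted for a soft real-time application if a quarter of the deadlines may be missed, and always accepted for a non-real-time application.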
Fig. 1.1 Application model (figure labels: use-case; M-JPEG application; MPEG-1 application; audio post-processing application; tasks; input streams; output stream to display; output stream to speakers)
Each application has its own set of requirements, but the SoC typically executes
many applications concurrently, as exemplified by Fig. 1.1. Furthermore,
applications are started and stopped at run time by the user, thus creating many different
use-cases, i.e. combinations of concurrent applications [72, 138]. The number of
use-cases grows roughly exponentially in the number of applications, and for every