PERFORMANCE ANALYSIS OF POLYMORPHOUS COMPUTING ARCHITECTURES
by
Sumit Lohani
Thesis submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Master of Science
2001
Advisory Committee:
Dr. Shuvra S. Bhattacharyya, Chair
Dr. Donald Yeung
Dr. Gang Qu
Report Documentation Page (Standard Form 298, Rev. 8-98)
Report date: 2001
Dates covered: 00-00-2001 to 00-00-2001
Title and subtitle: Performance Analysis of Polymorphous Computing Architectures
Performing organization: University of Maryland, Department of Electrical and
Computer Engineering, Institute for Advanced Computer Studies, College Park,
MD 20742
Distribution/availability statement: Approved for public release; distribution
unlimited
Security classification (report, abstract, this page): unclassified
Number of pages: 112
ABSTRACT
Title of Thesis: PERFORMANCE ANALYSIS OF POLYMORPHOUS
COMPUTING ARCHITECTURES
Degree Candidate: Sumit Lohani
Degree and year: Master of Science, 2001
Thesis directed by: Dr. Shuvra S. Bhattacharyya
Department of Electrical and Computer Engineering, and
Institute for Advanced Computer Studies,
University of Maryland at College Park
In general, polymorphous computing architectures are architectures that can
be dynamically customized to various applications. This thesis is concerned with
metrics, formulations, and algorithms for systematically synthesizing architectural
configurations and software for such architectures, particularly in the domain of
digital signal processing (DSP).
In polymorphous system synthesis for DSP, one central aspect is managing
trade-offs between latency and throughput, which are critical metrics for DSP
applications. In Chapters 1-4 of the thesis, we develop a model for latency that is
more appropriate than conventional models of latency for DSP system synthesis, and
that takes into account central issues related to transient-state time, and we develop
precise relationships between latency and throughput using this framework. Schedule
post-processing strategies based on simulation and retiming that reduce the latency
and transient for a given throughput constraint are then presented, along with an
approach based on a graph-theoretic framework to streamline them, and their efficacy
is substantiated with experimental results.
Chapters 5-6 of the thesis then deal with the problem of efficient mapping of
an application with stochastic execution times to a polymorphous computing
architecture in accordance with the time-varying performance requirements for several
metrics, which may include even non-trivial metrics such as the coupled
latency/throughput metrics. A comprehensive model for system synthesis in this
context is developed; results are developed on the complexity of system synthesis
under this model; and preliminary algorithms that address the synthesis problem are
presented and evaluated experimentally.
ACKNOWLEDGEMENTS
I thank my advisor, Dr. Shuvra S. Bhattacharyya, for all his guidance,
enthusiasm and motivation throughout my thesis work. I would also like to thank my
lab mates for providing an atmosphere conducive to study in the lab. This research
has been supported by the US National Science Foundation (9734275), and the Defense
Advanced Research Projects Agency under Contract F30602-01-C-0171 through the
University of Southern California Information Sciences Institute.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
Chapter 1. Introduction
  1.1. Definitions and notation
  1.2. Latency in literature
  1.3. Overview
Chapter 2. Background
Chapter 3. Algorithm and experimental results
  3.1. Phase 1
  3.2. Phase 2
    3.2.1 Transient-reduction scheme
    3.2.2 Latency-reduction scheme
  3.3. Summary
Chapter 4. Streamlining latency analysis for contention-free systems
  4.1. Notation and analysis
  4.2. Results
Chapter 5. Problem formulation and overview of model
  5.1. Problem formulation
  5.2. Model
Chapter 6. Details of the model
  6.1. Evaluation measure
  6.2. Configuration store
    6.2.1 Terminology and notation
    6.2.2 On-line management and use of configurations
    6.2.3 Models for problems related to configuration management
  6.3. On-line algorithms
    6.3.1 Simple low-complexity, run-time refinement approach
    6.3.2 Throughput optimization
    6.3.3 Power optimization
    6.3.4 Experimental results
Chapter 7. Conclusion and future work
References
LIST OF TABLES
Table 1. Results of the algorithm described in Figure 11 for various benchmarks
         with deterministic execution times
Table 2. Results for latency-reduction using a method similar to the
         transient-reduction approach for FFT graphs with non-deterministic
         execution times
Table 3. Results of applying latencyReduction to non-deterministic graphs
Table 4. Speedups for various benchmarks
Table 5. Result of applying algorithm in Figure 27 on various DSP benchmarks
Table 6. Results of applying power optimization algorithm of Figure 28 to various
         DSP benchmarks
Table 7. Results for on-line framework tracking an applied goal
Table 8. Results for on-line framework tracking an applied goal
LIST OF FIGURES
Figure 1. An example of a dataflow graph
Figure 2. An example application graph
Figure 3. Self-timed execution
Figure 4. An example of an application graph and an associated self-timed
          schedule. The numbers on the edges and denote non-zero delays
Figure 5. IPC graph constructed from application graph of Figure 4. Numbers
          beside communication actors denote communication costs
Figure 6. Illustration for a system with no output buffer
Figure 7. Illustration for latency, throughput and output buffer
Figure 8. Pseudocode for transient-reduction post-processing
Figure 9. Pseudocode for function generateSchedule
Figure 10. Self-timed execution with first-iteration actors denoted by T
Figure 11. Pseudocode to find minimum-transient schedule for a given
Figure 12. Pseudocode for latency-reduction post-processing
Figure 13. Illustration for retiming
Figure 14. Pseudocode to solve the TBL problem
Figure 15. Plots of vs. for FFT, QMF, Karp and Meas corresponding to Table 3
Figure 16. plot for fft2.5.5 with
Figure 17. Overview
Figure 18. Graph for Example 7