F. THOMSON LEIGHTON
I N T R O D U C T I ON TO
PARALLEL ALGORITHMS
AND ARCHITECTURES:
ARRAYS · TREES · HYPERCUBES
MORGAN KAUFMANN PUBLISHERS
SAN MATEO, CALIFORNIA
Sponsoring Editor: Bruce M. Spatz
Production Editor: Yonie Overton
Cover Designer: Victoria Ann Philp
Copyeditor: Bob Klingensmith
Morgan Kaufmann Publishers, Inc.
Editorial Office:
2929 Campus Drive, Suite 260
San Mateo, CA 94403
© 1992 by Morgan Kaufmann Publishers, Inc.
All rights reserved
Printed in the United States of America
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means—electronic, mechanical, photocopying, recording, or otherwise—without
the prior written permission of the publisher.
94 93 92 91 5 4 3 2 1
Library of Congress Cataloging-in-Publication Data is available for this book.
Preface
This book is designed to serve as an introduction to the exciting and rapidly
expanding field of parallel algorithms and architectures. The text is specif-
ically directed towards parallel computation involving the most popular
network architectures: arrays, trees, hypercubes, and some closely related
networks.
The text covers the structure and relationships between the dominant
network architectures, as well as the fastest and most efficient parallel algo-
rithms for a wide variety of problems. Throughout, emphasis is placed on
fundamental results and techniques and on rigorous analysis of algorithmic
performance. Most of the material covered in the text is directly applica-
ble to many of the parallel machines that are now commercially available.
Those portions of the text that are of primarily theoretical interest are
identified as such and can be skipped without interrupting the flow of the
text.
The book is targeted for a reader with a general technical background,
although some previous familiarity with algorithms or programming will
prove to be helpful when reading the text. No previous familiarity with
parallel algorithms or networks is expected or assumed.
Most of the text is written at a level that is suitable for undergrad-
uates. Sections that involve more complicated material are denoted by a
★ following the section heading. A few highly advanced subsections in the
text are denoted with a ★★ following the subsection heading. These sub-
sections cover material that is meant for advanced researchers, although
the introductions to these subsections are written so as to be accessible to
all.
Readers who wish to understand the more advanced sections of the
text, but who find that they lack the necessary mathematical or computer
science background, are referred to the text by Cormen, Leiserson, and
Rivest [51] for an introduction to algorithms, the text by Graham, Knuth,
and Patashnik [84] for an introduction to concrete mathematics (including
combinatorics, probability, counting arguments, and asymptotic analysis),
and the text by Maurer and Ralston [167] for an elementary introduction
to both subjects.
Organization of the Material
The book is organized into three chapters according to network architec-
ture. We begin with the simplest architectures (arrays and trees) in Chap-
ter 1 and advance to more complicated architectures in Chapter 2 (meshes
of trees) and Chapter 3 (hypercubes and related networks). Each chap-
ter can be read independently; however, Section 1.1 and Subsection 1.2.2
provide important background material for all three chapters.
Within each chapter, the material is organized according to application
domain. Throughout, we start with simple algorithms for simple problems
and advance to more complicated algorithms for more complicated prob-
lems within each chapter and each section.
Commonality between algorithms for the same problem on different
networks and different problems on the same network is pointed out and
emphasized where appropriate. Particular emphasis is placed on the most
basic paradigms and primitives for parallel algorithm design. These para-
digms and primitives (which include prefix computation, divide and con-
quer, pointer jumping, Fourier transform, matrix multiplication, packet
routing, and sorting) arise in all three chapters and provide threads that
link the chapters together.
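To give a flavor of the first of these primitives: a prefix computation takes a sequence and an associative operation and produces all n "running" results, and on n processors it can be carried out in O(log n) steps by repeated doubling. The following sketch is an illustrative Python simulation of my own, not an algorithm taken from the text; the function name `parallel_prefix` is invented, and each pass of the loop stands in for one synchronous parallel step.

```python
# Illustrative sketch (not from the text): the prefix-computation
# primitive, simulated sequentially. Given x[0..n-1] and an associative
# operator op, it produces x[0], op(x[0], x[1]), ...,
# op(x[0], ..., x[n-1]). With n processors, the doubling scheme below
# would run in O(log n) parallel steps.

def parallel_prefix(x, op):
    """Prefix computation by repeated doubling; each pass simulates one
    parallel step in which every position combines with the value a
    power-of-two distance to its left."""
    result = list(x)
    step = 1
    while step < len(result):
        # One simulated parallel step: all positions i >= step update
        # simultaneously, so we read from a snapshot of the old values.
        snapshot = list(result)
        for i in range(step, len(result)):
            result[i] = op(snapshot[i - step], snapshot[i])
        step *= 2
    return result

# Example: running sums of [1, 2, 3, 4, 5]
print(parallel_prefix([1, 2, 3, 4, 5], lambda a, b: a + b))
# → [1, 3, 6, 10, 15]
```

The same skeleton works for any associative operation (sums, products, carries), which is why prefix computation recurs so often as a building block.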
Of course, there are many other ways that one could organize the same
material. We have chosen this particular organization for several reasons.
First, algorithms designed for different problems on the same network tend
to have more in common with each other than do algorithms designed for
the same problem on different networks. For example, Chapter 1 contains
optimal algorithms for Gaussian elimination and finding minimum-weight
spanning trees on an array. These algorithms have surprisingly similar
structures. However, the minimum-weight spanning tree algorithm de-
scribed in Chapter 1 is quite different from the minimum-weight spanning
tree algorithm described in Chapter 2. This is because the optimal al-
gorithm for finding a minimum-weight spanning tree on an array is quite
different from the optimal algorithms for this problem on other networks.
As a consequence, an organization of the material by network architecture
allows for more cohesion than an organization by application domain.
Second, an organization by network architecture facilitates use by read-
ers who are interested in only one particular architecture. For example, if
you are programming one of the many array-based parallel machines, then
you will want to focus your reading on Chapter 1.
Finally, it is easiest to learn the basic techniques of parallel algorithm
design by studying them as they naturally arise in various problem do-
mains. Although the idea of organizing the material around basic tech-
niques may seem appealing at first, such an organization suffers from a
serious lack of cohesion caused by the fact that the basic paradigms and
primitives arise in widely varying contexts. For example, a chapter on pre-
fix computations would naturally include topics such as carry-lookahead
addition, solution of tridiagonal systems of equations, indexing, data dis-
tribution, and certain circuit-switching algorithms, but it would likely not
include other algorithms for these same problems. As a consequence, many
significant educational opportunities would be lost by such an organization.
For the most part, the sections in each chapter are independent of
each other, and the table of contents and index have been designed to
accommodate readers who want to follow a different path through the
book. If you are interested in specific problems (such as graph algorithms
or linear algebra), then you can use the text by reading only those sections
within each chapter. If you are interested only in the implementations
and applications of certain basic techniques (such as prefix computation
or matrix multiplication), then you can read the text selectively with the
help of the table of contents and the index.
Teaching from the Text
This book is also designed to be used as a text for an introductory (late
undergraduate or early graduate) course on parallel algorithms and archi-
tectures. Drafts of this material have been successfully used in numer-
ous course settings during the past several years. Typically, a course on
this subject will cover a large portion of the introductory material (i.e.,
the non-starred sections) from all three chapters. For example, a one-
semester course could consist of the material from Sections 1.1-1.5 (possi-
bly excluding Subsections 1.3.3-1.3.5), a sampling of the non-starred ma-
terial from Sections 1.6-1.8, Subsection 1.9.5 (and possibly 1.9.1 as well),
Section 2.1, a sampling from Sections 2.2, 2.4, and 2.5 (possibly exclud-
ing Subsection 2.5.5), Sections 3.1-3.3 (excluding Subsections 3.1.4, 3.2.3,
and 3.3.4), Section 3.4 (possibly excluding Subsections 3.4.6-3.4.8), and
Subsections 3.5.1, 3.6.1, and 3.6.2. Material from Section 3.7 might also
be included as time permits.
The book can also be used in courses devoted to specific architectures
such as arrays or hypercube-related networks. An array-based course could
include Chapter 1 in its entirety. For a course on hypercube-related ar-
chitectures, it would be helpful to cover the material in Section 1.1 and
Subsection 1.2.2 before proceeding to Chapter 3. Since all of the algo-
rithms described in Chapters 1 and 2 can be implemented directly on a
hypercube, it might also make sense to include most of the material from
Sections 2.1, 2.2, 2.4, and 2.5 (excluding 2.5.5) in such a course. In ad-
dition, the material in Subsection 1.9.5 provides a worthwhile perspective
for results in Chapter 3; Theorem 1.21 in Subsection 1.9.1 is used for prov-
ing lower bounds on the bisection width of the networks in Chapter 3;
Theorem 1.16 in Subsection 1.7.5 is used in the proof of Theorem 3.12 in
Subsection 3.2.2; and Corollary 1.19 from Subsection 1.7.5 is used to show
that the hypercubic networks are universal in Subsections 3.2.2 and 3.3.3.
Finally, the text can be used as a supplement for courses on related sub-
jects such as VLSI, graph theory, computer architecture, and algorithms.
Lecture notes and problem sets for the courses on this material that are
taught at MIT can be purchased from the MIT Laboratory for Computer
Science by sending a request for MIT/LCS/RSS10 (which is the most recent
version of the notes available at the time of this printing) to
Publications Office
Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139.
Examples of the curricula based on this text that are used at other
universities will be made available by Morgan Kaufmann Publishers.
Exercises and Bibliographic Notes
Particular emphasis has been placed on the selection and formulation of
the more than 750 exercises that appear in the problem sections located
near the end of each chapter. Many of these exercises have been tested in
a wide variety of settings and have been solved by students with widely
varying backgrounds and abilities.
The problems are divided into several categories. Problems without an
asterisk are the easiest and should be solvable by the average reader within
5-50 minutes after reading the appropriate section of the text. Problems
with a single asterisk (*) are harder and will take more advanced readers
10-100 minutes to solve, on average. Problems with two asterisks (**)
are very challenging and can require several days of effort from the best
students. Many of the harder problems introduce new material that is the
subject of current research.
Problems marked with an R are research problems. Some of these prob-
lems are probably easy and some could be very hard. (Some might even
have been solved already without my being aware of the fact.) Problems
marked with an R* are more likely to be very challenging since they have
been studied by several researchers. (Some of the problems marked with
an R have not been studied by anyone, as far as I know.)
Unfortunately, 750+ problems can be overwhelming for the instructor
who wants to select a few for homework or for the reader seeking content
reinforcement. Hence, I have emphasized the 250 or so most worthwhile
problems by printing the problem numbers in boldface. As a consequence,
there will be about one boldface problem for every three pages of reading.
All citations of results described in the text and all pointers to outside
references are contained in the bibliographic notes at the end of each chap-
ter. These notes are meant to be helpful but not exhaustive. The citations
are included at the end of each chapter so that the reader can concentrate
on understanding the technical material without getting bogged down in
the sometimes messy business of assigning credit, and so that the reader
can quickly locate pointers to references without having to wade through
the technical material.
Errors
Despite the best efforts of many people, it is likely that the text contains
numerous errors. If you find any, then please let me know. I can be reached
by electronic mail at [email protected] or by sending hardcopy mail
to MIT. A list of known errors will be compiled and made available by
Morgan Kaufmann Publishers. These errors will be corrected in subsequent
printings of the book.
Preview of Volume II
Readers who find this book useful may be interested to know that a related
text is currently being developed. The second text will be titled Introduc-
tion to Parallel Algorithms and Architectures: Expanders · PRAMs · VLSI
(referred to as Volume II herein) and will be coauthored by Bruce Maggs.
We are currently projecting that Volume II will consist of five chapters
numbered four through eight. The contents of these chapters are briefly
described in what follows.
Chapter 4 will describe the expander family of networks, including
the multibutterfly, the multi-Beneš network, and the AKS sorting circuit. Al-
though expander-based networks are not currently used in the design of
parallel machines, recent work suggests that some of these networks may
become important components in future high-performance architectures.
Chapter 5 is devoted to abstract models of parallelism such as the par-
allel random access machine (PRAM). The PRAM model unburdens the
parallel algorithm designer from having to worry about wiring and memory
organization issues, thereby allowing him or her to focus on abstract paral-
lelism. We will describe a wide variety of PRAM algorithms in Chapter 5,
and the chapter will be organized so that theoretically inclined readers can
start there instead of in Chapter 1. We will then continue in Chapter 6
with a discussion of lower bound techniques and P-Completeness.
In Chapter 7, we will return to more practical matters and discuss is-
sues relating to the fabrication of large-scale parallel machines. Particular
attention will be devoted to very large scale integration (VLSI) compu-
tation and design. Among other things, we will see in Chapter 7 why
hypercubes are more costly to build than arrays and why area-universal
networks such as the mesh of trees are particularly cost-effective.
We will conclude in Chapter 8 with a collection of important topics.
Included will be a survey of state-of-the-art parallel computers, an intro-
duction to parallel programming (with examples from the Connection Ma-
chine), a discussion of issues relating to fault tolerance, and a discussion of
bus-based architectures.
We have already begun writing Volume II and we hope to have it
completed and available from Morgan Kaufmann within the next few years.
Much of the material in Volume II is covered in the lecture notes for
the courses taught at MIT (e.g., MIT/LCS/RSS10) that were mentioned
earlier. In addition, some of this material can also be found in the follow-
ing sources: the paper by Arora, Leighton, and Maggs [15] (for informa-
tion on expanders, multibutterflies, and nonblocking networks), the papers
by Ajtai, Komlos, and Szemerédi [5] and Paterson [194] (for information
on the AKS sorting circuit), the survey paper by Karp and Ramachan-
dran [113] and the text by Gibbons and Rytter [81] (for information on
PRAM algorithms), the texts by Mead and Conway [168], Lengauer [155],
Ullman [247], and Glasser and Dobberpuhl [82] (for more information on
VLSI computation and design), and the text by Almasi and Gottlieb [7]
for more information on parallel programming and state-of-the-art parallel
machines.
Acknowledgments
Many people have contributed substantially to the creation of this text.
On the technical side, I am most indebted to Bruce Maggs and Charles
Leiserson. Bruce spent countless hours reading drafts of the text and
is directly responsible for improving the quality of the manuscript. In
addition to catching some nasty bugs and suggesting simpler explanations
for several results, Bruce also helped provide motivation for completing
the text by commencing work on Volume II. Charles also contributed
substantially to the text, although in different ways. Charles and I have
been co-teaching courses on parallel algorithms and architectures for nearly
10 years, and I have learned a great deal from him during this time. Many
of the explanations and exercises presented in the text are due to Charles
or were improved as a result of his influence.
Of course, many other people provided technical assistance with this
work. I am particularly thankful to Al Borodin, Robert Fowler, Richard
Karp, Arnold Rosenberg, Clark Thomborson, Les Valiant and Vijay Vazi-
rani for reviewing early drafts of the text and to Richard Anderson,
Mikhail Atallah, and Franco Preparata for their thorough reviews of
later drafts. In addition, I would like to thank
Bill Aiello Bobby Blumofe Lenore Cowen
Mike Ernst Jose Fernandez Nabil Kahale
Mike Klugerman Manfred Kunde Yuan Ma
Greg Plaxton Eric Schwabe Nick Trefethen
Jacob White David Williamson
for reading sections of the text and for providing numerous helpful com-
ments.
Special recognition also goes to
Jon Buss Ron Greenberg Mark Hansen
Nabil Kahale Joe Kilian Mike Klugerman
Dina Kravets Bruce Maggs Marios Papaefthymiou
Serge Plotkin Eric Schwabe Peter Shor
Joel Wein
for their help as teaching assistants for this material during the past decade.
I would also like to thank the following people for numerous helpful
discussions, suggestions, and pointers:
Anant Agarwal Alok Aggarwal Sanjeev Arora
Arvind Paul Beame Bonnie Berger
Sandeep Bhatt Gianfranco Bilardi Fan Chung
Richard Cole Bob Cypher Bill Dally
Persi Diaconis Shimon Even Greg Frederickson
Ron Graham David Greenberg Torben Hagerup
Susanne Hambrusch Johan Håstad Dan Kleitman
Tom Knight Richard Koch Rao Kosaraju
Danny Krizanc Clyde Kruskal H. T. Kung
Thomas Lengauer Fillia Makedon Gary Miller
Mark Newman Victor Pan Michael Rabin
Abhiram Ranade Satish Rao John Reif
Sartaj Sahni Jorge Sanz Chuck Seitz
Adi Shamir Alan Siegel Burton Smith
Marc Snir Larry Snyder Quentin Stout
Hal Sudborough Bob Tarjan Thanasis Tsantilas
Eli Upfal Uzi Vishkin David Wilson
On the production side, I am most indebted to Martha Adams, Jose
Fernandez, David Jones, and Tim Wright. Martha converted the text
from handwritten scribbles to TeX and then from TeX to LaTeX. This was
a difficult and (at times) frustrating task that spanned many years. Jose
converted my crude sketches into the clear and artistic figures that appear
in the text. Jose's unusual ability to express complicated technical material
in easy-to-understand figures has substantially enhanced the quality of the
text. David entered tens of thousands of revisions into the text during
countless late nights at LCS, and he performed the formatting for the
final text. Text setting and figure placement in a text such as this is a
tricky, time-consuming, and frustrating business, and I am very grateful
to David for doing such a splendid job. Tim helped with many aspects of
the preparation of the final manuscript, including revisions, hunting down