Ying Xu · Juan Cui · David Puett
Cancer Bioinformatics
Ying Xu
Department of Biochemistry and Molecular Biology
University of Georgia
Athens, GA, USA

Juan Cui
Department of Computer Science and Engineering
University of Nebraska
Lincoln, NE, USA

David Puett
Department of Biochemistry and Molecular Biology
University of Georgia
Athens, GA, USA
ISBN 978-1-4939-1380-0 ISBN 978-1-4939-1381-7 (eBook)
DOI 10.1007/978-1-4939-1381-7
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2014945124
© Springer Science+Business Media New York 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and
executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this
publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s
location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations
are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
In his superb exposition, The Emperor of All Maladies: A Biography of Cancer,
Mukherjee attributes the earliest documentation of cancer to the brilliant Egyptian,
Imhotep, who some 4,500 years ago clearly described a case of breast cancer
(Mukherjee 2010). Roughly two millennia later (ca. 400 BC), the Greek physician
Hippocrates named the disease karkinos (the Greek word for crab), which has now
come down to us as cancer. Some five to six centuries later, while practicing in Rome
(ca. 130–200 AD), the Greek physician Claudius Galen, who was influenced by the
four humors constituting the human body as proposed by the Hippocratic school,
i.e., blood, phlegm, yellow bile, and black bile, attributed cancer to an excess of
black bile. It took centuries before Vesalius (sixteenth century) and Baillie
(eighteenth century) put the black bile hypothesis to rest, thus indirectly encouraging
surgeons to begin resection of solid tumors. (Surgical procedures had been done
earlier by some fearless surgeons, but few patients survived the ordeal and infection
that likely followed.) The later introduction of anesthesia and antibiotics in the
nineteenth to twentieth centuries, as well as more sterile operating environments, established
surgery (and later radiation therapy) as a major treatment of this disease, an approach
that is still used whenever possible. In the middle of the twentieth century and
continuing today, chemotherapy and hormonal therapy emerged as a complement to,
and sometimes instead of, surgery and radiation therapy to treat cancer.
A number of theories have been proposed regarding the factors that may drive
and facilitate the initiation, development, and metastasis of a cancer, and these have guided
cancer studies in the past few decades. An insightful speculation was made by Otto
Warburg following his seminal work in the 1920s: “Cancer … has countless
secondary causes. But … there is only one prime cause, [which] is the replacement of
respiration of oxygen in normal body cells by a fermentation of sugar” (Warburg 1969).
The first discovery of oncogenes and tumor suppressor genes about 40 years ago
marked another major milestone in our understanding of cancer development,
which has profoundly influenced research in this area during the past three decades.
It has become a widely held belief that cancer is ultimately a disease caused by
genomic mutations. Aided by the rapidly increasing pool of a variety of omic data
such as genomic, transcriptomic, epigenomic, metabolomic, glycomic, lipidomic,
and pharmacogenomic data collected on both cell lines and cancer tissues,
spectacular progress has been made in the past two decades in our understanding of cancer,
particularly in terms of how the microenvironment and the immune system
contribute to the whole process of neoplasm formation and survival.
In spite of the considerable progress made, however, a number of salient
questions remain to be answered. The authors posit that a considerable amount of the
information needed to address and answer many of these questions already exists in the
available omic databases, and that much of these data are substantially under-mined and
underutilized. Among the many possible reasons, a key one, we believe, is that
computational cancer biologists, as a community, have yet to sufficiently develop their
independent thinking about the overall biology of cancer. This thinking should be
quite different from the reductionist approaches that have been widely used in
experimental studies of cancer in the past century and should enable them to address
fundamental questions about cancer in a holistic manner as an evolving system.
Many fundamental issues concerning cancer are intrinsically holistic by nature.
Thus, when examining cancer as an evolutionary problem, its microenvironment,
including the extracellular matrix and the immune and other stromal cells, must be
considered as an integral part of the system. This strongly suggests that cell
culture-based or animal model-based cancer studies must be complemented by cancer
tissue-based studies in order to gain a full understanding of cancer. The omic data
collected on cancer tissue samples, covering different developmental stages, are
likely to contain information on the interplay between cancer cells and their
environment, and particularly on how such interactions may drive the evolution in
specific directions. Hence, we posit that mining such omic data for information
discovery will, in the future, represent an essential component of cancer research,
complementary to the current more reductionist-oriented approaches.
The goals of this book are to provide an overview of cancer biology from an
informatics perspective and to demonstrate how omic data can be mined to generate
the new insights and more comprehensive understanding needed to address a
wide range of fundamental cancer biology questions. Throughout this book, the
authors have attempted to establish the following key points: (1) cancer is a process
of cell survival in an increasingly stressful and difficult microenvironment,
which co-evolves with the diseased cells; (2) cell proliferation is the cancer cells’ way of
reducing the stresses imposed on them for survival; (3) the challenges that the
evolving cells must overcome are not only at the cell level, but more importantly at the
tissue level, hence making cancer dominantly a tissue rather than a cell-only
problem; (4) the survival pathway for each cancer is not created ‘on the fly’ through its
selection of molecular malfunctions or genetic mutations; instead, it is largely
determined by substantial cellular programs encoded in the human genome, which
originally evolved for other purposes; (5) subpopulations of cancer cells have managed
to create the conditions needed to trigger such cellular program-guided survival
pathways; (6) as the stresses become increasingly challenging, cancer cells
utilize increasingly less reversible stress-responses for their survival, thus making
the disease progressively more malignant; (7) genomic mutations in sporadic
cancers probably serve mainly as permanent replacements for ongoing functions to
provide efficiency and sustainability for survival; in contrast, mutations in
hereditary cancers dominantly play driver roles in cancer initiation, but in a sense different
from driver mutations as defined in the current literature; (8) there is a fundamental
difference between cell proliferation in primary versus metastatic cancers, as the
former is essential in overcoming the encountered stress(es) while the latter is
simply a side product of a stress-response process, suggesting that their treatment
regimens should be different; and (9) cancer survives and proliferates by continually
evolving, with natural selection having a major part in deciding which cells remain
and which must perish.
For each chapter, the authors present the main topic by placing cancer in an
evolutionary context, for example by raising and addressing questions such as: What
pressures are the evolving neoplastic cells currently under, and How have the cells
responded to adapt to the pressures? In addition, the authors also demonstrate
through examples how to derive the desired information from the available omic
data by asking questions and then addressing them using a hypothesis-driven data
mining approach. An example could be as follows: What is the difference between
the main driving forces of primary versus metastatic cancer? This can be addressed
by identifying genes that are up-regulated consistently across all metastatic cancers
versus their matching primary cancer tissues, and then delineating the particular
pathways that are enriched by these genes.
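The two steps of this example (selecting consistently up-regulated genes, then testing pathways for enrichment) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the log2 fold-change threshold, the requirement that a gene rise in every matched pair, and the use of a hypergeometric test for enrichment are all assumptions made for illustration, and the gene identifiers, pathway names, and expression values are invented toy data.

```python
from math import comb

def consistently_up(primary, metastatic, min_log2_fc=1.0, min_fraction=1.0):
    """Genes whose log2 expression rises by at least min_log2_fc in at least
    min_fraction of the matched primary/metastatic pairs (hypothetical criterion)."""
    selected = []
    for gene, prim_values in primary.items():
        met_values = metastatic[gene]
        # Count matched pairs in which the metastatic sample is clearly higher.
        n_up = sum(1 for p, m in zip(prim_values, met_values) if m - p >= min_log2_fc)
        if n_up / len(prim_values) >= min_fraction:
            selected.append(gene)
    return sorted(selected)

def enrichment_p(hits, pathway, background):
    """Hypergeometric upper-tail p-value: the chance of seeing at least the
    observed overlap between hits and pathway when drawing len(hits) genes
    at random from the background."""
    n_background = len(background)
    n_pathway = len(pathway & background)
    n_hits = len(hits & background)
    overlap = len(hits & pathway & background)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_hits - i)
        for i in range(overlap, min(n_pathway, n_hits) + 1)
    )
    return tail / total

# Toy log2 expression values for three matched patients (invented numbers).
primary = {
    "g1": [5.0, 6.0, 5.0], "g2": [4.0, 4.0, 5.0], "g3": [7.0, 7.0, 7.0],
    "g4": [3.0, 3.0, 3.0], "g5": [6.0, 5.0, 6.0], "g6": [2.0, 2.0, 2.0],
}
metastatic = {
    "g1": [7.0, 8.0, 7.0], "g2": [6.0, 5.5, 7.0], "g3": [7.0, 8.5, 6.0],
    "g4": [3.0, 3.0, 3.2], "g5": [6.0, 5.0, 6.0], "g6": [2.0, 2.0, 2.0],
}
# Hypothetical pathway annotations.
pathways = {"pathway_x": {"g1", "g2", "g3"}, "pathway_y": {"g4", "g5", "g6"}}

up_genes = consistently_up(primary, metastatic)
background = set(primary)
p_values = {
    name: enrichment_p(set(up_genes), members, background)
    for name, members in pathways.items()
}
print(up_genes)
print(p_values)
```

In a real analysis the expression values would come from normalized omic data on matched tissue samples, the pathway sets from a curated annotation source, and the resulting p-values would need correction for testing many pathways at once.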
This 14-chapter book consists of the following clusters of chapters. Chapters 1
and 2 introduce the basic biology and biochemistry of cancer and the available
cancer omic data, as well as the type of information derivable from such data.
Chapter 3 serves as an introduction to the use of omic data to address
cancer-related problems, written for someone with only a limited knowledge of cancer;
and Chap. 12 serves a similar purpose but for someone who has a general
understanding about cancer at the molecular and cellular levels, e.g., having read a
substantial portion of this book. Chapter 4 is a transition chapter, serving as an
introduction to both information that can be derived from cancer genomes and
elucidation of cancer mechanisms using such information. Chapters 5 through 9
represent the core of the book: elucidation of novel information and how to gain a
new and better understanding about the fundamental biology of primary cancer, in
which cancer is treated as an evolving system driven by specific pressures and
assisted by certain facilitators at different developmental stages. A common theme
is used when tackling a series of cancer-related key issues across these five
chapters: What stresses do the cancer cells need to overcome at a specific stage, and
how do such cells utilize encoded stress-response systems to ensure their survival?
Chapters 10 and 11 extend this discussion to metastatic cancer, which, somewhat
surprisingly, represents a different type of disease from primary cancers with
fundamentally different drivers. Chapter 13 provides some general information to
those new to the field about how to conduct meaningful data mining-based cancer
research. Chapter 14 presents our perspectives about cancer research using a more
holistic approach than is generally done.
The authors hope that this book will help in bridging the gap between experimental
cancer biologists and computational biologists in their joint efforts to uncover the
enormous wealth of information hidden in the cancer omic data. Success in this
endeavor will lead to a better understanding of cancer, as well as assist
computational biologists in developing independent thinking when tackling these complex
problems. This approach will probably be less detail-oriented but more holistic and
will likely span the entire range of cancer evolution, thus making it different from
but complementary to those of their experimental peers. It is the authors’ contention
that more qualitative and quantitative utilization of the omic data will improve our
overall understanding of cancer biology, hence leading to improved capabilities in
early detection, development of more effective cancer treatments, and improvement
in the quality of patients’ lives.
The authors welcome any feedback from the reader regarding errors that need
correcting and areas where the book could be improved. Such information will be
highly valuable, particularly if there is a decision to write a future edition of the book.
Athens, GA, USA Ying Xu
Lincoln, NE, USA Juan Cui
Chapel Hill, NC, USA David Puett
References
Mukherjee, S. (2010). The emperor of all maladies: a biography of cancer. Scribner.
Warburg, O. H. (1969). The prime cause and prevention of cancer. K. Triltsch.
Acknowledgments
The authors thank their many trainees, postdoctoral fellows, collaborators, and
colleagues who have contributed enormously to this book. We wish to particularly
acknowledge the following for their invaluable assistance and contributions.
Mr. Chi Zhang and Ms. Sha Cao, two brilliant and dedicated doctoral students at
the University of Georgia, have provided tremendous assistance in generating all
of the case studies along with the figures showing the results of data analysis, in
proofreading most chapters, and in coordinating the efforts in reference collection
and figure generation. Dr. Ying Li and Dr. Wei Du, two young faculty members at
the Jilin University College of Computer Science and Technology, have devoted long
hours and tireless effort to collecting the majority of the references used
throughout this book. Mr. Liang Chen and Ms. Yanjiao Ren, two Ph.D. students, also at the
Jilin University College of Computer Science and Technology, have carried out the
artistic design and the generation of all the non-boxplot figures used in this book.
We particularly want to thank Mr. Liang Chen for his superb illustrations and
figures, including the book cover image, that have clearly enhanced the presentation
of the contents covered. Mr. Xin Chen, a visiting graduate student at the University
of Georgia, has also contributed to improving the presentation of some of the
figures. In addition, Ms. Sha Cao and Dr. Wei Du have organized journal clubs to
present various chapters of the book at the University of Georgia and Jilin
University, respectively. Feedback from these two journal clubs has been very
useful in guiding us in the revision of the earlier drafts. A number of colleagues have
read the early drafts of some chapters and provided invaluable comments and
suggestions for further revision, including Professor Shaying Zhao of the University
of Georgia, Professor Dong Xu of the University of Missouri, and Professor Yuan
Yuan of the Chinese Medical University in Shenyang. We thank them profusely for
their valuable comments and suggestions that clearly improved the quality of the
book. Suffice it to say that any errors in the book are solely the responsibility of
the authors.
We also wish to thank all the cancer biologists and funding agencies that have
supported the publicly available cancer omic data, thus making our research as well as this
book possible. It is a pleasure to thank Mr. Gilbert Miller for his financial contribution