NETWORKING
for BIG DATA
© 2016 by Taylor & Francis Group, LLC
Chapman & Hall/CRC
Big Data Series
SERIES EDITOR
Sanjay Ranka
AIMS AND SCOPE
This series aims to present new research and applications in Big Data, along with the
computational tools and techniques currently in development. The inclusion of concrete examples
and applications is highly encouraged. The scope of the series includes, but is not limited to,
titles in the areas of social networks, sensor networks, data-centric computing, astronomy,
genomics, medical data analytics, large-scale e-commerce, and other relevant topics that may be
proposed by potential contributors.
PUBLISHED TITLES
BIG DATA : ALGORITHMS, ANALYTICS, AND APPLICATIONS
Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea
NETWORKING FOR BIG DATA
Shui Yu, Xiaodong Lin, Jelena Mišić, and Xuemin (Sherman) Shen
Chapman & Hall/CRC
Big Data Series
NETWORKING
for BIG DATA
Edited by
Shui Yu
Deakin University
Burwood, Australia
Xiaodong Lin
University of Ontario Institute of Technology
Oshawa, Ontario, Canada
Jelena Mišić
Ryerson University
Toronto, Ontario, Canada
Xuemin (Sherman) Shen
University of Waterloo
Waterloo, Ontario, Canada
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20150610
International Standard Book Number-13: 978-1-4822-6350-3 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Editors
Contributors
Section I Introduction of Big Data
Chapter 1 ◾ Orchestrating Science DMZs for Big Data Acceleration: Challenges and Approaches
Saptarshi Debroy, Prasad Calyam, and Matthew Dickinson
Chapter 2 ◾ A Survey of Virtual Machine Placement in Cloud Computing for Big Data
Yang Wang, Jie Wu, Shaojie Tang, and Wu Zhang
Chapter 3 ◾ Big Data Management Challenges, Approaches, Tools, and Their Limitations
Michel Adiba, Juan Carlos Castrejón, Javier A. Espinosa-Oviedo, Genoveva Vargas-Solar, and José-Luis Zechinelli-Martini
Chapter 4 ◾ Big Data Distributed Systems Management
Rashid A. Saeed and Elmustafa Sayed Ali
Section II Networking Theory and Design for Big Data
Chapter 5 ◾ Moving Big Data to the Cloud: Online Cost-Minimizing Algorithms
Linquan Zhang, Chuan Wu, Zongpeng Li, Chuanxiong Guo, Minghua Chen, and Francis C. M. Lau
Chapter 6 ◾ Data Process and Analysis Technologies of Big Data
Peter Wlodarczak, Mustafa Ally, and Jeffrey Soar
Chapter 7 ◾ Network Configuration and Flow Scheduling for Big Data Applications
Lautaro Dolberg, Jérôme François, Shihabur Rahman Chowdhury, Reaz Ahmed, Raouf Boutaba, and Thomas Engel
Chapter 8 ◾ Speedup of Big Data Transfer on the Internet
Guangyan Huang, Wanlei Zhou, and Jing He
Chapter 9 ◾ Energy-Aware Survivable Routing in Ever-Escalating Data Environments
Bing Luo, William Liu, and Adnan Al-Anbuky
Section III Networking Security for Big Data
Chapter 10 ◾ A Review of Network Intrusion Detection in the Big Data Era: Challenges and Future Trends
Weizhi Meng and Wenjuan Li
Chapter 11 ◾ Toward MapReduce-Based Machine-Learning Techniques for Processing Massive Network Threat Monitoring
Linqiang Ge, Hanling Zhang, Guobin Xu, Wei Yu, Chen Chen, and Erik Blasch
Chapter 12 ◾ Anonymous Communication for Big Data
Lichun Li and Rongxing Lu
Chapter 13 ◾ Flow-Based Anomaly Detection in Big Data
Zahra Jadidi, Vallipuram Muthukkumarasamy, Elankayer Sithirasenan, and Kalvinder Singh
Section IV Platforms and Systems for Big Data Applications
Chapter 14 ◾ Mining Social Media with SDN-Enabled Big Data Platform to Transform TV Watching Experience
Han Hu, Yonggang Wen, Tat-Seng Chua, and Xuelong Li
Chapter 15 ◾ Trends in Cloud Infrastructures for Big Data
Yacine Djemaiel, Boutheina A. Fessi, and Noureddine Boudriga
Chapter 16 ◾ A User Data Profile-Aware Policy-Based Network Management Framework in the Era of Big Data
Fadi Alhaddadin, William Liu, and Jairo A. Gutiérrez
Chapter 17 ◾ Circuit Emulation for Big Data Transfers in Clouds
Marat Zhanikeev
Index
Preface
We have witnessed a dramatic increase in the use of information technology in every
aspect of our lives. For example, Canada’s healthcare providers have been moving to
electronic record systems that store patients’ personal health information in digital
format. These systems give healthcare professionals an easy, reliable, and safe way to
share and access patients’ health information, and thereby a cost-effective way to improve
the efficiency and quality of healthcare. However, e-health applications, together with
many others that serve our society, lead to explosive growth of data. The crucial question
is therefore how to turn this vast amount of data into insight that helps us better
understand what is really happening in our society. In other words, we have come to a
point where we need to quickly identify trends in societal change by analyzing the huge
amounts of data generated in our daily lives, so that proper recommendations can be made
and acted on before tragedy occurs. This brand new challenge is named Big Data.
Big Data is emerging as a very active research topic because of its pervasive applications
in human society, such as government, climate, finance, science, and so on. In 2012, the
Obama administration announced the Big Data Research and Development Initiative,
which aims to explore how Big Data could be used to address important problems facing
the government. Although many research studies have been carried out over the past
several years, most of them fall under data mining, machine learning, and data analysis.
However, these top-level killer applications would not be possible without the underlying
support of network infrastructure, given their extremely large data volumes and
computational complexity, especially when real-time or near-real-time operation is
demanded.
To date, Big Data remains poorly understood in various research communities and, to the
best of our knowledge, the networking perspective on Big Data has seldom been addressed.
Many problems remain to be solved, including optimal network topology for Big Data,
parallel structures and algorithms for Big Data computing, information retrieval in
Big Data, and network security and privacy issues in Big Data.
This book aims to fill the lacunae in Big Data research and focuses on important
networking issues in Big Data. Specifically, this book is divided into four major sections:
Introduction to Big Data, Networking Theory and Design for Big Data, Networking
Security for Big Data, and Platforms and Systems for Big Data Applications.
Section I gives a comprehensive introduction to Big Data and its networking issues. It
consists of four chapters.
Chapter 1 deals with the challenges in networking for science Big Data movement across
campuses, the limitations of legacy campus infrastructure, the technological and policy
transformations required to build Science DMZ infrastructures within campuses
(illustrated through two exemplar case studies), and open problems in personalizing such
Science DMZ infrastructures for accelerated Big Data movement.
Chapter 2 introduces some representative literature addressing the Virtual Machine
Placement Problem (VMPP), in the hope of providing a clear and comprehensive view of the
different objectives and corresponding algorithms for this subject. VMPP is one
of the key technologies for cloud-based Big Data analytics and has recently drawn much
attention. It deals with the problem of assigning virtual machines to servers in order to
achieve desired objectives, such as minimizing costs and maximizing performance.
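To make the placement problem concrete, the following is a minimal sketch of one simple heuristic, first-fit decreasing, which tries to pack virtual machines onto as few identical servers as possible. It is an illustration only and is not taken from Chapter 2: the function name, the single resource dimension, the VM names, and the capacity value are all assumptions made for this example.

```python
from typing import Dict, List


def first_fit_decreasing(vm_demands: Dict[str, int],
                         server_capacity: int) -> List[Dict[str, int]]:
    """Greedy VM placement sketch (hypothetical helper, one resource dimension).

    VMs are considered from largest to smallest demand; each is placed on the
    first server with enough free capacity, and a new server is opened only
    when none fits. The result is a heuristic packing, not a proven optimum.
    """
    servers: List[Dict[str, int]] = []  # per server: VM name -> demand
    free: List[int] = []                # remaining capacity of each server

    ordered = sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in ordered:
        if demand > server_capacity:
            raise ValueError(f"{name} does not fit on any server")
        for i, remaining in enumerate(free):
            if demand <= remaining:
                servers[i][name] = demand
                free[i] -= demand
                break
        else:
            # No existing server had room, so open a new one.
            servers.append({name: demand})
            free.append(server_capacity - demand)
    return servers


# Assumed example workload: demands in arbitrary resource units.
placement = first_fit_decreasing(
    {"vm-a": 6, "vm-b": 5, "vm-c": 4, "vm-d": 3, "vm-e": 2},
    server_capacity=10,
)
print(len(placement), placement)
```

Here the objective is the number of servers used, a rough proxy for cost. Other objectives mentioned above, such as maximizing performance, would need a different scoring rule.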
Chapter 3 investigates the main challenges involved in the three Vs of Big Data: volume,
velocity, and variety. It reviews the main characteristics of existing solutions for
addressing each of the Vs (e.g., NoSQL, parallel RDBMS, stream data management systems, and
complex event processing systems). Finally, it provides a classification of different
functions offered by NewSQL systems and discusses their benefits and limitations for
processing Big Data.
Chapter 4 deals with the concept of Big Data systems management, especially distributed
systems management, and describes the major problems of storing, processing, and
managing Big Data that current data systems face. It then explains the types
of current data management systems and how these systems are affected when they handle
Big Data. It also describes the types of modern systems, such as Hadoop technology, that
can be used to manage Big Data systems.
Section II covers networking theory and design for Big Data. It consists of five chapters.
Chapter 5 deals with an important open issue of efficiently moving Big Data, produced
at different geographical locations over time, into a cloud for processing in an online
manner. Two representative scenarios are examined and online algorithms are introduced to
achieve the timely, cost-minimizing upload of Big Data into the cloud. The first scenario
focuses on uploading dynamically generated, geodispersed data into a cloud for processing
using a centralized MapReduce-like framework. The second scenario involves uploading
deferral Big Data for processing by a (possibly distributed) MapReduce framework.
Chapter 6 describes some of the most widespread technologies used for Big Data.
Emerging technologies for the parallel, distributed processing of Big Data are introduced
in this chapter. At the storage level, distributed filesystems for the effective storage of large
data volumes on hardware media are described. NoSQL databases, widely used for
persisting, manipulating, and retrieving Big Data, are explained. At the processing level,
frameworks for massively parallel processing that can handle the volumes and complexities
of Big Data are explained. Analytic techniques extract useful patterns from Big Data and
turn data into knowledge. At the analytic layer, the chapter describes the techniques for
understanding the data, finding useful patterns, and making predictions on future data.
Finally, the chapter outlines future directions in which Big Data technologies may develop.