Big Data
Computing
Edited by
Rajendra Akerkar
Western Norway Research Institute
Sogndal, Norway
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2014 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20131028
International Standard Book Number-13: 978-1-4665-7838-8 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit-
ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To
All the visionary minds who have helped create a modern data science profession
Contents
Preface
Editor
Contributors
Section I Introduction
1. Toward Evolving Knowledge Ecosystems for Big Data Understanding
Vadim Ermolayev, Rajendra Akerkar, Vagan Terziyan, and Michael Cochez
2. Taxonomy and Review of Big Data Solutions Navigation
Pierfrancesco Bellini, Mariano di Claudio, Paolo Nesi, and Nadia Rauch
3. Big Data: Challenges and Opportunities
Roberto V. Zicari
Section II Semantic Technologies and Big Data
4. Management of Big Semantic Data
Javier D. Fernández, Mario Arias, Miguel A. Martínez-Prieto,
and Claudio Gutiérrez
5. Linked Data in Enterprise Integration
Sören Auer, Axel-Cyrille Ngonga Ngomo, Philipp Frischmuth,
and Jakub Klimek
6. Scalable End-User Access to Big Data
Martin Giese, Diego Calvanese, Peter Haase, Ian Horrocks,
Yannis Ioannidis, Herald Kllapi, Manolis Koubarakis, Maurizio Lenzerini,
Ralf Möller, Mariano Rodriguez Muro, Özgür Özçep, Riccardo Rosati,
Rudolf Schlatte, Michael Schmidt, Ahmet Soylu, and Arild Waaler
7. Semantic Data Interoperability: The Key Problem of Big Data
Hele-Mai Haav and Peep Küngas
Section III Big Data Processing
8. Big Data Exploration
Stratos Idreos
9. Big Data Processing with MapReduce
Jordà Polo
10. Efficient Processing of Stream Data over Persistent Data
M. Asif Naeem, Gillian Dobbie, and Gerald Weber
Section IV Big Data and Business
11. Economics of Big Data: A Value Perspective on State of the Art
and Future Trends
Tassilo Pellegrin
12. Advanced Data Analytics for Business
Rajendra Akerkar
Section V Big Data Applications
13. Big Social Data Analysis
Erik Cambria, Dheeraj Rajagopal, Daniel Olsher, and Dipankar Das
14. Real-Time Big Data Processing for Domain Experts: An
Application to Smart Buildings
Dario Bonino, Fulvio Corno, and Luigi De Russis
15. Big Data Application: Analyzing Real-Time Electric Meter Data
Mikhail Simonov, Giuseppe Caragnano, Lorenzo Mossucca, Pietro Ruiu,
and Olivier Terzo
16. Scaling of Geographic Space from the Perspective of City and
Field Blocks and Using Volunteered Geographic Information
Bin Jiang and Xintao Liu
17. Big Textual Data Analytics and Knowledge Management
Marcus Spies and Monika Jungemann-Dorner
Index
Preface
In the international marketplace, businesses, suppliers, and customers create
and consume vast amounts of information. Gartner* predicts that enterprise
data in all forms will grow by up to 650% over the next five years. According
to IDC,† the world’s volume of data doubles every 18 months; the MIT Centre
for Digital Research likewise estimates that digital information doubles every
1.5 years and will exceed 1000 exabytes next year. In 2011, medical centers
held almost 1 billion terabytes of data. That is almost 2000 billion file cabinets’
worth of information. This deluge of data, often referred to as Big Data,
creates a challenge for the business community and data scientists.
The term Big Data refers to data sets whose size is beyond the capabilities
of current database technology. It is an emerging field in which innovative
technology offers alternatives for resolving the inherent problems that
appear when working with massive data, as well as new ways to reuse and
extract value from information.
Businesses and government agencies aggregate data from numerous private
and/or public data sources. Private data is information that an organization
stores exclusively and that is available only to that organization, such
as employee data, customer data, and machine data (e.g., user transactions
and customer behavior). Public data is information that is available to the
public for a fee or at no charge, such as credit ratings and social media content
(e.g., LinkedIn, Facebook, and Twitter). Big Data has now reached every
sector in the world economy. It is transforming competitive opportunities
in every industry sector, including banking, healthcare, insurance,
manufacturing, retail, wholesale, transportation, communications, construction,
education, and utilities. It also plays key roles in trade operations such as
marketing, operations, supply chain, and new business models. It is becoming
evident that enterprises that fail to use their data efficiently are at a
significant competitive disadvantage to those that can analyze and act on their
data. The possibilities of Big Data continue to evolve swiftly, driven by
innovation in the underlying technologies, platforms, and analytical capabilities
for handling data, as well as by the evolving behavior of its users as
people increasingly live digital lives.
Big Data differs from conventional data models (e.g., relational databases
and data models, or conventional governance models). It is therefore a
concern for organizations as they try to
separate information nuggets from the data heap. The conventional models
of structured, engineered data do not adequately reveal the realities of Big
* http://www.gartner.com/it/content/1258400/1258425/january_6_techtrends_rpaquet.pdf
† http://www.idc.com/