Table of Contents

Aris Gkoulalas-Divanis • Abderrahim Labbi
Editors

Large-Scale Data Analytics

Editors
Aris Gkoulalas-Divanis, IBM Research – Ireland, Damastown Industrial Estate, Mulhuddart, Ireland
Abderrahim Labbi, IBM Research – Zurich, Rüschlikon, Switzerland

ISBN 978-1-4614-9241-2
ISBN 978-1-4614-9242-9 (eBook)
DOI 10.1007/978-1-4614-9242-9
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2013954541
© Springer Science+Business Media New York 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

In recent years, we have been witnessing a data explosion: almost 90% of today’s data have been produced only in the last 2 years, with data being nowadays produced in the order of Zettabytes! This data comes from various sources, including sensors, social networking sites, mobile phone applications, electronic medical record systems and e-commerce sites, just to name a few. Apart from its massive volume, this data is also characterized by variety (heterogeneity) and velocity (streams of data).

Traditional approaches and algorithms are not able to process and analyze such massive and complex datasets. This has signified the need for a paradigm shift, where new hardware and software technology is emerging to efficiently and reliably manage, store, process, analyze and synthesize very large amounts of complex data generated by massively distributed data sources. Besides their massively distributed nature, which requires new distributed architectures for data analysis, the heterogeneity of such sources imposes significant challenges for the efficient analysis of the data under numerous constraints, such as consistent data integration, data homogenization and scaling, privacy and security preservation. Moreover, the emerging real-world applications in domains such as healthcare, weather forecasting, financial engineering, urban planning, traffic management and environmental monitoring impose extra requirements for large-scale data analysis.
This edited book contains contributions on cutting edge research related to large-scale data analytics in the following core areas: databases, data mining, supercomputing, data visualization and privacy. Our goal is to present to students, researchers, professionals and practitioners the state-of-the-art research, which will help shape up the future of large-scale analytics, leading the way to the design of new approaches and technologies that can analyze and synthesize very large amounts of heterogeneous data, generated by massively distributed data sources.

Each chapter of the book presents a survey of an area in large-scale data analytics, or individual results of the emerging research in the field. Chapters 1 and 2 are devoted to the MapReduce framework. In particular, the first chapter provides a comprehensive survey for a family of approaches and mechanisms of large scale data analysis that have been implemented based on the MapReduce framework. Chapter 2 focuses on optimization approaches for plain MapReduce jobs, as well as for parallel data flow systems. Chapters 3 and 4 present two important application areas of the MapReduce framework: mining tera-scale graphs for patterns and anomalies (Chap. 3), and analyzing customer behavioral data for the Telecom industry (Chap. 4). In Chap. 5, the authors describe a unified heterogeneous architecture that integrates massively threaded shared-memory multiprocessors into MapReduce-based clusters to enable executing Map and Reduce operators on thousands of threads, across multiple GPU devices and nodes. The proposed hybrid system can be used to accelerate machine learning algorithms, such as support vector machines, achieving significant speedup. Chapter 6 is devoted to large-scale social network analysis, offering a comprehensive survey of the state-of-the-art in this area, with focus on parallel algorithms and libraries for the computation of network centrality metrics.
An overview of data visualization methods that help users to gain insight into large, heterogeneous, dynamic textual datasets is provided in Chap. 7. The last chapter of the book is devoted to technologies for offering security and privacy at large scale. The authors of this chapter present a novel framework for privacy-preserving, distributed data analysis that is practical for many real-world applications.

We, as editors, are genuinely grateful to all contributors of this book for the time and effort they put into this project, despite the heavy burden that we put on them. We also owe special thanks to the external reviewers for their help in this effort. Last but not least, we are indebted to Susan Lagerstrom-Fife and Courtney Clark from Springer, for their great support towards the preparation and completion of this work. Their editing suggestions were valuable to improving the organization, readability and appearance of the manuscript.

Mulhuddart, Ireland: Aris Gkoulalas-Divanis
Rüschlikon, Switzerland: Abderrahim Labbi

Contents

1 The Family of Map-Reduce
Sherif Sakr and Anna Liu

2 Optimization of Massively Parallel Data Flows
Fabian Hueske and Volker Markl

3 Mining Tera-Scale Graphs with “Pegasus”: Algorithms and Discoveries
U Kang and Christos Faloutsos

4 Customer Analyst for the Telecom Industry
David Konopnicki and Michal Shmueli-Scheuer

5 Machine Learning Algorithm Acceleration Using Hybrid (CPU-MPP) MapReduce Clusters
Sergio Herrero-Lopez and John R. Williams

6 Large-Scale Social Network Analysis
Mattia Lambertini, Matteo Magnani, Moreno Marzolla, Danilo Montesi, and Carmine Paolino

7 Visual Analysis and Knowledge Discovery for Text
Christin Seifert, Vedran Sabol, Wolfgang Kienreich, Elisabeth Lex, and Michael Granitzer

8 Practical Distributed Privacy-Preserving Data Analysis at Large Scale
Yitao Duan and John Canny

Index

Contributors

John Canny, Computer Science Division, University of California, Berkeley, CA, USA
Yitao Duan, NetEase Youdao, Beijing, China
Christos Faloutsos, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Michael Granitzer, University of Passau, Passau, Germany
Sergio Herrero-Lopez, Technologies, Equities and Currency (TEC) Division, SwissQuant Group AG, Zurich, Switzerland
Fabian Hueske, Technische Universität Berlin, Berlin, Germany
U Kang, Department of Computer Science, KAIST University, Republic of Korea
Wolfgang Kienreich, Know-Center Graz, Graz, Austria
David Konopnicki, IBM Haifa Research Lab, Haifa, Israel
Mattia Lambertini, Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
Elisabeth Lex, Know-Center Graz, Graz, Austria
Anna Liu, NICTA and University of New South Wales, Sydney, NSW, Australia
Matteo Magnani, Department of Information Technology, Uppsala University, 751 05 Uppsala, Sweden
Volker Markl, Technische Universität Berlin, Berlin, Germany
Moreno Marzolla, Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
Danilo Montesi, Department of Computer Science and Engineering, University of Bologna, Bologna, Italy

Large-Scale Data Analytics PDF

276 Pages·2014·6.178 MB·English

by Sherif Sakr, Anna Liu (auth.), Aris Gkoulalas-Divanis, Abderrahim Labbi (eds.)

Detailed Information

Author:	Sherif Sakr, Anna Liu (auth.), Aris Gkoulalas-Divanis, Abderrahim Labbi (eds.)
Publication Year:	2014
Pages:	276
Language:	English
File Size:	6.178 MB
Format:	PDF
Price:	FREE

Why Choose PDFdrive for Your Free Large-Scale Data Analytics Download?

100% Free: One book every day with no hidden fees or subscriptions.
No Registration: Immediate access to one book every day without creating an account.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download Large-Scale Data Analytics PDF?

Yes, on https://PDFdrive.to you can download Large-Scale Data Analytics by Sherif Sakr, Anna Liu (auth.), Aris Gkoulalas-Divanis, Abderrahim Labbi (eds.) completely free. We don't require any payment, subscription, or registration to access this PDF file, for up to 3 books every day.

How can I read Large-Scale Data Analytics on my mobile device?

After downloading Large-Scale Data Analytics PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of Large-Scale Data Analytics?

Yes, this is the complete PDF version of Large-Scale Data Analytics by Sherif Sakr, Anna Liu (auth.), Aris Gkoulalas-Divanis, Abderrahim Labbi (eds.). You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download Large-Scale Data Analytics PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.