EXPERT’S VOICE IN OPEN SOURCE
Practical
MongoDB
Architecting, Developing,
and Administering MongoDB
—
Shakuntala Gupta Edward
Navin Sabharwal
www.it-ebooks.info
Practical MongoDB: Architecting, Developing, and Administering MongoDB
Shakuntala Gupta Edward, Ghaziabad, Uttar Pradesh, India
Navin Sabharwal, New Delhi, Delhi, India
ISBN-13 (pbk): 978-1-4842-0648-5
ISBN-13 (electronic): 978-1-4842-0647-8
DOI 10.1007/978-1-4842-0647-8
Library of Congress Control Number: 2015959699
Copyright © 2015 by Shakuntala Gupta Edward and Navin Sabharwal
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly
analysis or material supplied specifically for the purpose of being entered and executed on a computer system,
for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only
under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use
must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright
Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every
occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion
and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified
as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither
the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may
be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Managing Director: Welmoed Spahr
Acquisitions Editor: Celestin Suresh John
Developmental Editor: Douglas Pundick
Technical Reviewer: Gopala Manchukunda
Editorial Board: Steve Anglin, Mark Beckner, Ewan Buckingham, Gary Cornell, Louise Corrigan,
James DeWolf, Jonathan Gennick, Robert Hutchinson, Celestin Suresh John, Michelle Lowman,
James Markham, Susan McDermott, Matthew Moodie, Jeffrey Pepper, Douglas Pundick,
Ben Renow-Clarke, Gwenan Spearing, Matt Wade, Steve Weiss
Coordinating Editor: Rita Fernando
Copy Editor: Mary Behr
Compositor: SPi Global
Indexer: SPi Global
Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street,
6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected],
or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer
Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
For information on translations, please e-mail [email protected], or visit www.apress.com.
Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use.
eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk
Sales–eBook Licensing web page at www.apress.com/bulk-sales.
Any source code or other supplementary materials referenced by the author in this text is available to readers at
www.apress.com. For detailed information about how to locate your book’s source code, go to
www.apress.com/source-code/.
Printed on acid-free paper
Dedicated to people who made my life worth living and carved me into an individual
I am today and to God who shades every step of my life.
—Shakuntala Gupta Edward
Dedicated to the people I love and the God I trust.
—Navin Sabharwal
Contents at a Glance
About the Authors
About the Technical Reviewers
Acknowledgments
Preface
■ Chapter 1: Big Data
■ Chapter 2: NoSQL
■ Chapter 3: Introducing MongoDB
■ Chapter 4: The MongoDB Data Model
■ Chapter 5: MongoDB - Installation and Configuration
■ Chapter 6: Using MongoDB Shell
■ Chapter 7: MongoDB Architecture
■ Chapter 8: MongoDB Explained
■ Chapter 9: Administering MongoDB
■ Chapter 10: MongoDB Use Cases
■ Chapter 11: MongoDB Limitations
■ Chapter 12: MongoDB Best Practices
Index
Contents
About the Authors
About the Technical Reviewers
Acknowledgments
Preface
■ Chapter 1: Big Data
  Getting Started
  Big Data
    Facts About Big Data
  Big Data Sources
  Three Vs of Big Data
    Volume
    Variety
    Velocity
  Usage of Big Data
    Visibility
    Discover and Analyze Information
    Segmentation and Customizations
    Aiding Decision Making
    Innovation
  Big Data Challenges
    Policies and Procedures
    Access to Data
    Technology and Techniques
  Legacy Systems and Big Data
    Structure of Big Data
    Data Storage
    Data Processing
  Big Data Technologies
  Summary
■ Chapter 2: NoSQL
  SQL
  NoSQL
    Definition
    A Brief History of NoSQL
  ACID vs. BASE
    CAP Theorem (Brewer’s Theorem)
    The BASE
  NoSQL Advantages and Disadvantages
    Advantages of NoSQL
    Disadvantages of NoSQL
  SQL vs. NoSQL Databases
  Categories of NoSQL Databases
  Summary
■ Chapter 3: Introducing MongoDB
  History
  MongoDB Design Philosophy
    Speed, Scalability, and Agility
    Non-Relational Approach
    JSON-Based Document Store
    Performance vs. Features
    Running the Database Anywhere
  SQL Comparison
  Summary
■ Chapter 4: The MongoDB Data Model
  The Data Model
    JSON and BSON
    The Identifier (_id)
    Capped Collection
  Polymorphic Schemas
    Object-Oriented Programming
    Schema Evolution
  Summary
■ Chapter 5: MongoDB - Installation and Configuration
  Select Your Version
  Installing MongoDB on Linux
    Installing Using Repositories
    Installing Manually
  Installing MongoDB on Windows
  Running MongoDB
    Preconditions
    Starting the Service
  Verifying the Installation
  MongoDB Shell
  Securing the Deployment
    Using Authentication and Authorization
    Controlling Access to a Network
  Provisioning Using MongoDB Cloud Manager
  Summary
■ Chapter 6: Using MongoDB Shell
  Basic Querying
    Create and Insert
    Explicitly Creating Collections
    Inserting Documents Using Loop
    Inserting by Explicitly Specifying _id
    Update
    Delete
    Read
    Using Indexes
  Stepping Beyond the Basics
    Using Conditional Operators
    Regular Expressions
    MapReduce
    aggregate()
  Designing an Application’s Data Model
    Relational Data Modeling and Normalization
    MongoDB Document Data Model Approach
  Summary
■ Chapter 7: MongoDB Architecture
  Core Processes
    mongod
    mongo
    mongos
  MongoDB Tools
  Standalone Deployment
  Replication
    Master/Slave Replication
    Replica Set
    Implementing Advanced Clustering with Replica Sets
  Sharding
    Sharding Components
    Data Distribution Process
    Data Balancing Process
    Operations
    Implementing Sharding
    Controlling Collection Distribution (Tag-Based Sharding)
    Points to Remember When Importing Data in a Sharded Environment
    Monitoring for Sharding
    Monitoring the Config Servers
  Production Cluster Architecture
    Scenario 1
    Scenario 2
    Scenario 3
    Scenario 4
  Summary
■ Chapter 8: MongoDB Explained
  Data Storage Engine
  Data File (Relevant for MMAPv1)
    Namespace (.ns File)
  Data File (Relevant for WiredTiger)
  Reads and Writes
  How Data Is Written Using Journaling
  GridFS – The MongoDB File System
    The Rationale of GridFS
    GridFS under the Hood
    Using GridFS
  Indexing
    Types of Indexes
    Behaviors and Limitations
  Summary
Description: Practical Guide to MongoDB: Architecting, Developing, and Administering MongoDB begins with a short introduction to the basics of NoSQL databases and then introduces readers to MongoDB, the leading document-based NoSQL database, acquainting them step by step with all aspects of MongoDB. Practical G