Table of Contents:
Table of Contents Page: vii
Foreword Page: xv
Preface Page: xix
General Information Page: xx
HBase Version Page: xx
Building the Examples Page: xxi
Hush: The HBase URL Shortener Page: xxiii
Running Hush Page: xxv
Conventions Used in This Book Page: xxv
Using Code Examples Page: xxvi
Safari® Books Online Page: xxvi
How to Contact Us Page: xxvii
Acknowledgments Page: xxvii
Chapter 1. Introduction Page: 1
The Dawn of Big Data Page: 1
The Problem with Relational Database Systems Page: 5
Nonrelational Database Systems, Not-Only SQL or NoSQL? Page: 8
Dimensions Page: 10
Scalability Page: 12
Database (De-)Normalization Page: 13
Building Blocks Page: 16
Backdrop Page: 16
Tables, Rows, Columns, and Cells Page: 17
Auto-Sharding Page: 21
Storage API Page: 22
Implementation Page: 23
Summary Page: 27
HBase: The Hadoop Database Page: 27
History Page: 27
Nomenclature Page: 29
Summary Page: 29
Chapter 2. Installation Page: 31
Quick-Start Guide Page: 31
Requirements Page: 34
Hardware Page: 34
Servers Page: 35
Networking Page: 39
Software Page: 40
Operating system Page: 40
Filesystem Page: 43
Java Page: 46
Hadoop Page: 46
SSH Page: 48
Domain Name Service Page: 48
Synchronized time Page: 49
File handles and process limits Page: 49
Datanode handlers Page: 51
Swappiness Page: 51
Windows Page: 52
Filesystems for HBase Page: 52
Local Page: 54
HDFS Page: 54
S3 Page: 54
Other Filesystems Page: 55
Installation Choices Page: 55
Apache Binary Release Page: 55
Building from Source Page: 58
Run Modes Page: 58
Standalone Mode Page: 59
Distributed Mode Page: 59
Pseudodistributed mode Page: 59
Fully distributed mode Page: 60
Specifying region servers Page: 60
ZooKeeper setup Page: 60
Using the existing ZooKeeper ensemble Page: 62
Configuration Page: 63
hbase-site.xml and hbase-default.xml Page: 64
hbase-env.sh Page: 65
regionservers Page: 65
log4j.properties Page: 65
Example Configuration Page: 65
hbase-site.xml Page: 66
regionservers Page: 66
hbase-env.sh Page: 66
Client Configuration Page: 67
Deployment Page: 68
Script-Based Page: 68
Apache Whirr Page: 69
Puppet and Chef Page: 70
Operating a Cluster Page: 71
Running and Confirming Your Installation Page: 71
Web-based UI Introduction Page: 71
Shell Introduction Page: 73
Stopping the Cluster Page: 73
Chapter 3. Client API: The Basics Page: 75
General Notes Page: 75
CRUD Operations Page: 76
Put Method Page: 76
Single Puts Page: 77
The KeyValue class Page: 83
Client-side write buffer Page: 86
List of Puts Page: 90
Atomic compare-and-set Page: 93
Get Method Page: 95
Single Gets Page: 95
The Result class Page: 98
List of Gets Page: 100
Related retrieval methods Page: 103
Delete Method Page: 105
Single Deletes Page: 105
List of Deletes Page: 108
Atomic compare-and-delete Page: 112
Batch Operations Page: 114
Row Locks Page: 118
Scans Page: 122
Introduction Page: 122
The ResultScanner Class Page: 124
Caching Versus Batching Page: 127
Miscellaneous Features Page: 133
The HTable Utility Methods Page: 133
The Bytes Class Page: 134
Chapter 4. Client API: Advanced Features Page: 137
Filters Page: 137
Introduction to Filters Page: 137
The filter hierarchy Page: 138
Comparison operators Page: 139
Comparators Page: 139
Comparison Filters Page: 140
RowFilter Page: 141
FamilyFilter Page: 142
QualifierFilter Page: 144
ValueFilter Page: 144
DependentColumnFilter Page: 145
Dedicated Filters Page: 147
SingleColumnValueFilter Page: 147
SingleColumnValueExcludeFilter Page: 148
PrefixFilter Page: 149
PageFilter Page: 149
KeyOnlyFilter Page: 151
FirstKeyOnlyFilter Page: 151
InclusiveStopFilter Page: 151
TimestampsFilter Page: 152
ColumnCountGetFilter Page: 154
ColumnPaginationFilter Page: 154
ColumnPrefixFilter Page: 155
RandomRowFilter Page: 155
Decorating Filters Page: 155
SkipFilter Page: 155
WhileMatchFilter Page: 157
FilterList Page: 159
Custom Filters Page: 160
Filters Summary Page: 167
Counters Page: 168
Introduction to Counters Page: 168
Single Counters Page: 171
Multiple Counters Page: 172
Coprocessors Page: 175
Introduction to Coprocessors Page: 175
The Coprocessor Class Page: 176
Coprocessor Loading Page: 179
Loading from the configuration Page: 180
Loading from the table descriptor Page: 181
The RegionObserver Class Page: 182
Handling region life-cycle events Page: 183
State: pending open Page: 183
Handling client API events Page: 184
State: open Page: 184
State: pending close Page: 184
The RegionCoprocessorEnvironment class Page: 185
The ObserverContext class Page: 186
The BaseRegionObserver class Page: 187
The MasterObserver Class Page: 190
The MasterCoprocessorEnvironment class Page: 191
The BaseMasterObserver class Page: 192
Endpoints Page: 193
The CoprocessorProtocol interface Page: 194
The BaseEndpointCoprocessor class Page: 195
HTablePool Page: 199
Connection Handling Page: 203
Chapter 5. Client API: Administrative Features Page: 207
Schema Definition Page: 207
Tables Page: 207
Table Properties Page: 210
Column Families Page: 212
HBaseAdmin Page: 218
Basic Operations Page: 219
Table Operations Page: 220
Schema Operations Page: 228
Cluster Operations Page: 230
Cluster Status Information Page: 233
Chapter 6. Available Clients Page: 241
Introduction to REST, Thrift, and Avro Page: 241
Interactive Clients Page: 244
Native Java Page: 244
REST Page: 244
Operation Page: 244
Supported formats Page: 246
Plain (text/plain) Page: 246
XML (text/xml) Page: 247
JSON (application/json) Page: 248
Protocol Buffer (application/x-protobuf) Page: 249
Raw binary (application/octet-stream) Page: 249
REST Java client Page: 250
Thrift Page: 251
Installation Page: 251
Operation Page: 252
Example: PHP Page: 253
Avro Page: 255
Installation Page: 255
Operation Page: 255
Other Clients Page: 256
Batch Clients Page: 257
MapReduce Page: 257
Native Java Page: 257
Clojure Page: 258
Hive Page: 258
Pig Page: 263
Cascading Page: 267
Shell Page: 268
Basics Page: 269
Commands Page: 271
General Page: 272
Data definition Page: 273
Data manipulation Page: 273
Tools Page: 274
Replication Page: 274
Scripting Page: 274
Web-based UI Page: 277
Master UI Page: 277
Main page Page: 277
User Table page Page: 279
ZooKeeper page Page: 282
Region Server UI Page: 283
Main page Page: 283
Shared Pages Page: 283
Chapter 7. MapReduce Integration Page: 289
Framework Page: 289
MapReduce Introduction Page: 289
Classes Page: 290
InputFormat Page: 290
Mapper Page: 291
Reducer Page: 292
OutputFormat Page: 292
Supporting Classes Page: 293
MapReduce Locality Page: 293
Table Splits Page: 294
MapReduce over HBase Page: 295
Preparation Page: 295
Static Provisioning Page: 296
Dynamic Provisioning Page: 296
Data Sink Page: 301
Data Source Page: 306
Data Source and Sink Page: 308
Custom Processing Page: 311
Chapter 8. Architecture Page: 315
Seek Versus Transfer Page: 315
B+ Trees Page: 315
Log-Structured Merge-Trees Page: 316
Storage Page: 319
Overview Page: 319
Write Path Page: 320
Files Page: 321
Root-level files Page: 323
Table-level files Page: 324
Region-level files Page: 324
Region splits Page: 326
Compactions Page: 328
HFile Format Page: 329
KeyValue Format Page: 332
Write-Ahead Log Page: 333
Overview Page: 333
HLog Class Page: 335
HLogKey Class Page: 336
WALEdit Class Page: 336
LogSyncer Class Page: 337
LogRoller Class Page: 338
Replay Page: 338
Single log Page: 339
Log splitting Page: 339
Edits recovery Page: 341
Durability Page: 341
Read Path Page: 342
Region Lookups Page: 345
The Region Life Cycle Page: 348
ZooKeeper Page: 348
Replication Page: 351
Life of a Log Edit Page: 352
Normal processing Page: 352
Non-responding slave clusters Page: 353
Internals Page: 353
Choosing region servers to replicate to Page: 353
Keeping track of logs Page: 353
Reading, filtering, and sending edits Page: 354
Cleaning logs Page: 354
Region server failover Page: 355
Chapter 9. Advanced Usage Page: 357
Key Design Page: 357
Concepts Page: 357
Tall-Narrow Versus Flat-Wide Tables Page: 359
Partial Key Scans Page: 360
Pagination Page: 362
Time Series Data Page: 363
Time-Ordered Relations Page: 367
Advanced Schemas Page: 369
Secondary Indexes Page: 370
Search Integration Page: 373
Transactions Page: 376
Bloom Filters Page: 377
Versioning Page: 381
Implicit Versioning Page: 381
Custom Versioning Page: 384
Chapter 10. Cluster Monitoring Page: 387
Introduction Page: 387
The Metrics Framework Page: 388
Contexts, Records, and Metrics Page: 389
Master Metrics Page: 394
Region Server Metrics Page: 394
RPC Metrics Page: 396
JVM Metrics Page: 397
Info Metrics Page: 399
Ganglia Page: 400
Installation Page: 401
Ganglia-related steps Page: 401
Ganglia monitoring daemon Page: 401
Ganglia meta daemon Page: 403
HBase-related steps Page: 404
Ganglia web frontend Page: 404
Usage Page: 405
JMX Page: 408
JConsole Page: 410
JMX Remote API Page: 413
Nagios Page: 417
Chapter 11. Performance Tuning Page: 419
Garbage Collection Tuning Page: 419
Memstore-Local Allocation Buffer Page: 422
Compression Page: 424
Available Codecs Page: 424
Snappy Page: 425
LZO Page: 425
GZIP Page: 425
Verifying Installation Page: 426
Compression test tool Page: 426
Startup check Page: 427
Enabling Compression Page: 427
Optimizing Splits and Compactions Page: 429
Managed Splitting Page: 429
Region Hotspotting Page: 430
Presplitting Regions Page: 430
Load Balancing Page: 432
Merging Regions Page: 433
Client API: Best Practices Page: 434
Configuration Page: 436
Load Tests Page: 439
Performance Evaluation Page: 439
YCSB Page: 440
Chapter 12. Cluster Administration Page: 445
Operational Tasks Page: 445
Node Decommissioning Page: 445
Rolling Restarts Page: 447
Adding Servers Page: 447
Pseudodistributed mode Page: 448
Adding a local backup master Page: 448
Adding a local region server Page: 449
Fully distributed cluster Page: 450
Adding a backup master Page: 450
Data Tasks Page: 452
Import and Export Tools Page: 452
CopyTable Tool Page: 457
Bulk Import Page: 459
Bulk load procedure Page: 459
Using the importtsv tool Page: 460
Using the completebulkload Tool Page: 461
Advanced usage Page: 461
Replication Page: 462
Additional Tasks Page: 464
Coexisting Clusters Page: 464
Required Ports Page: 466
Changing Logging Levels Page: 466
Troubleshooting Page: 467
HBase Fsck Page: 467
Analyzing the Logs Page: 468
Common Issues Page: 471
Basic setup checklist Page: 471
File handles Page: 471
DataNode connections Page: 471
Compression Page: 471
Stability issues Page: 472
Garbage collection/memory tuning Page: 472
ZooKeeper problems Page: 472
“Could not obtain block” errors Page: 473
Appendix A. HBase Configuration Properties Page: 475
Appendix B. Road Map Page: 489
HBase 0.92.0 Page: 489
HBase 0.94.0 Page: 490
Appendix C. Upgrade from Previous Releases Page: 491
Upgrading to HBase 0.90.x Page: 491
From 0.20.x or 0.89.x Page: 491
Within 0.90.x Page: 492
Upgrading to HBase 0.92.0 Page: 492
Appendix D. Distributions Page: 493
Cloudera’s Distribution Including Apache Hadoop Page: 493
Appendix E. Hush SQL Schema Page: 495
Appendix F. HBase Versus Bigtable Page: 497
Index Page: 501
Description: If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this book shows you how Apache HBase can fulfill your needs. As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away.
Discover how tight integration with Hadoop makes scalability with HBase easier
Distribute large datasets across an inexpensive cluster of commodity servers
Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs
Get details on HBase’s architecture, including the storage format, write-ahead log, background processes, and more
Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs
Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks