Apache Kudu User Guide
Important Notice
© 2010-2017 Cloudera, Inc. All rights reserved.
Cloudera, the Cloudera logo, and any other product or service names or slogans contained
in this document are trademarks of Cloudera and its suppliers or licensors, and may not
be copied, imitated or used, in whole or in part, without the prior written permission
of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software
Foundation. All other trademarks, registered trademarks, product names and company
names or logos mentioned in this document are the property of their respective owners.
Reference to any products, services, processes or other information, by trade name,
trademark, manufacturer, supplier or otherwise does not constitute or imply
endorsement, sponsorship or recommendation thereof by us.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced, stored
in or introduced into a retrieval system, or transmitted in any form or by any means
(electronic, mechanical, photocopying, recording, or otherwise), or for any purpose,
without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as expressly
provided in any written license agreement from Cloudera, the furnishing of this document
does not give you any license to these patents, trademarks copyrights, or other
intellectual property. For information about patents covering Cloudera products, see
http://tiny.cloudera.com/patents.
The information in this document is subject to change without notice. Cloudera shall
not be liable for any damages resulting from technical errors or omissions which may
be present in this document, or from use of this document.
Cloudera, Inc.
1001 Page Mill Road, Bldg 3
Palo Alto, CA 94304
[email protected]
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Release Information
Version: Apache Kudu 1.2.0 / CDH 5.10.x
Date: December 7, 2017
Table of Contents
About Apache Kudu
  Concepts and Terms
    Columnar Datastore
    Raft Consensus Algorithm
    Table
    Tablet
    Tablet Server
    Master
    Catalog Table
    Logical Replication
  Architectural Overview
  Example Use Cases
  Next Steps
Apache Kudu Release Notes
  Schema Design and Usage Limitations
  Kudu 1.2.0 / CDH 5.10.2 Release Notes
  Kudu 1.2.0 / CDH 5.10.1 Release Notes
  Kudu 1.2.0 / CDH 5.10.0 Release Notes
    New Features and Improvements in Kudu 1.2.0 / CDH 5.10.0
    Issues Fixed in Kudu 1.2.0 / CDH 5.10.0
    Incompatible Changes in Kudu 1.2.0 / CDH 5.10.0
    Known Issues and Limitations in Kudu 1.2.0 / CDH 5.10.0
  Kudu 1.1.x Release Notes
    New Features in Kudu 1.1.0
    Issues Fixed in Kudu 1.1.0
  Kudu 1.0.1 Release Notes
    Issues Fixed in Kudu 1.0.1
  Kudu 1.0.0 Release Notes
    New Features in Kudu 1.0.0
    Incompatible Changes in Kudu 1.0.0
    Known Issues and Limitations of Kudu 1.0.0
    Issues Fixed in Kudu 1.0.0
  Kudu 0.10.0 Release Notes
    New Features in Kudu 0.10.0
    Other Improvements in Kudu 0.10.0
    Issues Fixed in Kudu 0.10.0
    Incompatible Changes in Kudu 0.10.0
  Kudu 0.9.1 Release Notes
    Issues Fixed in Kudu 0.9.1
  Kudu 0.9.0 Release Notes
    New Features in Kudu 0.9.0
    Other Improvements and Changes in Kudu 0.9.0
    Issues Fixed in Kudu 0.9.0
    Incompatible Changes in Kudu 0.9.0
    Limitations of Kudu 0.9.0
    Upgrade Notes for Kudu 0.9.0
  Kudu 0.8.0 Release Notes
    New Features in Kudu 0.8.0
    Other Improvements in Kudu 0.8.0
    Issues Fixed in Kudu 0.8.0
    Incompatible Changes in Kudu 0.8.0
    Limitations of Kudu 0.8.0
    Upgrade Notes for Kudu 0.8.0
  Kudu 0.7.1 Release Notes
    Issues Fixed in Kudu 0.7.1
    Limitations of Kudu 0.7.1
    Upgrade Notes For Kudu 0.7.1
  Kudu 0.7.0 Release Notes
    New Features in Kudu 0.7.0
    Other Improvements in Kudu 0.7.0
    Issues Fixed in Kudu 0.7.0
    Incompatible Changes in Kudu 0.7.0
    Limitations of Kudu 0.7.0
    Upgrade Notes For Kudu 0.7.0
  Kudu 0.6 Release Notes
    New Features in Kudu 0.6
    Issues Fixed in Kudu 0.6
    Limitations of Kudu 0.6
    Upgrade Notes For Kudu 0.6
  Kudu 0.5 Release Notes
    Limitations of Kudu 0.5
  Next Steps
  Apache Kudu Schema Design and Usage Limitations
    Schema Design Limitations
    Partitioning Limitations
    Replication and Backup Limitations
    Impala Integration Limitations
    Spark Integration Limitations
    Security Limitations
    Other Known Issues
Installing and Upgrading Apache Kudu
  Kudu Installation Requirements
  Install Kudu Using Cloudera Manager
    Install Kudu Using Parcels
    Install Kudu Using Packages
  Install Kudu Using the Command Line
  Upgrading Kudu
    Upgrading Kudu Using Parcels
    Upgrade Kudu Using Packages
  Next Steps
Apache Kudu Configuration
  Configuring the Kudu Master
  Configuring Tablet Servers
Apache Kudu Administration
  Starting and Stopping Kudu Processes
  Kudu Web Interfaces
    Kudu Master Web Interface
    Kudu Tablet Server Web Interface
    Common Web Interface Pages
  Kudu Metrics
    Listing available metrics
    Collecting metrics via HTTP
    Collecting metrics to a log
  Common Kudu workflows
    Migrating to Multiple Kudu Masters
    Recovering from a dead Kudu Master in a Multi-Master Deployment
Developing Applications With Apache Kudu
  Viewing the API Documentation
  Kudu Example Applications
  Maven Artifacts
  Kudu Python Client
  Example Apache Impala Commands With Kudu
  Kudu Integration with Spark
  Integration with MapReduce, YARN, and Other Frameworks
Using Apache Impala (incubating) with Kudu
  Impala Database Containment Model
  Internal and External Impala Tables
  Using Impala To Query Kudu Tables
    Querying an Existing Kudu Table from Impala
    Creating a New Kudu Table From Impala
    Partitioning Tables
    Optimizing Performance for Evaluating SQL Predicates
    Inserting a Row
    Updating a Row
    Upserting a Row
    Deleting a Row
    Failures During INSERT, UPDATE, UPSERT, and DELETE Operations
    Altering Table Properties
    Dropping a Kudu Table using Impala
  Known Issues and Limitations
  Next Steps
Apache Kudu Schema Design
  The Perfect Schema
  Column Design
    Column Encoding
    Column Compression
  Primary Key Design
    Primary Key Index
  Partitioning
    Range Partitioning
    Hash Partitioning
    Multilevel Partitioning
    Partition Pruning
    Partitioning Examples
  Schema Alterations
  Schema Design Limitations
Apache Kudu Transaction Semantics
  Single Tablet Write Operations
  Writing to Multiple Tablets
  Read Operations (Scans)
  Known Issues and Limitations
    Reads (Scans)
    Writes
Troubleshooting Apache Kudu
  Issues Starting or Restarting the Master or Tablet Server
    Error during hole punch test
  Clock is not synchronized
  deploy.py script exits with the too few arguments error
  Troubleshooting Performance Issues
    Kudu Tracing
Cloudera Manager Metrics for Kudu
  Kudu Metrics
  Kudu Replica Metrics
  Tablet Server Metrics
More Resources for Apache Kudu
About Apache Kudu
Apache Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical
properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports
highly available operation.
Apache Kudu is a top-level project in the Apache Software Foundation.
Kudu's benefits include:
• Fast processing of OLAP workloads.
• Integration with MapReduce, Spark, Flume, and other Hadoop ecosystem components.
• Tight integration with Apache Impala (incubating), making it a good, mutable alternative to using HDFS with Apache
Parquet.
• Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis,
including the option for strict serialized consistency.
• Strong performance for running sequential and random workloads simultaneously.
• Easy administration and management through Cloudera Manager.
• High availability. Tablet Servers and Master use the Raft consensus algorithm, which ensures availability as long
as more replicas are available than unavailable. Reads can be serviced by read-only follower tablets, even in the
event of a leader tablet failure.
• Structured data model.
By combining all of these properties, Kudu targets support for applications that are difficult or impossible to implement
on currently available Hadoop storage technologies. Applications for which Kudu is a viable solution include:
• Reporting applications where new data must be immediately available for end users
• Time-series applications that must support queries across large amounts of historic data while simultaneously
returning granular queries about an individual entity
• Applications that use predictive models to make real-time decisions, with periodic refreshes of the predictive
model based on all historical data
For more details, see Example Use Cases.
Kudu-Impala Integration Features
• CREATE/ALTER/DROP TABLE - Impala supports creating, altering, and dropping tables using Kudu as the persistence
layer. The tables follow the same internal/external approach as other tables in Impala, allowing for flexible data
ingestion and querying.
• INSERT - Data can be inserted into Kudu tables from Impala using the same mechanisms as any other table with
HDFS or HBase persistence.
• UPDATE/DELETE - Impala supports the UPDATE and DELETE SQL commands to modify existing data in a Kudu
table row-by-row or as a batch. The syntax of the SQL commands is chosen to be as compatible as possible with
existing solutions. In addition to simple DELETE or UPDATE commands, you can specify complex joins in the FROM
clause of the query, using the same syntax as a regular SELECT statement.
• Flexible Partitioning - Similar to partitioning of tables in Hive, Kudu allows you to dynamically pre-split tables by
hash or range into a predefined number of tablets, in order to distribute writes and queries evenly across your
cluster. You can partition by any number of primary key columns, with any number of hashes, a list of split rows,
or a combination. A partition scheme is required.
• Parallel Scan - To achieve the highest possible performance on modern hardware, the Kudu client used by Impala
parallelizes scans across multiple tablets.
• High-efficiency queries - Where possible, Impala pushes down predicate evaluation to Kudu, so that predicates
are evaluated as close as possible to the data. Query performance is comparable to Parquet in many workloads.
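The idea behind hash partitioning in the Flexible Partitioning item can be illustrated with a short Python sketch. It is only an illustration: the bucket count, the key names, and the use of zlib.crc32 as the hash function are assumptions made for this example and do not describe the hash that Kudu itself applies.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical number of tablets fixed at table creation


def hash_bucket(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a primary key value to a bucket.

    zlib.crc32 is a stand-in hash chosen because it is deterministic
    across runs; it is not Kudu's own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


# Hypothetical primary key values, for example host names in a metrics table.
keys = [f"host-{i:04d}" for i in range(1000)]

# Count how many keys land in each bucket to see how writes would spread.
distribution = Counter(hash_bucket(k) for k in keys)
for bucket in sorted(distribution):
    print(f"bucket {bucket}: {distribution[bucket]} keys")
```

Because the bucket depends only on the key, the same key always maps to the same bucket, while consecutive keys tend to be spread over all buckets instead of landing in one place.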
Concepts and Terms
ColumnarDatastore
Kuduisacolumnardatastore.Acolumnardatastorestoresdatainstrongly-typedcolumns.Withaproperdesign,a
columnarstorecanbesuperiorforanalyticalordatawarehousingworkloadsforseveralreasons.
ReadEfficiency
Foranalyticalqueries,youcanreadasinglecolumn,oraportionofthatcolumn,whileignoringothercolumns.This
meansyoucanfulfillyourrequestwhilereadingaminimalnumberofblocksondisk.Witharow-basedstore,you
needtoreadtheentirerow,evenifyouonlyreturnvaluesfromafewcolumns.
Data Compression
Because a given column contains only one type of data, pattern-based compression can be orders of magnitude
more efficient than compressing mixed data types, which are used in row-based solutions. Combined with the
efficiencies of reading data from columns, compression allows you to fulfill your query while reading even fewer
blocks from disk.
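Run-length encoding is one simple example of a pattern-based scheme that benefits from a column holding a single type with many repeated values. The sketch below is a generic illustration with a made-up status column, not Kudu's encoder.

```python
from itertools import groupby


def rle_encode(column):
    # Collapse each run of equal adjacent values into (value, run_length).
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    # Expand (value, run_length) pairs back into the original column.
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical column with long runs of the same value.
status = ["ok"] * 5 + ["error"] * 2 + ["ok"] * 3
encoded = rle_encode(status)
```

Here ten stored values shrink to three pairs. The same data interleaved with other columns in a row layout would rarely form such runs.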
Raft Consensus Algorithm
The Raft consensus algorithm provides a way to elect a leader for a distributed cluster from a pool of potential leaders.
If a follower cannot reach the current leader, it transitions itself to become a candidate. Given a quorum of voters,
one candidate is elected to be the new leader, and the others transition back to being followers. A full discussion of
Raft is out of scope for this documentation, but it is a robust algorithm.
Kudu uses the Raft Consensus Algorithm for the election of masters and leader tablets, as well as determining the
success or failure of a given write operation.
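The quorum rule behind both leader election and write acknowledgement is a strict majority of the voters. The following sketch shows only that arithmetic. It deliberately omits terms, log comparison, and election timeouts, and the function names are hypothetical.

```python
def majority(num_voters: int) -> int:
    # A strict majority: more than half of the voters.
    return num_voters // 2 + 1


def wins_election(votes_granted: int, num_voters: int) -> bool:
    # A candidate becomes leader once a majority has voted for it
    # (its own vote included).
    return votes_granted >= majority(num_voters)


def write_succeeds(acks: int, num_replicas: int) -> bool:
    # A write is considered successful once a majority of replicas,
    # leader included, have acknowledged it.
    return acks >= majority(num_replicas)
```

With three voters a majority is two, so the group tolerates one unreachable member. With five voters it tolerates two.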
Table
A table is where your data is stored in Kudu. A table has a schema and a totally ordered primary key. A table is split
into segments called tablets, by primary key.
Tablet
A tablet is a contiguous segment of a table, similar to a partition in other data storage engines or relational databases.
A given tablet is replicated on multiple tablet servers, and at any given point in time, one of these replicas is considered
the leader tablet. Any replica can service reads, and writes require consensus among the set of tablet servers serving
the tablet.
Tablet Server
A tablet server stores and serves tablets to clients. For a given tablet, one tablet server acts as a leader and the others
serve follower replicas of that tablet. Only leaders service write requests, while leaders or followers each service read
requests. Leaders are elected using Raft consensus. One tablet server can serve multiple tablets, and one tablet can
be served by multiple tablet servers.
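The read and write routing rule can be sketched as follows. The Replica class and both functions are hypothetical names used for illustration; they are not part of the Kudu client API.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    server: str              # hypothetical tablet server name
    is_leader: bool = False


def replica_for_write(replicas):
    # Writes go only to the replica currently acting as leader.
    leaders = [r for r in replicas if r.is_leader]
    if len(leaders) != 1:
        raise RuntimeError("expected exactly one leader replica")
    return leaders[0]


def replica_for_read(replicas, rng=random):
    # Reads may be served by the leader or by any follower.
    return rng.choice(replicas)


tablet_replicas = [
    Replica("ts-1", is_leader=True),
    Replica("ts-2"),
    Replica("ts-3"),
]
```

In this sketch a read picks any replica at random, while a write fails fast if no single leader is known, for example while an election is in progress.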
Master
The master keeps track of all the tablets, tablet servers, the catalog table, and other metadata related to the cluster.
At a given point in time, there can only be one acting master (the leader). If the current leader disappears, a new master
is elected using Raft consensus.
The master also coordinates metadata operations for clients. For example, when creating a new table, the client
internally sends the request to the master. The master writes the metadata for the new table into the catalog table,
and coordinates the process of creating tablets on the tablet servers.
All the master's data is stored in a tablet, which can be replicated to all the other candidate masters.
Tablet servers heartbeat to the master at a set interval (the default is once per second).
Catalog Table
The catalog table is the central location for metadata of Kudu. It stores information about tables and tablets. The
catalog table is accessible to clients through the master, using the client API. The catalog table may not be read or
written directly. Instead, it is accessible only via metadata operations exposed in the client API. The catalog table stores
two categories of metadata:
Contents of the Catalog Table
Tables     table schemas, locations, and states
Tablets    the list of existing tablets, which tablet servers have replicas of each tablet, the tablet's current
           state, and start and end keys.
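The tablet metadata listed above is enough to locate the tablet that covers a given primary key. The sketch below models it with a hypothetical TabletInfo record. The field names, the "RUNNING" state label, the inclusive start and exclusive end convention, and the use of an empty string for an unbounded key are all assumptions of the example, not the catalog table's actual schema.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TabletInfo:
    tablet_id: str
    start_key: str          # assumed inclusive; "" means unbounded
    end_key: str            # assumed exclusive; "" means unbounded
    replicas: List[str]     # tablet servers holding a replica
    state: str = "RUNNING"  # hypothetical state label


def find_tablet(tablets: List[TabletInfo], key: str) -> Optional[TabletInfo]:
    # Assumes tablets are sorted by start_key and do not overlap.
    starts = [t.start_key for t in tablets]
    i = bisect_right(starts, key) - 1
    if i < 0:
        return None
    candidate = tablets[i]
    if candidate.end_key == "" or key < candidate.end_key:
        return candidate
    return None


catalog = [
    TabletInfo("t1", "", "g", ["ts-1", "ts-2", "ts-3"]),
    TabletInfo("t2", "g", "p", ["ts-2", "ts-3", "ts-4"]),
    TabletInfo("t3", "p", "", ["ts-1", "ts-3", "ts-4"]),
]
```

A lookup such as find_tablet(catalog, "kudu") returns the entry for t2, whose replica list tells a client which tablet servers to contact.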
Logical Replication
Kudu replicates operations, not on-disk data. This is referred to as logical replication, as opposed to physical replication.
This has several advantages:
• Although inserts and updates do transmit data over the network, deletes do not need to move any data. The
delete operation is sent to each tablet server, which performs the delete locally.
• Physical operations, such as compaction, do not need to transmit the data over the network in Kudu. This is
different from storage systems that use HDFS, where the blocks need to be transmitted over the network to fulfill
the required number of replicas.
• Tablets do not need to perform compactions at the same time or on the same schedule, or otherwise remain in
sync on the physical storage layer. This decreases the chances of all tablet servers experiencing high latency at
the same time, due to compactions or heavy write loads.
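The points above can be illustrated with a toy model in which every replica receives the same stream of operations but manages its own physical layout. The Replica class, its segment list, and the flush and compact methods are inventions of this sketch and do not reflect Kudu's storage format.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        # Physical layout: a list of key -> value maps. None marks a delete.
        self.segments = [{}]

    def apply(self, op):
        # Only the operation crosses the network; a delete carries just a key.
        kind, key, value = op
        self.segments[-1][key] = value if kind == "upsert" else None

    def flush(self):
        # Start a new segment; a purely local, physical action.
        self.segments.append({})

    def compact(self):
        # Merge all segments locally; no data is sent to other replicas.
        self.segments = [self.snapshot()]

    def snapshot(self):
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        return {k: v for k, v in merged.items() if v is not None}


ops = [("upsert", 1, "a"), ("upsert", 2, "b"), ("delete", 1, None), ("upsert", 3, "c")]
replicas = [Replica("a"), Replica("b"), Replica("c")]
for op in ops:
    for replica in replicas:
        replica.apply(op)
    replicas[0].flush()   # replica "a" flushes after every operation
replicas[1].compact()     # replica "b" compacts on its own schedule
```

After the run all three replicas expose the same logical rows, even though replica "a" holds five segments and the others hold one each.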
Architectural Overview
The following diagram shows a Kudu cluster with three masters and multiple tablet servers, each serving multiple
tablets. It illustrates how Raft consensus is used to allow for both leaders and followers for both the masters and tablet
servers. In addition, a tablet server can be a leader for some tablets, and a follower for others. Leaders are shown in
gold, while followers are shown in blue.
Figure 1: Kudu Architectural Overview