A Continuously Deployed Hadoop Analytics Platform?
TRANSCRIPT
![Page 1: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/1.jpg)
A Continuously Deployed Hadoop Analytics Platform?
Graham Gear, Director, Systems Engineering, APJ
![Page 2: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/2.jpg)
![Page 3: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/3.jpg)
Logical Pilot Delivery Pipeline
Operations: Monitor, Provision, Automate
Production
Data Scientists: Hourly, 100% Bugs
Production Data Science
Pre-Production Development
![Page 4: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/4.jpg)
Logical Nascent Delivery Pipeline
Production Workstation
Data Engineers: Monthly, 0% Bugs
Development User Acceptance Test
Backup Data
Production Workload
Data Science
DS, Analysts, Apps: Monthly-Yearly, 100% Bugs
Operations: Monitor, Provision, Automate
Development Production
Governance: Audit, Security, Lineage
Pre-Production
![Page 5: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/5.jpg)
Logical Staged Delivery Pipeline
SLA Workstation
DevOps: Weekly, 10% Bugs
System Smoke Test
Data Engineers: Weekly, 0% Bugs
Development
DS, Analysts, Apps: Monthly-Yearly, 90% Bugs
Operations: Monitor, Provision, Automate
Development Pre-Production Production
Governance: Audit, Security, Lineage
Production User Acceptance Test
Backup Data
Production Workload
Data Science
![Page 6: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/6.jpg)
Logical Manual Delivery Pipeline
Non-SLA SLA Workstation
DevOps: Weekly, 10% Bugs
System Smoke Test
Data Engineers: Weekly, 0% Bugs
Development User Acceptance Test
Backup Data
Disaster Recovery
Data Science
Data Scientists: Weekly-Monthly, 60% Bugs
Operations: Monitor, Provision, Automate
Development Pre-Production Production
Governance: Audit, Security, Lineage
SLA
Analysts, Apps: Monthly-Yearly, 30% Bugs
Production Workload
![Page 7: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/7.jpg)
Logical Continuous Delivery Pipeline
Test
Artifact Repo
Build
Acceptance Test
Release Artifact
Unit Suite Test
Bake Artifact
Deploy Pipeline
DevOps: Hourly-Daily, 15% Bugs
System Smoke Test Workstation
Source Repo Release Tag
Data Engineers: Hourly, 70% Bugs
Light Unit Test
Development Non-SLA
User Acceptance Test
Backup Data
Disaster Recovery
Data Science
Data Scientists: Weekly-Monthly, 15% Bugs
Acceptance Test
Operations: Monitor, Provision, Automate
Development Pre-Production Production
Governance: Audit, Security, Lineage
SLA
Analysts, Apps: Weekly-Monthly, 0% Bugs
Production Workload
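The bug percentages on the logical pipeline slides can be compared side by side. The sketch below is illustrative only: it assumes each "N% Bugs" label means the share of bugs found by that group, and it assumes DevOps and Data Engineers sit ahead of data scientists, analysts and applications in the pipeline. The names `PIPELINES`, `EARLY_GROUPS`, `early_share` and `summarize` are hypothetical.

```python
# Percent of bugs attributed to each group, copied from the logical pipeline
# slides. Reading "N% Bugs" as "share of bugs found by that group" is an
# assumption, not something the slides state.
PIPELINES = {
    "pilot": {"Data Scientists": 100},
    "nascent": {"Data Engineers": 0, "DS, Analysts, Apps": 100},
    "staged": {"DevOps": 10, "Data Engineers": 0, "DS, Analysts, Apps": 90},
    "manual": {
        "DevOps": 10,
        "Data Engineers": 0,
        "Data Scientists": 60,
        "Analysts, Apps": 30,
    },
    "continuous": {
        "DevOps": 15,
        "Data Engineers": 70,
        "Data Scientists": 15,
        "Analysts, Apps": 0,
    },
}

# Assumption: these two groups find bugs before data consumers are affected.
EARLY_GROUPS = {"DevOps", "Data Engineers"}


def early_share(shares):
    """Sum the percentages attributed to the early groups."""
    return sum(pct for group, pct in shares.items() if group in EARLY_GROUPS)


def summarize():
    """Map each pipeline name to its early bug share in percent."""
    return {name: early_share(shares) for name, shares in PIPELINES.items()}


if __name__ == "__main__":
    for name, pct in summarize().items():
        print(f"{name:<10} {pct:>3}% of bugs found by DevOps and Data Engineers")
```

Under these assumptions the continuous pipeline moves most bug discovery to the engineering end, which appears to be the point of the progression across the slides.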
![Page 8: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/8.jpg)
Physical Continuous Delivery Pipeline
Development Pre-Production Production
• Workstation: single-tenant, 1 laptop or desktop, physical; synthetic data; CDH single node; Eclipse, Maven; Linux, OS-X
• Test: serial-tenant, <10 nodes, physical or cloud; synthetic data; CDH cluster
• Non-SLA: multi-tenant, >10 nodes, physical or cloud; production data; CDH cluster; DS Workbench
• SLA: multi-tenant, >10 nodes, physical or cloud; production data; CDH cluster; Tableau JDBC
• Source Repo: Git, Gerrit
• Build: Cloudera Director, Jenkins
• Artifact Repo: Parcel Repository, Artifactory
• Operations: Cloudera BDR, Cloudera Manager
• Governance: Cloudera Optimizer, Cloudera Navigator
![Page 9: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/9.jpg)
![Page 10: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/10.jpg)
Data Engineer Development Pipeline
[Diagram: physical continuous delivery pipeline, as on page 8]
1. Create a Maven module from a Maven Archetype, providing a baseline project encoding all corporate standards and targeting a specific production version
2. Develop a dataset ingest and preparation pipeline with Flume, Kafka, Hive and MapReduce, using Eclipse and Maven
3. Build a suite of unit tests and synthetic data to exercise the codebase, identifying and resolving bugs
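The unit test and synthetic data step can be sketched in miniature. This is not the talk's Maven and MapReduce code: it is a plain Python illustration under assumed conditions, with a hypothetical `id,timestamp,value` record layout and hypothetical helpers `make_synthetic_records` and `prepare`.

```python
import random
import unittest


def make_synthetic_records(count, seed=0, bad_ratio=0.2):
    """Generate "id,timestamp,value" lines; some are malformed on purpose."""
    rng = random.Random(seed)  # seeded so every test run sees the same data
    records = []
    for i in range(count):
        if rng.random() < bad_ratio:
            records.append(f"{i},not-a-timestamp,")
        else:
            records.append(f"{i},{1400000000 + i},{rng.uniform(0, 100):.2f}")
    return records


def prepare(lines):
    """Parse well-formed lines into dicts and count the rejected ones."""
    clean, rejected = [], 0
    for line in lines:
        parts = line.split(",")
        try:
            clean.append(
                {"id": int(parts[0]), "ts": int(parts[1]), "value": float(parts[2])}
            )
        except (ValueError, IndexError):
            rejected += 1
    return clean, rejected


class PrepareTest(unittest.TestCase):
    def test_every_record_is_kept_or_rejected(self):
        lines = make_synthetic_records(100, seed=42)
        clean, rejected = prepare(lines)
        self.assertEqual(len(clean) + rejected, len(lines))

    def test_malformed_record_is_rejected(self):
        clean, rejected = prepare(["1,1400000000,3.50", "2,not-a-timestamp,"])
        self.assertEqual((len(clean), rejected), (1, 1))


if __name__ == "__main__":
    unittest.main()
```

Seeding the generator keeps the synthetic dataset reproducible, so a failing test points at the code rather than at the data.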
![Page 11: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/11.jpg)
Data Engineer Source Pipeline
[Diagram: physical continuous delivery pipeline, as on page 8]
1. Via Maven, Gerrit and Git interactions, show developer-initiated project source code stages:
• Stage
• Review
• Commit
• Release
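One way to picture the four source code stages is as the commands a developer might run at each one. The mapping below is an assumption, not taken from the slides: it uses the Gerrit convention of pushing to `refs/for/<branch>` to open a review, and it assumes the Maven Release Plugin handles the release. The function `stage_commands`, the branch name and the version are hypothetical, and nothing is executed.

```python
import shlex


def stage_commands(branch="master", version="1.0.0"):
    """Return illustrative commands per stage; they are listed, never run."""
    return {
        "stage": [
            ["git", "add", "--all"],
            ["git", "commit", "-m", "Describe the change"],
        ],
        # Pushing to refs/for/<branch> opens a change for review in Gerrit.
        "review": [["git", "push", "origin", f"HEAD:refs/for/{branch}"]],
        # Assumption: the approved change is submitted in Gerrit itself, so
        # locally the developer only syncs with the updated branch.
        "commit": [["git", "pull", "--rebase", "origin", branch]],
        # Assumption: the Maven Release Plugin tags and publishes the release.
        "release": [
            ["mvn", "--batch-mode", "release:prepare", f"-DreleaseVersion={version}"],
            ["mvn", "--batch-mode", "release:perform"],
        ],
    }


if __name__ == "__main__":
    for stage, commands in stage_commands().items():
        print(f"[{stage}]")
        for command in commands:
            print("  " + shlex.join(command))
```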
![Page 12: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/12.jpg)
Automated Bake Pipeline
[Diagram: physical continuous delivery pipeline, as on page 8]
1. Simulate an automatically triggered Jenkins unit test suite, bake and smoke test pipeline against a Director-provisioned Test cluster served by Artifactory and Parcel repositories
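The ordering of the unit test suite, bake and smoke test stages can be simulated without Jenkins. The sketch below is a stand-in, not the Jenkins job from the talk: `run_pipeline` is a hypothetical helper that runs stages in order, stops at the first failure, and marks the rest as skipped.

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order and stop at the first failure.

    Each callable returns True on success. An exception counts as a failure.
    Returns a list of (name, status) pairs, where status is "passed",
    "failed" or "skipped".
    """
    results = []
    failed = False
    for name, step in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        try:
            ok = bool(step())
        except Exception:
            ok = False
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


if __name__ == "__main__":
    # Placeholder stages: real ones would run the tests, build the parcel
    # and exercise the Test cluster.
    demo = [
        ("unit test suite", lambda: True),
        ("bake artifact", lambda: True),
        ("system smoke test", lambda: True),
    ]
    for name, status in run_pipeline(demo):
        print(f"{name}: {status}")
```

Stopping at the first failure means an artifact is never baked from code that failed its unit tests, and a smoke test never runs against an artifact that failed to bake.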
![Page 13: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/13.jpg)
Automated Deploy & Test Pipeline
[Diagram: physical continuous delivery pipeline, as on page 8]
1. Show deploy, smoke and user acceptance test stages, creating the operational Manager dashboards and Navigator metadata
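A smoke test stage could, for example, check service health through the Cloudera Manager REST API. The sketch below makes no network call: it only builds the services URL and parses a response body. It assumes the response is a JSON object with an `items` list whose entries carry `name` and `healthSummary` fields. The host, cluster name and sample payload are hypothetical.

```python
import json
from urllib.parse import quote


def services_url(host, cluster, port=7180, api_version="v12"):
    """Build the URL that lists a cluster's services (assumed endpoint shape)."""
    return f"http://{host}:{port}/api/{api_version}/clusters/{quote(cluster)}/services"


def unhealthy_services(payload):
    """Return names of services whose healthSummary is anything but GOOD."""
    services = json.loads(payload).get("items", [])
    return [s["name"] for s in services if s.get("healthSummary") != "GOOD"]


if __name__ == "__main__":
    # Hypothetical response body, used here in place of a live API call.
    sample = json.dumps(
        {
            "items": [
                {"name": "hdfs", "healthSummary": "GOOD"},
                {"name": "impala", "healthSummary": "BAD"},
            ]
        }
    )
    print(services_url("cm.example.com", "Cluster 1"))
    print(unhealthy_services(sample))
```

A smoke stage built this way would fail the deploy whenever the returned list is non-empty.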
![Page 14: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/14.jpg)
Data Scientist & Analyst Dev Pipeline
[Diagram: physical continuous delivery pipeline, as on page 8]
1. Query the dataset using Impala via Hue, capture the SQL logs and feed them back into Optimizer and the project; show dependency verification under schema evolution
2. Analyse the dataset using Python and Ibis via the DS Workbench application, feeding back into the project; show dependency checking during dataset evolution
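Dependency verification under schema evolution can be illustrated with a small check: given the columns each captured query depends on, report which ones a proposed schema no longer provides. This is a sketch under assumptions. Extracting dependencies from SQL logs is out of scope here, so they are declared by hand, and the function `missing_columns` and all table, column and query names are hypothetical.

```python
def missing_columns(dependencies, schema):
    """Report query dependencies that a schema does not satisfy.

    dependencies: {query_name: {table_name: set_of_columns}}
    schema:       {table_name: set_of_columns}
    Returns {query_name: sorted list of "table.column" entries that are missing}.
    """
    problems = {}
    for query, tables in dependencies.items():
        missing = sorted(
            f"{table}.{column}"
            for table, columns in tables.items()
            for column in columns
            if column not in schema.get(table, set())
        )
        if missing:
            problems[query] = missing
    return problems


if __name__ == "__main__":
    # Hypothetical dependencies, as if recovered from captured SQL logs.
    deps = {
        "daily_totals": {"events": {"ts", "value"}},
        "user_report": {"events": {"user_id"}, "users": {"id", "name"}},
    }
    # Proposed schema in which events.user_id has been dropped.
    evolved = {"events": {"ts", "value"}, "users": {"id", "name"}}
    print(missing_columns(deps, evolved))
```

Run as part of the build, a check like this would flag a schema change that breaks a downstream query before the change reaches production.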
![Page 15: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/15.jpg)
Application Delivery Pipeline
[Diagram: physical continuous delivery pipeline, as on page 8]
1. Application rev pipeline, show comparison to previous version
![Page 16: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/16.jpg)
Platform Delivery Pipeline
[Diagram: physical continuous delivery pipeline, as on page 8]
1. Platform rev pipeline, show comparison to previous version
![Page 17: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/17.jpg)
Logical Continuous Delivery Pipeline
[Diagram: logical continuous delivery pipeline, repeated from page 7]
![Page 18: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/18.jpg)
![Page 19: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/19.jpg)
![Page 20: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/20.jpg)
![Page 21: A Continuously Deployed Hadoop Analytics Platform?](https://reader034.vdocuments.net/reader034/viewer/2022042907/586fde6f1a28ab18428b6bc5/html5/thumbnails/21.jpg)
Questions?
• Cloudera Framework Example: https://github.com/ggear/cloudera-framework
• Cloudera Parcel Maven Plugin: https://github.com/ggear/cloudera-parcel
• Cloudera Manager API: https://cloudera.github.io/cm_api/apidocs/v12/index.html
• Cloudera Navigator API: http://cloudera.github.io/navigator/apidocs/v3
• Cloudera Director: https://director.cloudera.com
• Cloudera Optimizer: https://optimizer.cloudera.com