hadoop for sys_admin
DESCRIPTION
A presentation for OhioLinuxFest on Hadoop for System Administrators
TRANSCRIPT
![Page 1: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/1.jpg)
Justin Miller, Senior Systems Engineer/DevOps at iHealth Technologies
Weston Bassler, Systems Engineer at Verizon Wireless
Hadoop for System Administrators – Ohio Linux Fest 2014
![Page 2: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/2.jpg)
What we will be covering:
• Intro
  - Why Hadoop?
  - How Hadoop Works
• Architecture
  - Planning Hardware/Storage/Network
  - Processing and Storage
  - HDFS Components
  - YARN Components
• Operations
  - Job scheduling
  - Job alerts
• Monitoring
  - Core Services
  - Job scheduler and SLA
  - Hardware
• High Availability
  - YARN
  - HDFS
  - Oozie
• Security
  - Security Issues
  - Authentication
  - Authorization
  - Encryption
• Backup and Recovery
  - What to plan for?
  - How to combat
• Hadoop Vendors/Distros
  - Cloudera
  - HortonWorks
  - MapR
![Page 3: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/3.jpg)
Why Hadoop?
![Page 4: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/4.jpg)
Why Hadoop? Cont...
Sort through TB, even PB worth of data in a matter of minutes
Easily sift through LOGS (patterns, data mining) → switch logs, application logs
Batch Processing
History → Inspired by 2 Google Papers on MapReduce and GoogleFS
Implemented By Yahoo!
![Page 5: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/5.jpg)
Who's using it?
![Page 6: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/6.jpg)
How Hadoop?
Processing
• MapReduce (MRv1)
  - What is MapReduce?
  - Nobody likes it
• YARN (MRv2)
  - Yet Another Resource Negotiator
  - Newer, better, more versatile
  - 2 new roles → ResourceManager and ApplicationMaster
  - Spark → New Hotness
• Bringing Processing and Storage together
  - Data locality → avoid network!
  - “MO NODES MO BETTA”
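For sysadmins, the MapReduce flow maps neatly onto a Unix pipeline: a mapper emits keys, a sort groups them, and a reducer aggregates each group. The sketch below is only a local analogy that never touches a cluster, and the function names are made up for illustration.

```shell
# Word count as a local pipeline that mirrors the MapReduce phases.
# Analogy only: nothing here talks to Hadoop.

# "Map": emit one lowercase word per line.
map_words() {
    tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

# "Reduce": count occurrences per key; input must already be sorted by key.
reduce_counts() {
    uniq -c | awk '{ print $2 "\t" $1 }'
}

# "Shuffle/sort" sits between the two phases, as it does in a real job.
word_count() {
    map_words | sort | reduce_counts
}
```

Running `word_count < some.log` counts words on one machine; Hadoop runs the same three phases spread across many nodes, with the map step placed next to the data.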
![Page 7: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/7.jpg)
YARN in Action
![Page 8: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/8.jpg)
Storage
• HDFS
  - What is HDFS?
  - Why HDFS?
• Components of HDFS
  - NameNode
    - Metadata → fsimage + edits log
    - ZooKeeper → HA management
    - Quorum based journaling
    - 3 JournalNodes
    - Active/Passive NameNode
  - DataNodes – what do they do?
    - Blocks in relation to NameNode Metadata
    - Block storage
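To make the block bookkeeping concrete: HDFS splits each file into fixed-size blocks, the NameNode tracks every block in its metadata, and the DataNodes store the replicas. Here is a rough sketch of the arithmetic, assuming a 128 MB block size and a replication factor of 3. Both are common defaults, so check `dfs.blocksize` and `dfs.replication` on your own cluster. The function names are hypothetical.

```shell
# blocks_for SIZE_MB [BLOCK_MB]
# Number of HDFS blocks a file of SIZE_MB occupies (ceiling division).
# BLOCK_MB defaults to 128, an assumed block size.
blocks_for() {
    size_mb=$1
    block_mb=${2:-128}
    echo $(( (size_mb + block_mb - 1) / block_mb ))
}

# raw_mb_for SIZE_MB [REPLICATION]
# Raw disk consumed across the cluster; the last block is not padded,
# so usage is simply size times the replication factor (assumed 3).
raw_mb_for() {
    size_mb=$1
    replication=${2:-3}
    echo $(( size_mb * replication ))
}
```

For example, `blocks_for 1000` reports 8 blocks and `raw_mb_for 1000` reports 3000 MB of raw disk. Lots of small files means lots of blocks for the NameNode to hold in memory.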
![Page 9: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/9.jpg)
HDFS Write Path
![Page 10: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/10.jpg)
Benefits and Limitations of HDFS
Benefits
  - Low cost per byte → commodity storage
  - High Bandwidth/Scales effectively → “Mo nodes Mo speed”
  - Rock solid data reliability
  - Supports distributed computing I/O patterns
  - OPEN SOURCE!!!!!
![Page 11: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/11.jpg)
Benefits and Limitations of HDFS (Continued...)
Limitations
  - Updates → data is immutable (can't be updated, only appended)
  - Write Once
  - Optimized for sequential reads → not for real-time data processing
  - Challenging import/export → requires additional tooling
![Page 12: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/12.jpg)
Architecture
• Planning your Hardware/Storage
  - Cheap disks
  - Distributed disk approach → replication factor of 3 for HA
  - NO LVM and NO RAID and NO swap
  - noatime, nodiratime
• Network considerations
  - Rack awareness affects data distribution
  - Prefer a faster network when available → 10 Gb/s (10GbE) if possible
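Rack awareness is usually fed by a small script: Hadoop calls the executable named in `net.topology.script.file.name` with one or more node addresses as arguments and reads one rack path per address from standard output. A minimal sketch, assuming made-up subnets and rack names (only `/default-rack` is Hadoop's own fallback name):

```shell
# rack_for ADDRESS
# Map a node address to a rack path. The subnets and rack names below
# are hypothetical; replace them with your own layout.
rack_for() {
    case $1 in
        10.1.1.*) echo /dc1/rack1 ;;
        10.1.2.*) echo /dc1/rack2 ;;
        *)        echo /default-rack ;;
    esac
}

# resolve_racks ADDRESS...
# Print one rack path per argument, in argument order.
resolve_racks() {
    for node in "$@"; do
        rack_for "$node"
    done
}
```

Saved as a standalone topology script, the file would end with `resolve_racks "$@"` so the arguments Hadoop passes in are answered directly.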
![Page 13: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/13.jpg)
Hadoop Operations
• Jobs
  - What is a job?
  - Scheduling jobs with Oozie
  - Alerts on Jobs
  - Oozie SLAs → Start time, end time & duration
  - File driven Job Configuration
![Page 14: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/14.jpg)
Example of a Job:
Example of a coordinator:
![Page 15: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/15.jpg)
Troubleshooting
• Application → Debug Code
![Page 16: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/16.jpg)
• Job → Debug Execution
![Page 17: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/17.jpg)
• Service → Debug Linux Process (/var/log/hadoop-*)
Services won't start → port conflicts (nmap, netstat, lsof)
# If it is neither an application nor a job problem, search the service logs:
grep -r ERROR /var/log/hadoop-*
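The two service-level checks can be wrapped in small helpers. This is a sketch, not a standard tool: it assumes logs end in `.log` inside a directory such as `/var/log/hadoop-hdfs` (directory names vary by distro), and the function names are invented here.

```shell
# error_counts DIR
# Print "count<TAB>file" for every *.log file in DIR that contains ERROR lines.
error_counts() {
    for f in "$1"/*.log; do
        [ -f "$f" ] || continue
        # grep -c exits non-zero on zero matches, so guard it.
        n=$(grep -c 'ERROR' "$f" || true)
        if [ "$n" -gt 0 ]; then
            printf '%s\t%s\n' "$n" "$f"
        fi
    done
    return 0
}

# port_listeners PORT
# Read `netstat -ltn` or `ss -ltn` output on stdin and print the local
# addresses already listening on PORT (column 4 in both tools).
port_listeners() {
    awk -v port="$1" '$4 ~ (":" port "$") { print $4 }'
}
```

Typical use would be `error_counts /var/log/hadoop-hdfs` and `netstat -ltn | port_listeners 8020`, where 8020 is only an example port.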
![Page 18: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/18.jpg)
Monitoring
• Core Services
  - HDFS
  - YARN
  - JMX → JVM Monitoring
  - Cloudera Manager
• Performance
  - Ganglia (HortonWorks)
  - Cloudera Manager
• Hardware → to each his own (traditional monitoring)
  - SNMP
  - Nagios
  - Zenoss
  - Cloudera Manager
![Page 19: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/19.jpg)
High Availability
• HDFS
  - ZooKeeper → quorum based journaling
• YARN
  - ZooKeeper
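Two quick checks follow from this design: a quorum of N JournalNodes or ZooKeeper servers keeps working while a majority is up, and an HA pair should have exactly one active member. The helpers below are a sketch with invented names. They rely on `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState` printing `active` or `standby`.

```shell
# tolerated_failures N
# How many members of an N-node quorum may fail while a majority survives.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# exactly_one_active
# Read one service state per line on stdin; succeed only if exactly
# one of them is "active".
exactly_one_active() {
    [ "$(grep -c '^active$')" -eq 1 ]
}
```

With 3 JournalNodes, `tolerated_failures 3` reports 1. A NameNode pair could be checked with `for id in nn1 nn2; do hdfs haadmin -getServiceState "$id"; done | exactly_one_active`, where `nn1` and `nn2` are assumed service IDs.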
![Page 20: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/20.jpg)
• Oozie HA
![Page 21: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/21.jpg)
Security (Because people are evil)
![Page 22: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/22.jpg)
Security (Continued...)
• Known issues – Stupid/Lazy People
  - Hadoop can be very secure
• Authentication – Kerberos
  - Principal (user)
  - Realm (group of principals)
  - Keytab file
• Authorization
  - LDAP
  - Active Directory
  - Role based
• Encryption – For your eyes Only!
  - Kerberos 1st
  - SSL Certificates
  - **** SSL must be enabled for all core Hadoop services
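A Kerberos principal has the shape `primary/instance@REALM`, and Hadoop service principals normally use the service name as the primary and the host name as the instance. The helper below is an illustrative sketch (the function name and the example principal are assumptions) that splits a principal into those parts.

```shell
# split_principal PRINCIPAL
# Print the primary, instance (may be empty) and realm of a Kerberos
# principal written as primary[/instance]@REALM.
split_principal() {
    principal=$1
    realm=${principal##*@}      # text after the last "@"
    name=${principal%@*}        # text before the last "@"
    primary=${name%%/*}         # text before the first "/"
    case $name in
        */*) instance=${name#*/} ;;
        *)   instance= ;;
    esac
    printf 'primary=%s instance=%s realm=%s\n' "$primary" "$instance" "$realm"
}
```

For example, `split_principal hdfs/nn1.example.com@EXAMPLE.COM` separates the service, host and realm. With a keytab, a service would typically authenticate using `kinit -kt <keytab> <principal>`; the angle-bracket values are placeholders.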
![Page 23: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/23.jpg)
Backup and Recovery – When things go wrong (And they will)
What can go wrong? What to plan for?
  - Data Corruption
  - Node crashes
  - Disk crashes
How to combat it when things do go wrong
• Data Corruption
  - Block checksums fail → NameNode replaces the bad copy with a fresh replica
  - HDFS → hdfs fsck tool
• Node crashes/Disk crashes
  - HDFS saves the day!
  - NameNode HA
  - First 2 replicas of data on different hosts
  - Heartbeat detection
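Both recovery mechanisms are easy to script around. The sketch below makes two assumptions: that `hdfs fsck` ends its report with a line such as `The filesystem under path '/' is HEALTHY` (or `is CORRUPT`), as Hadoop 2.x does, and that the heartbeat settings are at their defaults of 3 seconds (`dfs.heartbeat.interval`) and 300 seconds (`dfs.namenode.heartbeat.recheck-interval`). The function names are invented.

```shell
# fsck_is_healthy
# Read `hdfs fsck` output on stdin; succeed if it reports HEALTHY.
fsck_is_healthy() {
    grep -q 'is HEALTHY'
}

# dead_node_timeout [HEARTBEAT_S] [RECHECK_S]
# Seconds of silence before the NameNode marks a DataNode dead:
# 2 * recheck interval + 10 * heartbeat interval.
dead_node_timeout() {
    heartbeat_s=${1:-3}
    recheck_s=${2:-300}
    echo $(( 2 * recheck_s + 10 * heartbeat_s ))
}
```

A cron job could run `hdfs fsck / | fsck_is_healthy || echo "HDFS needs attention"`. With the default settings, `dead_node_timeout` gives 630 seconds, which is why a crashed node takes about ten minutes to be declared dead and re-replicated.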
![Page 24: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/24.jpg)
Hadoop Wars - Vendors and Distributions
• Cloudera
  - Specializes in Enterprise tools
  - Auditing
  - Access Control
  - Cluster Management (Cloudera Manager)
• HortonWorks
  - Specializes in Engineering
  - Also Open Source
  - Top new cool things
• MapR
  - Lead developers began Mahout
![Page 25: Hadoop for sys_admin](https://reader034.vdocuments.net/reader034/viewer/2022051411/547e8481b4af9fd3158b56fa/html5/thumbnails/25.jpg)
Hopefully you enjoyed!
If interested:
Quick Ways to get started Learning Hadoop
• Free Stuff – Who doesn't like free?
  - Big Data University – Hadoop fundamentals, Pig, Oozie, lots more
  - Udacity – Intro to Hadoop and MapReduce
  - MapR, Cloudera, HortonWorks – Training Videos