Apache Hadoop YARN - Hortonworks Meetup Presentation
TRANSCRIPT
![Page 1: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/1.jpg)
Apache Hadoop YARN
![Page 2: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/2.jpg)
A Cursory Look At The Architecture
© Hortonworks Inc. 2012. Confidential and Proprietary. Page 2
![Page 3: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/3.jpg)
Global Scheduler (ResourceManager)
• Pure resource arbitration
• Multiple resource dimensions
  – <priority, data-locality, memory, cpu, …>
• In-built support for data-locality
  – Node, Rack etc.
  – Unique to YARN
![Page 4: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/4.jpg)
Scheduler Concepts
• Input from AM(s) is a dynamic list of ResourceRequests
  – <resource-name, resource-capability>
    – Resource name: (hostname / rackname / any)
    – Resource capability: (memory, cpu, …)
    – Essentially an inverted <name, capability> request map from AM to RM
    – No notion of tasks!
• Output - Container
  – Resource(s) grant on a specific machine
  – Verifiable grant
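As a rough illustration of the input and output described above, the following Python sketch models a ResourceRequest and a Container as plain data classes. The field names, the `num_containers` count, the `"*"` spelling of "any", and the `token` string standing in for the verifiable grant are assumptions made for illustration. This is not the Hadoop Java API.

```python
from dataclasses import dataclass

ANY = "*"  # assumed spelling of the "any" resource name


@dataclass(frozen=True)
class Capability:
    """Resource capability: one value per resource dimension."""
    memory_mb: int
    vcores: int = 1


@dataclass
class ResourceRequest:
    """What an AM sends: a capability wanted at a named location."""
    priority: int
    resource_name: str      # hostname, rackname, or ANY
    capability: Capability
    num_containers: int     # hypothetical count field


@dataclass(frozen=True)
class Container:
    """What the scheduler returns: a grant on one specific machine."""
    container_id: int
    host: str
    capability: Capability
    token: str              # placeholder for the verifiable part of the grant


def fits(request, free):
    """Return True if the free capacity covers every requested dimension."""
    return (request.capability.memory_mb <= free.memory_mb
            and request.capability.vcores <= free.vcores)


# A request names resources and locations only; it says nothing about tasks.
ask = ResourceRequest(priority=1, resource_name="h1010",
                      capability=Capability(1024), num_containers=1)
print(fits(ask, Capability(memory_mb=2048, vcores=4)))
```

The request carries no task identity, which mirrors the "no notion of tasks" point: the AM decides what to run once a Container comes back.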
![Page 5: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/5.jpg)
Scheduling Walkthrough
MapReduce job with 2 maps and 1 reduce
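The request table for this job is not reproduced in the transcript, so the sketch below shows one way it could be built. It assumes that the first map's input lives on host h1010 in rack r11 and the second map's on host h2121 in rack r22 (the names come from the allocation slides, the placement is a guess), that the reduce has no locality preference, and that every task needs 1024 MB. The priorities and the helper name `build_request_table` are hypothetical.

```python
from collections import defaultdict

ANY = "*"  # assumed spelling of the "any" resource name


def build_request_table(tasks):
    """Invert per-task needs into {(priority, resource_name, capability): count}."""
    table = defaultdict(int)
    for priority, capability, host, rack in tasks:
        # A task with a preferred host is counted at host, rack, and ANY level;
        # a task without locality is counted at the ANY level only.
        names = [host, rack, ANY] if host is not None else [ANY]
        for name in names:
            table[(priority, name, capability)] += 1
    return dict(table)


MAP_PRIORITY, REDUCE_PRIORITY = 1, 2   # assumed priority values
ONE_GB = 1024                          # assumed capability (memory in MB)

tasks = [
    (MAP_PRIORITY, ONE_GB, "h1010", "r11"),    # map 1 (assumed placement)
    (MAP_PRIORITY, ONE_GB, "h2121", "r22"),    # map 2 (assumed placement)
    (REDUCE_PRIORITY, ONE_GB, None, None),     # reduce, no locality
]

table = build_request_table(tasks)
for key in sorted(table):
    print(key, table[key])
```

Under these assumptions the map priority has one entry per host, one per rack, and an ANY entry with a count of 2, while the reduce priority has a single ANY entry.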
![Page 6: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/6.jpg)
Scheduling Walkthrough
Container allocation on r22/h2121:
![Page 7: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/7.jpg)
Scheduling Walkthrough
Container allocation on r11/h1010:
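The two allocation steps can be pictured as bookkeeping on the outstanding request table: each grant on a host lowers the count at the host, rack, and any levels. The sketch below is a simplified model, not YARN's scheduler code. The starting table, the priority values, and the helper name `on_allocated` are assumptions.

```python
ANY = "*"  # assumed spelling of the "any" resource name


def on_allocated(table, priority, host, rack):
    """Lower the outstanding count at host, rack, and ANY level; drop zeros."""
    for name in (host, rack, ANY):
        key = (priority, name)
        if key in table:
            table[key] -= 1
            if table[key] == 0:
                del table[key]


# Assumed outstanding requests: two maps at priority 1, one reduce at priority 2.
table = {
    (1, "h1010"): 1, (1, "r11"): 1,
    (1, "h2121"): 1, (1, "r22"): 1,
    (1, ANY): 2,
    (2, ANY): 1,
}

on_allocated(table, 1, "h2121", "r22")   # container allocation on r22/h2121
after_first = dict(table)

on_allocated(table, 1, "h1010", "r11")   # container allocation on r11/h1010
after_second = dict(table)

print(after_first)
print(after_second)
```

After the first grant only the r11/h1010 map request and the reduce remain outstanding; after the second, only the reduce's any-level entry is left.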
![Page 8: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/8.jpg)
Writing Custom Applications
• Grand total of 3 protocols
  – ClientRMProtocol
    – Application launching program
    – submitApplication
  – AMRMProtocol
    – Protocol between AM & RM for resource allocation
    – registerApplication / allocate / finishApplication
  – ContainerManagerProtocol
    – Protocol between AM & NM for container start/stop
    – startContainer / stopContainer
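To show the order in which a custom application would typically touch the three protocols, here is a toy in-memory model. The classes are stand-ins written for this sketch; only the method names are taken from the slide, and the arguments, return values, and ids are invented. The real protocols are Java RPC interfaces with richer request and response types.

```python
class ResourceManager:
    """Stand-in for the RM side of ClientRMProtocol and AMRMProtocol."""

    def __init__(self):
        self.calls = []      # shared trace of protocol calls, in order
        self._next_id = 0

    def submitApplication(self, name):          # ClientRMProtocol
        self.calls.append("submitApplication")
        return f"app-{name}"                    # hypothetical application id

    def registerApplication(self, app_id):      # AMRMProtocol
        self.calls.append("registerApplication")

    def allocate(self, app_id, asks):           # AMRMProtocol
        """Grant one container per requested host (a real RM may grant fewer)."""
        self.calls.append("allocate")
        granted = []
        for host in asks:
            self._next_id += 1
            granted.append((self._next_id, host))
        return granted

    def finishApplication(self, app_id):        # AMRMProtocol
        self.calls.append("finishApplication")


class NodeManager:
    """Stand-in for the NM side of ContainerManagerProtocol."""

    def __init__(self, calls):
        self.calls = calls

    def startContainer(self, container):
        self.calls.append("startContainer")

    def stopContainer(self, container):
        self.calls.append("stopContainer")


def run_application(rm, nm, hosts):
    app_id = rm.submitApplication("demo")     # client -> RM
    rm.registerApplication(app_id)            # AM -> RM
    containers = rm.allocate(app_id, hosts)   # AM -> RM
    for container in containers:
        nm.startContainer(container)          # AM -> NM
    for container in containers:
        nm.stopContainer(container)           # AM -> NM
    rm.finishApplication(app_id)              # AM -> RM
    return rm.calls


rm = ResourceManager()
nm = NodeManager(rm.calls)
trace = run_application(rm, nm, ["h1010"])
print(trace)
```

In a real deployment the client and the AM are separate processes and `allocate` is called repeatedly rather than once; the single pass here only shows the call order.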
![Page 9: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/9.jpg)
API improvements
• Overload of the ‘*’ entry.
• Release / reject containers
• Ask for specific nodes/racks (only)
• Don’t give me containers on these racks/nodes
• Single client thread allowed to request containers
• Overloaded allocate call
![Page 10: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/10.jpg)
Recent advancements
• Tools for debugging AMs
  – Unmanaged AM
• Generic AM
  – Utility libraries for writing
  – YARN-103, YARN-29
• YARN project split and how multiple versions of MapReduce can coexist.
![Page 11: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/11.jpg)
Roadmap
• MapReduce container reuse
• RM restart capability
• Multi-resource scheduling
• Generic application history server
![Page 12: Apache Hadoop YARN - Hortonworks Meetup Presentation](https://reader031.vdocuments.net/reader031/viewer/2022013118/5562636ed8b42ae87d8b4ea0/html5/thumbnails/12.jpg)
Questions?
Thank You!