VGU BIS2010: MapReduce and Batch Processing
![Page 1: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/1.jpg)
MapReduce and Batch Processing
VGU BIS2010, Group 13
Son Pham
Phong Le
Lam Pham
Chuong Nguyen
Chapter 4
![Page 2: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/2.jpg)
Content
Part 1: Son Pham - Batch Layer
Part 2: Phong Le - MapReduce
Part 3: Lam Pham - MapReduce
Part 4: Chuong Nguyen - Demo
![Page 3: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/3.jpg)
Batch Layer
Lambda Architecture
![Page 4: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/4.jpg)
Batch Layer
• Precomputation
• High latency
• Linearly scalable
![Page 5: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/5.jpg)
Batch Layer
On-the-fly computation:
Precomputation:
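The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under assumed names: the `pageviews` dataset and the functions `count_on_the_fly`, `build_batch_view`, and `count_precomputed` are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical master dataset: one (url, hour) record per pageview.
pageviews = [
    ("/home", 1), ("/about", 1), ("/home", 2), ("/home", 3), ("/about", 3),
]

def count_on_the_fly(data, url):
    """On-the-fly: scan the entire dataset every time a query arrives."""
    return sum(1 for u, _ in data if u == url)

def build_batch_view(data):
    """Precomputation: scan the dataset once and store the per-URL counts."""
    return Counter(u for u, _ in data)

def count_precomputed(view, url):
    """Answer a query from the precomputed view without touching raw data."""
    return view.get(url, 0)

# The batch view is built ahead of time (high latency), then queried cheaply.
batch_view = build_batch_view(pageviews)
```

Both query functions return the same answer; the precomputed version trades a slow, periodic batch run for fast lookups at query time.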
![Page 6: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/6.jpg)
Batch Layer – Linear Scalability
“Scalability is the ability of a system to maintain performance under increased load by adding more resources”
![Page 7: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/7.jpg)
Linear vs. Non-Linear Scalability
Linear Scalability | Non-Linear Scalability
“A linearly scalable system can maintain performance under increased load by adding resources in proportion to the increased load”
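As a rough worked example of that definition, the snippet below computes how many machines a linearly scalable system would need after a load increase. The function name `machines_needed` and the numbers are assumptions chosen for illustration; a non-linearly scalable system would need more machines than this estimate.

```python
import math

def machines_needed(current_machines, current_load, new_load):
    """Machines required for new_load, assuming capacity grows in direct
    proportion to the number of machines (the linear-scalability case)."""
    return math.ceil(current_machines * new_load / current_load)

# Tripling the load on a 10-machine cluster calls for triple the machines.
tripled = machines_needed(10, 100, 300)
```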
![Page 8: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/8.jpg)
MapReduce
A distributed computing paradigm originally pioneered by Google
Inspired by the “Map” and “Reduce” functions commonly used in functional programming (LISP)
Operating on data stored in a distributed filesystem (HDFS…)
A popular free implementation is Apache Hadoop.
![Page 9: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/9.jpg)
MapReduce
![Page 10: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/10.jpg)
MapReduce - “Word count” Example
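The word count example can be sketched as a single-process Python program that mimics the map, shuffle, and reduce steps. This is not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real cluster would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: sum the grouped counts for a single word."""
    return (word, sum(counts))

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(w, c) for w, c in groups.items())
```

Calling `word_count(["the quick fox", "the lazy dog"])` would count "the" twice and every other word once.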
![Page 11: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/11.jpg)
MapReduce
Scalability
Automatically parallelize the computation across the cluster of machines
Fault-Tolerance
Reassign failed tasks
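The idea of reassigning failed tasks can be illustrated with a toy scheduler. This is a simplified simulation, not how Hadoop detects or handles failures: the names `run_with_reassignment`, `flaky_worker`, and `healthy_worker` are hypothetical, and workers are modeled as plain functions that may raise an error.

```python
def run_with_reassignment(tasks, workers, max_attempts=3):
    """Run each task, handing it to the next worker whenever one fails."""
    results = {}
    for task_id, task_input in tasks.items():
        for attempt in range(max_attempts):
            worker = workers[attempt % len(workers)]
            try:
                results[task_id] = worker(task_input)
                break  # task finished; move on to the next one
            except RuntimeError:
                continue  # this worker failed; reassign the task
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results

def flaky_worker(x):
    """Simulated machine that always fails."""
    raise RuntimeError("simulated machine failure")

def healthy_worker(x):
    """Simulated machine that completes its task."""
    return x * x
```

Because a task depends only on its input, rerunning it on another worker gives the same result, which is what makes this kind of reassignment safe.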
![Page 12: Vgu bis2010 Mapreduce and Batch processing](https://reader036.vdocuments.net/reader036/viewer/2022062703/554ca9d3b4c90536578b47a6/html5/thumbnails/12.jpg)
Q&A
THANK YOU