map reduce algorithms › cs590d › lectures › presentation8.pdf · map reduce algorithms david...

Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna Saha

Upload: others

Post on 05-Jul-2020

9 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Map Reduce Algorithms

David Wemhoener

Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna Saha

Page 3: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Page 4: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Example

Page 5: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Example

Page 6: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Word Counting

map(String key, String value): // key: document name, // value: document contents

for each word w in value: EmitIntermediate(w, "1");

reduce(String key, Iterator values):// key: a word // values: a list of counts

int word_count = 0;for each v in values:word_count += ParseInt(v);

Emit(key, AsString(word_count));

MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004

Page 7: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Word Counting

• Have a very large document

Page 8: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

What else can we do?

• Reverse Web-Link Graph

MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004

Page 9: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

What else can we do?

• Reverse Web-Link Graph

– Map Function: source → <target, source> pairs

MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004

Page 10: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

What else can we do?

• Reverse Web-Link Graph

– Map Function: source → <target, source> pairs

– Reduce Function: <target, source> pairs → <target, list(source)>

MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004

Page 11: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

What else can we do?

• Reverse Web-Link Graph

– Map Function: source → <target, source> pairs

– Reduce Function: <target, source> pairs → <target, list(source)>

• Inverted Index

MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004

Page 12: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

What else can we do?

• Reverse Web-Link Graph

– Map Function: source → <target, source> pairs

– Reduce Function: <target, source> pairs → <target, list(source)>

• Inverted Index

– Map Function: document → <word, doc ID> pairs

MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004

Page 13: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

What else can we do?

• Reverse Web-Link Graph

– Map Function: source → <target, source> pairs

– Reduce Function: <target, source> pairs → <target, list(source)>

• Inverted Index

– Map Function: document → <word, doc ID> pairs

– Reduce Function: <word, doc ID> pairs → <word, list(doc ID)>

MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004

Page 14: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Example: Host size

• Suppose we have a large web corpus

• Look at the metadata file

– Lines of the form: (URL, size, date, …)

• For each host, find the total number of bytes

– That is, the sum of the page sizes for all URLs from that particular host

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

Page 15: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Example: Join By Map-Reduce• Compute the natural join R(A,B) ⋈ S(B,C)

• R and S are each stored in files

• Tuples are pairs (a,b) or (b,c)

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,

http://www.mmds.org15

A B

a1 b1

a2 b1

a3 b2

a4 b3

B C

b2 c1

b2 c2

b3 c3

⋈A C

a3 c1

a3 c2

a4 c3

Page 16: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Map-Reduce Join

• A Map process turns:

– Each input tuple R(a,b) into key-value pair (b,(a,R))

– Each input tuple S(b,c) into (b,(c,S))

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,

http://www.mmds.org16

Page 17: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Map-Reduce Join

• A Map process turns:

– Each input tuple R(a,b) into key-value pair (b,(a,R))

– Each input tuple S(b,c) into (b,(c,S))

• Map processes send each key-value pair with key b to Reduce process h(b)

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,

http://www.mmds.org17

Page 18: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Map-Reduce Join

• A Map process turns:

– Each input tuple R(a,b) into key-value pair (b,(a,R))

– Each input tuple S(b,c) into (b,(c,S))

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,

http://www.mmds.org18

Page 19: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Example

Page 20: Map Reduce Algorithms › cs590d › lectures › Presentation8.pdf · Map Reduce Algorithms David Wemhoener Acknowledgement: Some slides are taken from Sergei Vassilivski, Barna

Example

Befriend SAS® In-Database Technologies to Accelerate SAS ...€¦ · reduce data (de-duplicate, reduce the number of variables) transpose data custom scoring algorithms analytical

Map Reduce Algorithms - WordPress.com · Map Reduce Algorithms ... Sergei Vassilivski’s tutorial on MapReduce Example Example A Sense of Scale At web scales ... Example: A …

Map Reduce - cis.csuohio.educis.csuohio.edu/~sschung/cis612/MapReduceJoinAlgorithms.pdf · Map Reduce Basic Map Reduce Join Algorithms Improvements in performance by pre processing

Preliminary Results on Using Matching Algorithms in Map-reduce Application

terms Creative Professionalscompression algorithms reduce file size while preserving a perfect copy of the original uncompressed image. Lossy compression algorithms on the other hand

Map Reduce Algorithms book

Event Mining: Algorithms and Applications...Optimization 1. Reduce False positive (false alerts) 2. Reduce False negative (missed alerts) (5) Ticket recommendation Introduction and

Investor Presentation - axmomentum.com algorithms designed to reduce ... • Kelly Criterion seeks to maximize the expected utility of an investment given ... Investor Presentation

Advanced Topics Niching Genetic Algorithms Multi-Objective ... · Niching Genetic Algorithms: The Idea • Niching methods have been developed to reduce the effect of genetic drift

DFA minimization algorithms in map reduce

Algorithms and Methods for High-Performance Model ...Summary (English) The goal of this thesis is to investigate algorithms and methods to reduce the solution time of solvers for Model

Algorithms and Applications in Social Networks · •Apache Giraph •GraphLab •Pegasus •MapReduce 13. The Map/Reduce Approach 14. Map Reduce •A programming model for large-scale,

VTK-m Project Goals A single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms. Reduce the challenges

FFT Algorithms and ArchitecturesFFT Architectures Examples DIF vs. DIT Decomposition. Fast Fourier Transform (FFT) Several fast Fourier transform algorithms have been proposed to reduce

Journal Title Resilient Gossip-Inspired All-Reduce ...eprints.cs.univie.ac.at/5954/1/gossipHPC_finaldraft.pdfbased all-to-all reduction algorithms for such systems in Section2. In

Configuration Mapping Algorithms to Reduce Energy and Time ...eprints.ucm.es/31332/1/Configuration....pdf · 1 Configuration Mapping Algorithms to Reduce Energy and Time Reconfiguration

The Structure of Dense Polymer Systems: Geometry ... · search algorithms. The algorithms work at the target density from the start. They reduce intersections and consider statistical

COMBINATION OF OPTIMISATION ALGORITHMS FOR · PDF fileCOMBINATION OF OPTIMISATION ALGORITHMS FOR A MULTI-OBJECTIVE BUILDING DESIGN PROBLEM Mohamed Hamdy, ... RF) attempting to reduce

GreenTouch Green Meter Research Study: Reducing the Net ... · possible through the combination of technologies, architectures, components, algorithms and protocols to reduce the

Algorithms for Natural Language Processing Lexical ...people.cs.georgetown.edu/nschneid/cosc272/f17/04_lexsem_slides.pdf · Did Poland reduce its carbon emissions since 1989?

DISTRIBUTED COMPUTING & MAP REDUCE CS16: Introduction to Data Structures & Algorithms Thursday, April 17, 2014 1

Heatpump conceptsfor NearlyZero EnergyBuildings · 2013. 4. 10. · C. Wemhoener, IEA HPP Annex 40, SHC Task definition workshop, Paris 2013. Titelmasterformat durch Klicken bearbeiten

Algorithms: Greedy Algorithms

An adaptive distributed simulator for cloud andmap reduce algorithms and architectures

MapReduce in Spark€¦ · 4 Algorithms in Map-Reduce 5 Summary 3/54. Outline 1 Motivation 2 MapReduce 3 Spark 4 Algorithms in Map-Reduce 5 Summary 4/54. Motivation Traditional DBMS

Join Algorithms using Map/Reduce - School of Informatics · Join Algorithms using Map/Reduce Jairam Chandar T H E U NIVE R S I T Y O F E ... 4.2 Experiment 1 : ... Database Management

ADAPTIVE ALGORITHMS FOR MIMO ACOUSTIC ECHO CANCELLATION · Adaptive Algorithms for MIMO Acoustic Echo Cancellation 3 that the only solution to the nonuniqueness problem is to reduce

DFA Minimization Algorithms in Map-Reduce · 2016-01-19 · DFA Minimization Algorithms in Map-Reduce Iraj Hedayati Somarin Map-Reduce has been a highly popular parallel-distributed

Evaluating map reduce tasks scheduling algorithms over ...qmyaseen/mapreduce1.pdf · EVALUATING MAP REDUCE TASKS SCHEDULING ALGORITHMS environment. The evaluation process is conducted

Page Replacement Algorithms - Computer Science€¦ · Page Replacement Algorithms Concept ... Ø A. Make life easier for OS implementer Ø B. Reduce the number of page faults Ø

Improved Lower and Upper Bound Algorithms for Pricing ...€¦ · algorithms which improve exercise policies and reduce the variance. 1.2 Brief results The improvements developed

How Social Media Algorithms Work - Sysomossocial media algorithms, native content creation is a hallmark of authentic community activity. The quest to reduce levels of social media

Parallel Sequences: Fold, Reduce and Scan · Fold and Reduce We have seen many sequential algorithms can be programmed succintly using fold or reduce. Eg: sum all elements: 7 4 3

MULTIDITHER ADAPTIVE ALGORITHMSadaptive techniques (COAT) is to analyze and experimentally demonstrate adaptive multidither correction algorithms that can reduce beam dis- tortions

Machine learning algorithms for systematic review ... · (ML) algorithms to aid systematic review is becoming an increasingly popular approach to reduce human bur-den and monetary