Page 1: Hadoop (Pig). Processing of large data (by Eugene Smertenko) - Big Data Tech Hangout - 2013.10.26

Hadoop - Pig
Processing of large data

Yevgen Smertenko
Engineering Team Lead. BI Developer.

Page 2

How it works

[Diagram labels: data, Pig, BI engineer, clear result]

Page 3

Hadoop - Software Framework

Provides massively parallel processing (MPP) of data

MapReduce program
• Input read
• Map
• Partition / Combine
• Copy / Compare / Merge
• Reduce
• Output write
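The phases listed above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the data flow, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for the example, and the partition/combine step is left out for brevity.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Copy / merge: collect all values emitted for the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Input read -> Map -> Copy/Merge -> Reduce -> Output (as a dict)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(mapped))


# Hypothetical input records, standing in for lines read from a file split.
result = word_count(["pig runs on hadoop", "hadoop runs mapreduce"])
print(result)
```

In a real cluster each phase runs on many nodes in parallel and the copy/merge step moves data over the network; the sketch keeps only the logical order of the steps.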

Page 4

MapReduce Data Flow

Page 5

MapReduce Data Flow

Page 6

MapReduce functionality

Page 7

The Hadoop Ecosystem

Page 8

PIG

• Data types
• Relational operators
• UDF – user-defined functions

Pig Latin - a language for describing data flows

Page 9

Pig. Data Types

Simple Types
• int
• long
• float
• double
• chararray
• bytearray
• boolean
• datetime

Complex Types
• tuple (.., ..)
• map [key#value]
• bag {(), .., ()}
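As a rough analogy, the three complex types can be pictured with ordinary Python containers. This is a sketch under assumptions: the field values are made up, and a Python list is ordered while a Pig bag is an unordered collection of tuples, so the mapping is only approximate.

```python
# tuple: an ordered set of fields, written (alice, 30) in Pig
user = ("alice", 30)

# map: chararray keys pointing to values, written [city#Kyiv] in Pig
props = {"city": "Kyiv"}

# bag: a collection of tuples, written {(alice, /home), (alice, /cart)} in Pig
visits = [("alice", "/home"), ("alice", "/cart")]

# Complex types nest: a tuple may hold a tuple, a map and a bag as fields.
record = (user, props, visits)

name = record[0][0]            # positional field access, like $0 in Pig
city = record[1]["city"]       # map lookup, like props#'city' in Pig
visit_count = len(record[2])   # size of the bag, like COUNT(visits) in Pig

print(name, city, visit_count)
```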

Page 10

Pig. Relational operators

• SPLIT
• UNION
• FILTER
• DISTINCT
• SAMPLE
• FOREACH
• STREAM

• JOIN
• GROUP / COGROUP
• CROSS
• ORDER

• LOAD
• STORE
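To give a feel for how a few of these operators chain into a data flow, here is a plain-Python imitation of a LOAD, FILTER, GROUP, FOREACH, ORDER pipeline. It is not Pig Latin and does not run on Hadoop; the relation names, the field layout (name, city, age) and the sample rows are assumptions for illustration, and the Pig statements in the comments are approximate.

```python
from collections import defaultdict

# LOAD: hypothetical rows with fields (name, city, age)
rows = [
    ("alice", "kyiv", 30),
    ("bob", "lviv", 40),
    ("carol", "kyiv", 25),
    ("dave", "lviv", 15),
]

# FILTER rows BY age >= 18
adults = [row for row in rows if row[2] >= 18]

# GROUP adults BY city: each group key gets a bag of matching tuples
by_city = defaultdict(list)
for row in adults:
    by_city[row[1]].append(row)

# FOREACH by_city GENERATE group, COUNT(adults)
counts = [(city, len(bag)) for city, bag in by_city.items()]

# ORDER counts BY count DESC
ordered = sorted(counts, key=lambda pair: pair[1], reverse=True)

# STORE: printing stands in for writing the relation out
print(ordered)
```

Each step produces a new relation from the previous one, which is the same shape a Pig Latin script has: one named relation per statement.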

Page 11

PIG. UDF

Eval Functions (EvalFunc)
• Filter Functions
• Aggregate Functions
• Algebraic Interface
• Accumulator Interface

Load/Store Functions (LoadFunc / StoreFunc)

piggybank
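The idea behind the Algebraic interface is that an aggregate can be split into an initial, an intermediate and a final step, so partial results can be computed early (for example in a combiner) before the final reduce. The sketch below imitates that for a COUNT-like function in plain Python. Real Pig UDFs of this kind are written against the Java API; the function names and the chunking here are assumptions chosen only to show the three stages.

```python
def initial(record):
    """Initial step: turn one input tuple into a partial count of 1."""
    return 1


def intermed(partials):
    """Intermediate step: merge partial counts into one partial count."""
    return sum(partials)


def final(partials):
    """Final step: merge the remaining partial counts into the result."""
    return sum(partials)


def algebraic_count(bag, chunk_size=2):
    """Count tuples in a bag by combining per-chunk partial results.

    chunk_size is a made-up knob standing in for how the input happens
    to be split across map tasks.
    """
    chunks = [bag[i:i + chunk_size] for i in range(0, len(bag), chunk_size)]
    partials = [intermed([initial(record) for record in chunk]) for chunk in chunks]
    return final(partials)


bag = [("a",), ("b",), ("c",), ("d",), ("e",)]
total = algebraic_count(bag)
print(total)
```

Because the partial counts can be merged in any grouping, the result does not depend on how the bag was split, which is what makes the early combining safe.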

Page 12

How it works

[Diagram labels: data, Pig, BI engineer, clear result]

Page 13

THANKS FOR YOUR ATTENTION!
