Hadoop (Pig). Processing of large data (by Eugene Smertenko) - Big Data Tech Hangout - 2013.10.26

Hadoop-Pig: Processing of large data. Yevgen Smertenko, Engineering Team Lead, BI Developer.

Upload: innovecs

Post on 15-Jan-2015


DESCRIPTION

On Saturday, October 26, the second external meeting of the Tech Hangout Community took place at Creative Space 12, a cultural and educational center in Kiev. The event was held under the motto «Discover the value of Big Data!» Tech Hangout is an event organized by developers for developers to share knowledge and experience. The format is a 30-minute talk on a previously announced topic, followed by a roundtable discussion of the same length. The initiative proved so popular that, within a short time, Tech Hangout got its own logo, a blog, and a Facebook group where participants can discuss what they heard. Join the discussion: https://www.facebook.com/groups/techhangout/ Read us: http://hangout.innovecs.com/

TRANSCRIPT

Page 1: Hadoop (Pig). Processing of large data (by Eugene Smertenko) - Big Data Tech Hangout - 2013.10.26

Hadoop-Pig
Processing of large data

Yevgen Smertenko
Engineering Team Lead. BI Developer.

Page 2

How it works

[Diagram labels: data, Pig, BI engineer, clear result]

Page 3

Hadoop - Software Framework

Provides Massive Parallel Processing (MPP) of data

MapReduce program
• Input read
• Map
• Partition / Combine
• Copy / Compare / Merge
• Reduce
• Output write
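
The phases listed above can be sketched on a single machine. This is a minimal illustration, not Hadoop itself: the word-count task, the function names (`map_phase`, `partition`, `reduce_phase`, `run_job`) and the toy hash in `partition` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import groupby


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    """Partition: pick the reducer for a key (deterministic toy hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reduce: collapse all values of one key into a single result."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Input read + Map + Partition: route every emitted pair to a reducer.
    buckets = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            buckets[partition(key, num_reducers)].append((key, value))

    # Copy / Compare / Merge: sort each reducer's pairs so that equal keys
    # end up next to each other, then Reduce each group of values.
    output = {}
    for reducer_id in sorted(buckets):
        pairs = sorted(buckets[reducer_id], key=lambda kv: kv[0])
        for key, group in groupby(pairs, key=lambda kv: kv[0]):
            word, total = reduce_phase(key, [v for _, v in group])
            output[word] = total  # Output write (in memory here)
    return output


result = run_job(["pig runs on hadoop", "hadoop runs mapreduce"])
print(result["hadoop"])  # 2
```

In a real cluster the map and reduce steps run on different nodes and the copy step moves data over the network; here everything happens in one process to keep the order of the phases visible.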

Page 4

MapReduce Data Flow

Page 5

MapReduce Data Flow

Page 6

MapReduce functionality

Page 7

The Hadoop Ecosystem

Page 8

PIG

• Data types
• Relational operators
• UDF – user-defined functions

Pig Latin - a language for describing data flows

Page 9

Pig. Data Types

Simple Types
• int
• long
• float
• double
• chararray
• bytearray
• boolean
• datetime

Complex Types
• tuple (.., ..)
• map [key#value]
• bag {(), .., ()}
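
To make the bracket notation of the complex types concrete, here is a small sketch that renders Python values in that notation. The mapping (Python tuple for a Pig tuple, dict for a map, list of tuples for a bag) and the helper name `to_pig_text` are assumptions chosen for illustration; the output only approximates how Pig prints values.

```python
def to_pig_text(value):
    """Render a Python value using the tuple/map/bag notation from the slide."""
    if isinstance(value, tuple):   # tuple -> (.., ..)
        return "(" + ",".join(to_pig_text(v) for v in value) + ")"
    if isinstance(value, dict):    # map -> [key#value]
        items = (f"{k}#{to_pig_text(v)}" for k, v in value.items())
        return "[" + ",".join(items) + "]"
    if isinstance(value, list):    # bag -> {(), .., ()}
        return "{" + ",".join(to_pig_text(v) for v in value) + "}"
    return str(value)              # simple types: int, chararray, ...


# One record holding a chararray, a map and a bag of tuples.
record = ("alice", {"city": "Kiev"}, [("hadoop", 3), ("pig", 1)])
text = to_pig_text(record)
print(text)  # (alice,[city#Kiev],{(hadoop,3),(pig,1)})
```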

Page 10

Pig. Relational operators

• SPLIT
• UNION
• FILTER
• DISTINCT
• SAMPLE
• FOREACH
• STREAM

• JOIN
• GROUP / COGROUP
• CROSS
• ORDER

• LOAD
• STORE
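
A few of the operators above chain naturally into one data flow. The sketch below mimics LOAD, FILTER, GROUP, FOREACH and ORDER on an in-memory list; it is plain Python rather than Pig Latin, and the sample rows and field layout (name, city, age) are invented for the example.

```python
# LOAD stand-in: rows of (name, city, age). The data is made up.
rows = [
    ("alice", "kiev", 30),
    ("bob", "lviv", 25),
    ("carol", "kiev", 41),
    ("dave", "lviv", 17),
]

# FILTER: keep only rows that satisfy a condition.
adults = [row for row in rows if row[2] >= 18]

# GROUP: collect the rows of each city into a "bag".
by_city = {}
for row in adults:
    by_city.setdefault(row[1], []).append(row)

# FOREACH ... GENERATE: produce one output tuple per group.
counts = [(city, len(bag)) for city, bag in by_city.items()]

# ORDER: sort by count descending, then by city name.
ordered = sorted(counts, key=lambda t: (-t[1], t[0]))
print(ordered)  # [('kiev', 2), ('lviv', 1)]
```

In Pig each of these steps would be one statement that names a new relation, and the engine would translate the whole chain into MapReduce jobs.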

Page 11

PIG. UDF

Eval Functions (EvalFunc)
• Filter Functions
• Aggregate Functions
• Algebraic Interface
• Accumulator Interface

Load/Store Functions (LoadFunc, StoreFunc)

Piggybank
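
The difference between the Algebraic and Accumulator interfaces is easier to see on a COUNT-style aggregate. The sketch below is a Python model of the two ideas, not the Pig Java API: the class names and method names (`initial`, `intermed`, `final`, `accumulate`, `get_value`, `cleanup`) are assumptions that loosely mirror the roles of the real interfaces.

```python
class AlgebraicCount:
    """Algebraic idea: the aggregate splits into three steps, so partial
    results can be computed early (for example in a combiner)."""

    @staticmethod
    def initial(bag):
        return len(bag)        # partial count for one chunk of input

    @staticmethod
    def intermed(partials):
        return sum(partials)   # merge partial counts into a bigger partial

    @staticmethod
    def final(partials):
        return sum(partials)   # produce the final answer


class AccumulatorCount:
    """Accumulator idea: the bag arrives in batches, so the whole bag
    never has to be held in memory at once."""

    def __init__(self):
        self.total = 0

    def accumulate(self, batch):
        self.total += len(batch)

    def get_value(self):
        return self.total

    def cleanup(self):
        self.total = 0


# Three chunks of one-field tuples, six tuples in total.
chunks = [[("a",), ("b",)], [("c",)], [("d",), ("e",), ("f",)]]

partials = [AlgebraicCount.initial(c) for c in chunks]
merged = [AlgebraicCount.intermed(partials[:2]),
          AlgebraicCount.intermed(partials[2:])]
algebraic_total = AlgebraicCount.final(merged)

acc = AccumulatorCount()
for chunk in chunks:
    acc.accumulate(chunk)
accumulator_total = acc.get_value()

print(algebraic_total, accumulator_total)  # 6 6
```

Both variants give the same count; they differ in when and where the work can happen, which is what lets the engine choose a cheaper execution plan.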

Page 12

How it works

[Diagram labels: data, Pig, BI engineer, clear result]

Page 13

THANKS FOR YOUR ATTENTION!