apache pig

Welcome to BigData Cloud Architects Meetup Group Weekly, Sunday 6:00p - 8:00p Exciting Topic Every Week, Please Register @ http://www.meetup.com/BigData-Cloud-Meetup/ 6:00 - 6:15 : Socializing 6:15 - 6:30 : Meetup Introduction 6:30 - 7:55 : Topic of the week 7:55 - 8:00 : Wrap UP Started in Jun 2013 33 weeks - 3 weeks = 30 weeks

Upload: vdesibabu

Post on 30-Jun-2015


Category:

Data & Analytics



DESCRIPTION

Apache PIG : User Group Presentation

TRANSCRIPT

Page 1: Apache pig

Welcome to BigData Cloud Architects Meetup Group

Weekly, Sunday 6:00p - 8:00p

Exciting Topic Every Week, Please Register @ http://www.meetup.com/BigData-Cloud-Meetup/

6:00 - 6:15 : Socializing
6:15 - 6:30 : Meetup Introduction
6:30 - 7:55 : Topic of the week
7:55 - 8:00 : Wrap Up

Started in Jun 2013
33 weeks - 3 weeks = 30 weeks

Page 2: Apache pig

Suresh Mandava

CyberSecurity Cloud Principal, CSC

24+ years of IT experience in infrastructure, platform and data management in mission-critical enterprise environments

Founder/CTO of network security and cloud IaaS product companies

Page 3: Apache pig

Apache Accumulo

Page 4: Apache pig

Market Need: BigData Architect

Must-have technical skills and experience:

• 5-7 years of strong system design/development experience in building massively large-scale distributed systems and products.

• 1 to 2 years of good experience working with Hadoop and Big Data technologies, including core Hadoop (HDFS and MapReduce), Flume and Sqoop, HBase and Zookeeper, Oozie, Hive and Pig.

• Troubleshooting: The candidate must be able to engage in solving complex problems. Programming problems are a good example.

• Databases: Advanced SQL knowledge is a must. Advanced data warehousing and MPP knowledge is good to have but not a must-have.

• Linux/Unix and system administration: Advanced Linux knowledge is a must, including an understanding of the shell, debugging, etc. The candidate should be able to find their way around Linux and get things to work.

• Networking and Hardware: Advanced networking and hardware knowledge required.

• Programming: Strong in one programming language. Java would be preferable. If not Java, the candidate must have a good handle on one language and display the ability to pick up a new one if required. Experience programming infrastructure management or automation tools is required.

• Big picture / High-level architecture: The candidate must be able to think at a high level about the overall systems and goals of the projects.

Jan 9, 2014

Page 5: Apache pig

Hadoop 2.0 GA Release Oct 16, 2013

Page 6: Apache pig

Apache PIG

Suresh Mandava

Jan 19, 2014 : Session 30

Page 7: Apache pig

Handle large data volume
Run queries spanning days/months
GB/TB/PBs

Structured, Semi and Unstructured data
Computationally intensive

Deep analytics
Machine learning algorithms

Hadoop Cluster

Page 8: Apache pig

MapReduce (Computation)

+

HDFS (Storage)

Page 9: Apache pig
Page 10: Apache pig
Page 11: Apache pig

But writing MapReduce jobs in Java is painful. Let's see why …

Page 12: Apache pig
Page 13: Apache pig

Java

Page 14: Apache pig

and

Page 15: Apache pig

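PIG

---------- Start ----------
-- Load January 2012 application logs (default tab-delimited PigStorage)
A = LOAD '/app_logs/2012/01/*/' USING PigStorage();
-- Keep only records whose first field is 'U'
uLogs = FILTER A BY $0 == 'U';
-- Project the second and third fields as orgId and userId
uLogFields = FOREACH uLogs GENERATE $1 AS orgId, $2 AS userId;
-- Group by (orgId, userId) and count the records in each group
orgUserGroup = GROUP uLogFields BY (orgId, userId);
uCount = FOREACH orgUserGroup GENERATE group, COUNT(uLogFields);
STORE uCount INTO 'output';
---------- END ----------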
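To make the data flow of the script on this slide concrete, here is a minimal plain-Python sketch of the same steps (filter, project, group, count). It only illustrates what the script computes, not how Pig executes it on a cluster. The function name `count_user_logs` and the sample rows are made up for illustration, and the sketch assumes each log record has already been split into fields.

```python
from collections import Counter


def count_user_logs(rows):
    """Mimic the Pig script: keep 'U' records, group by (orgId, userId), count."""
    # FILTER A BY $0 == 'U'
    u_logs = [r for r in rows if r[0] == "U"]
    # FOREACH uLogs GENERATE $1 AS orgId, $2 AS userId
    u_log_fields = [(r[1], r[2]) for r in u_logs]
    # GROUP uLogFields BY (orgId, userId), then COUNT per group
    return dict(Counter(u_log_fields))


# Hypothetical log records: (record type, orgId, userId)
sample_rows = [
    ("U", "org1", "alice"),
    ("U", "org1", "alice"),
    ("X", "org1", "alice"),
    ("U", "org2", "bob"),
]

if __name__ == "__main__":
    for (org_id, user_id), n in count_user_logs(sample_rows).items():
        print(org_id, user_id, n)
```

Each Pig statement maps to one step here, which is the step-by-step style that the later "PIG VS SQL" slides contrast with SQL.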

Page 16: Apache pig

Performance of Pig vs MapReduce

Page 17: Apache pig

PIG Philosophy?

Page 18: Apache pig

Pigs Eat Everything

• Pig can operate on data whether it has metadata or not. It can operate on data that is relational, nested, or unstructured. And it can easily be extended to operate on data beyond files, including key/value stores, databases, etc.

Page 19: Apache pig

Pigs Live Everywhere

• Pig is intended to be a language for parallel data processing. It is not tied to one particular parallel framework. It has been implemented first on Hadoop, but we do not intend it to run only on Hadoop.

Page 20: Apache pig

Pigs are Domestic Animals

• Pig is designed to be easily controlled and modified by its users.

• Pig allows integration of user code wherever possible, so it currently supports user-defined field transformation functions, user-defined aggregates, and user-defined conditionals. These functions can be written in Java or in scripting languages that can compile down to Java (e.g. Jython).

• Pig has an optimizer that rearranges some operations in Pig Latin scripts to give better performance, combines MapReduce jobs together, etc. However, users can easily turn this optimizer off to prevent it from making changes that do not make sense in their situation.
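As a rough illustration of the user-defined field transformation functions mentioned in this list, here is a minimal Python sketch of one. The function `email_domain` and its schema string are hypothetical. The `try`/`except` fallback is an assumption added so the file also runs as ordinary Python outside Pig; inside Pig the `outputSchema` decorator is supplied by the runtime.

```python
try:
    # Pig ships a pig_util module for Python UDFs (assumed available inside Pig).
    from pig_util import outputSchema
except ImportError:
    # Fallback for running this sketch outside Pig: a stand-in decorator
    # that only records the schema string on the function.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("domain:chararray")
def email_domain(email):
    """Hypothetical field transformation: lower-cased domain of an email address."""
    if email is None or "@" not in email:
        return None  # Pig treats a returned None as null
    return email.rsplit("@", 1)[1].lower()


if __name__ == "__main__":
    print(email_domain("Bob@Example.COM"))
```

In a Pig Latin script, a file like this would typically be registered with something along the lines of `REGISTER 'udfs.py' USING jython AS myfuncs;` and then called as `myfuncs.email_domain(field)`; the file name and alias here are placeholders.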

Page 21: Apache pig

Pigs Fly

• Pig processes data quickly. We want to consistently improve performance, and not implement features in ways that weigh Pig down so it can't fly.

Page 22: Apache pig

About PIG

Large code contributions from: Yahoo, Twitter, Hortonworks

Page 23: Apache pig

PIG Execution Stages

Page 24: Apache pig

PIG VS SQL

Page 25: Apache pig

PIG PROCEDURAL

VS

SQL DECLARATIVE

Page 26: Apache pig

FEATURES OF PIG

Page 27: Apache pig

PIG : Configure

Page 28: Apache pig

PIG : Data Types

Page 29: Apache pig

PIG : Run

Page 30: Apache pig

PIG : Latin Constructs

Page 31: Apache pig

Sample Data

Page 32: Apache pig

Employee > 100K salary
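As a rough illustration of the filter this slide title describes, the sketch below selects employees earning more than 100K. The employee records and the function name `high_earners` are hypothetical, not taken from the presentation, and the threshold is assumed to mean strictly greater than 100,000.

```python
def high_earners(employees, threshold=100_000):
    """Return the names of employees whose salary is strictly above threshold."""
    # Comparable in spirit to a Pig Latin FILTER ... BY salary > 100000;
    return [name for name, salary in employees if salary > threshold]


# Hypothetical (name, salary) records
employees = [("Ann", 120_000), ("Raj", 95_000), ("Li", 100_000), ("Sam", 150_000)]

if __name__ == "__main__":
    print(high_earners(employees))
```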

Page 33: Apache pig

More : Pig Latin

Page 34: Apache pig

More : Pig Latin

Page 35: Apache pig

More : Pig Latin

Page 36: Apache pig

Is that ALL?

PIG is Extremely Flexible

Page 37: Apache pig

PigLatin : UDF

Page 38: Apache pig

UDF : Types

Page 39: Apache pig

UDF Library

Page 40: Apache pig

UDF : How to write (EvalFunc)

Page 41: Apache pig

UDF : How to use custom UDF in PIG

Page 42: Apache pig

UDF : How to execute pig script with custom UDF

Page 44: Apache pig

Our Goal

Master the Eco-System one bite at a time, every week

Page 45: Apache pig

Appreciate your Feedback @ Group Reviews

Thank You.

Page 46: Apache pig

www.bigdatapro.io

Page 47: Apache pig

Thank You and Have a Nice Week!!

Meet you all Next Week