Introduction to Pig | Pig Architecture | Pig Fundamentals
Slide 1 © 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Big Data Analytics using Pig
TRANSCRIPT
- 1. Big Data Analytics using Pig
- 2. Scope of PPT: BIG Data Analytics via PIG. Introduction to Big Data and Hadoop; Introduction to Pig; Hadoop Pig Architecture; BIG Data Analytics via Pig; BIG Data & Hadoop Job Trends; BIG Data & Hadoop Course Syllabus; Get Started with BIG Data & Hadoop
- 3. Big Data and its Challenges
- 4. Big Data and its Challenges: Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information. It is very difficult to manage data at this scale.
- 5. Who Generates Big Data? Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data? Today, managing such big data is becoming a problem for all of us.
- 6. Hadoop can be used to process such huge data easily. We will answer how. Before that, let's understand what Hadoop is.
- 7. Hadoop and its Characteristics: Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is an open-source data management technology with scale-out storage and distributed processing. Hadoop characteristics: flexible, reliable, economical, scalable.
- 8. Hadoop Ecosystem (diagram labels): Flume (unstructured or semi-structured data); Sqoop (import or export of structured data); Apache Oozie (workflow); HDFS (Hadoop Distributed File System); Pig Latin (data analysis); Hive (DW system); MapReduce framework; HBase; other YARN frameworks (MPI, Giraph); YARN (cluster resource management).
- 9. Need for Pig: Java is not a preferred language for many data analysts. About 200 lines of Java code (LOC) correspond to roughly 10 lines of Pig. Many built-in operations are available for common data operations such as join, grouping and filtering.
- 10. Where to use Pig? Pig is a data flow language, so it is most suitable for: quickly changing data processing requirements; processing data from multiple channels; quick hypothesis testing; time-sensitive data refreshes; data profiling using sampling.
- 11. What is Pig? It is an open-source data flow language. Pig Latin is used to express queries and data manipulation operations in simple scripts. Pig converts the scripts into a sequence of underlying MapReduce jobs.
- 12. Let's internalize Pig: let's find the people who, overall, visit highly ranked pages.
  Visits:
    User    URL                  Time
    John    www.cbn.com          7:00
    John    www.trap.com         7:05
    John    www.myblog.com       9:00
    John    www.flickr.com       9:05
    Linda   cnn.com/index.htm    11:00
  Pages:
    Page URL          Page Rank
    www.cbn.com       0.9
    www.flickr.com    0.9
    www.myblog.com    0.6
    www.trap.com      0.3
- 13. Internalizing Pig (data flow diagram): Load Visits (user, url, time) and Load Pages (url, pagerank) -> Join on url = url -> Group by user -> Compute average pagerank.
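The data flow on this slide can be traced by hand with a small sketch. The following is plain Python rather than Pig Latin, and it is only an illustration of the load, join, group and average steps using the sample rows from the Visits and Pages tables. The function name `average_pagerank_by_user` and the choice of an inner join (visits to unranked pages are dropped) are assumptions made for this example, not something the slides specify.

```python
from collections import defaultdict

# "Load Visits (user, url, time)": sample rows from the Visits table.
visits = [
    ("John", "www.cbn.com", "7:00"),
    ("John", "www.trap.com", "7:05"),
    ("John", "www.myblog.com", "9:00"),
    ("John", "www.flickr.com", "9:05"),
    ("Linda", "cnn.com/index.htm", "11:00"),
]

# "Load Pages (url, pagerank)": sample rows from the Pages table.
pages = {
    "www.cbn.com": 0.9,
    "www.flickr.com": 0.9,
    "www.myblog.com": 0.6,
    "www.trap.com": 0.3,
}


def average_pagerank_by_user(visits, pages):
    """Hypothetical helper: join on url, group by user, average the pagerank."""
    # Join url = url (assumed inner join: visits with no ranked page are dropped).
    joined = [(user, pages[url]) for user, url, _time in visits if url in pages]

    # Group by user.
    groups = defaultdict(list)
    for user, rank in joined:
        groups[user].append(rank)

    # Compute the average pagerank for each user.
    return {user: sum(ranks) / len(ranks) for user, ranks in groups.items()}


result = average_pagerank_by_user(visits, pages)
print(result)
```

With the sample data, Linda's only visit has no matching row in Pages, so under the inner-join assumption she does not appear in the result.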
- 14. Pig in Industry: Since Pig is a data flow language, it is naturally suited to data factory operations: typically data is brought from multiple servers to HDFS, and Pig is used for cleaning and preprocessing it. It helps data analysts and researchers quickly prototype their theories. Since Pig is extensible, it becomes much easier for data analysts to run programs written in their scripting languages (such as Ruby or Python) effectively against large data sets.
- 15. Ways to Handle Pig: Grunt mode is Pig's interactive mode, very useful for testing, syntax checking and ad-hoc data exploration. Script mode runs a set of instructions from a file, similar to a SQL script file. Embedded mode executes Pig programs from a Java program and is suitable for creating Pig scripts on the fly.
- 16. Modes of Pig: All of the different Pig invocations can run in the following modes. Local: the entire Pig job runs as a single JVM process, and reads and stores data on a local Linux path. MapReduce: the Pig job runs as a series of MapReduce jobs, and input and output paths are assumed to be HDFS paths.
- 17. Pig Components: (1) Pig data flows: Pig Latin is used to express data flows. (2) Execution environments: distributed execution on a Hadoop cluster, or local execution in a single JVM.
- 18. Pig is just a wrapper on top of the MapReduce layer. It parses, optimizes and converts the Pig script into a series of MapReduce jobs. Diagram: Pig programs -> Pig (turns the transformations into a series of MapReduce jobs) -> Execution.
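To give a feel for what "a series of MapReduce jobs" means, here is a minimal Python sketch of how a single group-and-average step might look as one map phase, one shuffle and one reduce phase. This is not Pig's actual compiler output or API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the input records are hypothetical, chosen only to show the shape of the computation.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Emit (key, value) pairs: here the key is the user, the value a pagerank."""
    for user, rank in records:
        yield user, rank


def shuffle(pairs):
    """Collect all values belonging to the same key, as the framework would."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _key, value in group]


def reduce_phase(grouped):
    """Reduce each key's values to a single result: the average."""
    for key, values in grouped:
        yield key, sum(values) / len(values)


def run_job(records):
    """Chain map -> shuffle -> reduce for one simulated job."""
    return dict(reduce_phase(shuffle(map_phase(records))))


# Hypothetical (user, pagerank) records, made up for this illustration.
records = [("John", 0.9), ("Linda", 0.6), ("John", 0.3)]
averages = run_job(records)
print(averages)
```

A script with several operators (for example a join followed by a grouping) would, in this picture, become several such jobs chained one after another, with the output of one job feeding the next.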
- 19. Job Trends: Hadoop
- 20. Why SkillSpeed? Course curriculum from industry experts; instructor-led live virtual sessions; lifetime access to course content via LMS; 100% placement assistance; 24x7 support.
- 21. Course Topics: Module 1: Introduction to Big Data and Hadoop; Module 2: HDFS Internals, Hadoop Configurations and Data Loading; Module 3: Introduction to Map Reduce; Module 4: Advanced Map Reduce Concepts; Module 5: Introduction to Pig; Module 6: Advanced Pig and Introduction to Hive; Module 7: Advanced Hive Concepts; Module 8: Extending Hive and HBase Introduction; Module 9: Advanced HBase and Oozie Introduction; Module 10: Project Set-up Discussion.
- 22. Corporate Partners
- 23. Contact Us: Lines open 24/7. To know more about the course, please contact: IND +91-90660-20904; USA 1866-607-6547 (toll free); or reach us at [email protected]