Become an Expert in Hadoop · HDFS, MapReduce, Spark, HBase, Hive, Pig, Oozie, Sqoop & Flume



BECOME AN EXPERT IN HADOOP

About the Course

Hadoop is an open-source Apache project for storing and processing Big Data. Hadoop stores Big Data in a distributed, fault-tolerant manner on commodity hardware, and Hadoop tools then perform parallel processing of the data stored in HDFS (the Hadoop Distributed File System). As organizations have realized the benefits of Big Data analytics, there is huge demand for Big Data & Hadoop professionals. Companies are looking for Big Data & Hadoop experts with knowledge of the Hadoop ecosystem and of best practices for HDFS, MapReduce, Spark, HBase, Hive, Pig, Oozie, Sqoop & Flume. This course will introduce students to this rapidly growing field and equip them with its basic principles and tools, as well as its general mindset.

- Students will learn concepts, techniques and tools they need to deal with various facets of Hadoop practice.

- These topics will be treated with a balanced approach to breadth and depth, and emphasis will be placed on integration and synthesis of concepts and their application to real-time problems.

- To make the learning contextual, real datasets from a variety of disciplines will be used.

Program Highlights

- Most comprehensive curriculum
- Trained by passionate industry experts
- Each concept explained by the golden rule: Theory, Example, Software Implementation, Real-time applicability
- All classes explained with real-time project experience
- End-to-end explanation with architecture
- Designed for the industry
- Live project
- Placement assistance
- Free mock interviews for Hadoop interview preparation
- Handwritten notes and slide copies
- Detailed assistance in resume preparation
- Special attention to the previous experience of experienced candidates
- Latest resources, blogs and articles shared


Audience

Basic knowledge of the following (FREE classes will be provided if needed):
- SQL commands
- Linux commands
- Java basics (OOPs concepts only)

Duration & Mode of Training 2 months, Online Training

Course Content

Understanding Big Data and Hadoop

Objectives: In this module, you will understand what Big Data and Hadoop are, how Hadoop solves Big Data problems, the Hadoop ecosystem, the Hadoop architecture, HDFS and how MapReduce works.

- Introduction to Big Data
- What is Big Data?
- Big Data opportunities and challenges
- Characteristics of Big Data
- Limitations & solutions of Big Data architecture
- Introduction to Hadoop and its uses
- Hadoop & its features
- Hadoop ecosystem
- Hadoop 2.x core components
- Components of the Hadoop ecosystem:
  i. Storage: HDFS (Hadoop Distributed File System)
  ii. Processing: MapReduce framework
- Different Hadoop distributions

Hadoop Architecture and HDFS

Objectives: In this module, you will learn about the cluster environment, the Hadoop cluster architecture, the important configuration files of a Hadoop cluster, installing a single-node cluster, the Hadoop Distributed File System in depth (replication, block size, Secondary NameNode, High Availability) and YARN in depth (ResourceManager and NodeManager).

- What is a cluster environment?
- Hadoop 2.x cluster architecture
- Hadoop cluster modes
- Common Hadoop shell commands
- Hadoop 2.x configuration files
- Single-node and multi-node cluster setup
- Hadoop administration
- Significance of HDFS in Hadoop
- Features of HDFS
- Storage aspects of HDFS
- Replication in Hadoop
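To make the block-size and replication topics above concrete, here is a small Python sketch of how HDFS carves a file into blocks and replicates them (assuming the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; both are configurable via dfs.blocksize and dfs.replication):

```python
import math

BLOCK_SIZE_MB = 128   # Hadoop 2.x default dfs.blocksize (configurable)
REPLICATION = 3       # default dfs.replication (configurable)

def hdfs_footprint(file_size_mb):
    """Return (blocks, block_replicas, raw_storage_mb) for one file."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return blocks, blocks * REPLICATION, file_size_mb * REPLICATION

# A 1 GB file is split into 8 blocks; with 3 replicas each, the cluster
# holds 24 block copies and spends 3 GB of raw storage on 1 GB of data.
print(hdfs_footprint(1024))  # (8, 24, 3072)
```

Keeping three copies of every block on different DataNodes is what lets HDFS tolerate the failure of commodity hardware without losing data.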

Hadoop MapReduce Framework

Objectives: In this module, you will gain a comprehensive understanding of the Hadoop MapReduce framework and of how MapReduce works on data stored in HDFS. You will also learn advanced MapReduce concepts such as InputFormat, OutputFormat, partitioners, combiners, and shuffle and sort.

- Why is MapReduce needed in Hadoop?
- Traditional way vs. the MapReduce way
- YARN concepts
- YARN architecture
- YARN MapReduce application execution flow
- YARN workflow
- Anatomy of a MapReduce program
- Input splits:
  - Need of input splits in MapReduce
  - InputSplit size
  - InputSplit size vs. block size
  - InputSplits vs. mappers
- Relation between input splits and HDFS blocks
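The relation between input splits and HDFS blocks can be sketched in Python. Hadoop's FileInputFormat carves splits of roughly one block each, but tolerates a final split up to 10% larger than the split size (the 1.1 "slop" factor) rather than scheduling a tiny extra mapper:

```python
def num_splits(file_size, split_size, slop=1.1):
    """Mimic FileInputFormat: carve off split_size chunks while the
    remainder exceeds slop * split_size; the tail becomes one last split."""
    splits = 0
    remaining = file_size
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits

# A 300 MB file with 128 MB splits -> 3 splits (128, 128 and 44 MB),
# so 3 mapper tasks; a 140 MB file stays a single slightly-large split.
print(num_splits(300, 128))  # 3
print(num_splits(140, 128))  # 1
```

This is why the number of mappers usually equals the number of blocks, but not always: the split is a logical unit of work, while the block is the physical unit of storage.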

Advanced Hadoop MapReduce

Objectives: In this module, you will learn advanced concepts of MapReduce such as counters, the MapReduce programming model, the Distributed Cache, MRUnit, MapReduce joins and XML parsing.

- Counters
- MapReduce programming model


- Write a basic MapReduce program:
  i. Driver code
  ii. Mapper code
  iii. Reducer code
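Real Hadoop jobs are written in Java against the Mapper/Reducer API; the three-part structure above (driver, mapper, reducer) can be sketched in plain Python for the classic word-count example:

```python
from itertools import groupby
from operator import itemgetter

# Mapper: emit one (word, 1) pair per word, like Mapper.map()
def mapper(line):
    for word in line.split():
        yield (word.lower(), 1)

# Reducer: sum the values grouped under one key, like Reducer.reduce()
def reducer(word, counts):
    return (word, sum(counts))

# "Driver": wires mapper -> shuffle & sort -> reducer over the input
def run_job(lines):
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=itemgetter(0))          # the shuffle & sort phase
    return [reducer(k, (v for _, v in grp))
            for k, grp in groupby(pairs, key=itemgetter(0))]

print(run_job(["to be or not to be"]))
# [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

The sort-then-group step in the middle is exactly what the framework's shuffle and sort phase does between the map and reduce tasks.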

- Identity mapper and reducer
- Input formats in MapReduce
- Output formats in MapReduce
- Combiner and partitioner
- Joins in MapReduce:
  i. Map-side join
  ii. Reduce-side join
  iii. Real-time applicability
  iv. Distributed Cache
- XML file parsing using MapReduce
- MRUnit
- Custom input format
- Hands-on exercises
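A map-side join pairs each input record against a small lookup table shipped to every mapper via the Distributed Cache, so no shuffle or reduce phase is needed. A pure-Python sketch, with hypothetical employee/department data:

```python
# Small "dimension" table distributed to every mapper (the Distributed
# Cache role); the names and ids here are made up for illustration.
dept_cache = {10: "Sales", 20: "Engineering"}

def map_side_join(employee_records):
    # The join happens inside the mapper: a plain dictionary lookup.
    for name, dept_id in employee_records:
        yield (name, dept_cache.get(dept_id, "UNKNOWN"))

rows = [("asha", 10), ("ravi", 20), ("meena", 30)]
print(list(map_side_join(rows)))
# [('asha', 'Sales'), ('ravi', 'Engineering'), ('meena', 'UNKNOWN')]
```

A reduce-side join, by contrast, tags records from both datasets with their join key, shuffles them so matching keys meet at one reducer, and joins there; it handles two large datasets but pays the full shuffle cost.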

Apache Pig

Objectives: In this module, you will learn about Apache Pig, the types of use cases where Pig fits, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig streaming & testing Pig scripts. You will also be working on a healthcare dataset.

- Introduction to Apache Pig
- MapReduce vs. Pig
- Pig components & Pig execution
- Different data types in Pig
- Modes of execution in Pig:
  - Local mode
  - MapReduce (distributed) mode
- Transformations in Pig
- Pig Latin programs
- Shell and utility commands
- Pig UDFs & Pig streaming
- Develop a simple Pig script
- Use of the GROUP BY, FILTER BY, DISTINCT, CROSS and SPLIT commands with a use case
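A minimal Pig Latin sketch tying several of the commands above together (the patients.csv file and its schema are hypothetical, in the spirit of the course's healthcare dataset):

```pig
-- Hypothetical patients.csv: id, state, age
patients = LOAD 'patients.csv' USING PigStorage(',')
           AS (id:int, state:chararray, age:int);
adults   = FILTER patients BY age >= 18;            -- FILTER BY
by_state = GROUP adults BY state;                   -- GROUP BY
counts   = FOREACH by_state GENERATE group, COUNT(adults);
DUMP counts;                                        -- adult count per state
```

Run in local mode (`pig -x local`) the script reads from the local file system; in MapReduce mode the same script is compiled into MapReduce jobs over HDFS.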


Apache Hive

Objectives: This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts and Hive UDFs.

- Introduction to Apache Hive
- Hive vs. Pig
- Hive architecture and components:
  - Driver
  - Compiler
  - Executor
- Hive Metastore:
  - Importance of the Hive Metastore
  - Communication mechanism with the Metastore and configuration details
- Limitations of Hive
- Comparison with traditional databases
- Hive data types and data models:
  - Array
  - Struct
  - Map
- Conditional functions in Hive
- Importance of the CASE statement, with a use case
- Hive partitioning
- Hive bucketing
- Hive tables (managed tables and external tables)
- Importing data
- Querying data & managing outputs
- Hive scripts
- User-defined functions (UDFs) in Hive:
  - UDFs, UDAFs, UDTFs
  - Need for UDFs in Hive
- Use cases
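An illustrative HiveQL sketch combining partitioning, bucketing and a CASE expression (the patients table and its columns are hypothetical):

```sql
-- Hypothetical partitioned, bucketed table
CREATE TABLE patients (id INT, name STRING, age INT)
PARTITIONED BY (state STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

SELECT name,
       CASE WHEN age < 18 THEN 'minor'
            WHEN age < 60 THEN 'adult'
            ELSE 'senior'
       END AS age_band
FROM patients
WHERE state = 'KA';  -- partition pruning: only the KA partition is scanned
```

Because the table is partitioned by state, the WHERE clause lets Hive skip every other partition's files entirely, which is the main query-time benefit of partitioning.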

Advanced Apache Hive


Objectives: In this module, you will understand advanced Apache Hive concepts such as UDF, Dynamic Partitioning, Hive indexes and views, and optimizations in Hive.

- Hive serializer/deserializer (SerDe)
- HiveQL: joining tables, dynamic partitioning
- Custom MapReduce scripts
- Hive indexes and views
- Hive query optimizers
- Hive Thrift Server
- Hive UDFs

Sqoop

Objectives: This module will give you in-depth knowledge of Sqoop and of loading data from a database using Sqoop.

- Introduction to Sqoop
- How to connect to a relational database using Sqoop
- Performance implications of Sqoop import and export, and how to improve performance
- Different Sqoop commands
- Different flavors of imports:
  - Historical
  - Incremental
- Export
- Sqoop imports to Hive tables
- Sqoop imports to HBase
- Sqoop incremental load vs. history load & limitations of incremental load
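An illustrative Sqoop incremental import, to show how the flavors above look on the command line (the MySQL connection details, table and column names are hypothetical; it needs a running cluster and database):

```shell
# --incremental append picks up only rows with order_id > --last-value;
# -m 4 runs four parallel mappers, split on --split-by.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/orders \
  --split-by order_id -m 4 \
  --incremental append --check-column order_id --last-value 10000
```

Dropping the three incremental flags gives the historical (full) load; adding `--hive-import` would land the data in a Hive table instead of a plain HDFS directory.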

Apache HBase

Objectives: This module will give you in-depth knowledge of Apache HBase, the HBase architecture, HBase run modes and its components.

- Introduction to NoSQL databases and HBase
- HBase vs. RDBMS
- HBase concepts
- HBase architecture
- HBase run modes
- HBase configuration
- HBase cluster deployment


Advanced Apache HBase

Objectives: This module will cover advanced Apache HBase concepts. We will see demos on HBase bulk loading & HBase filters. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster & why HBase uses ZooKeeper.

- HBase data model:
  - Column families
  - Column qualifier names
  - Row keys
- HBase concepts & architecture
- HBase shell
- HBase client API
- Hive data loading techniques
- Apache ZooKeeper introduction:
  - ZooKeeper data model
  - ZooKeeper service
- HBase bulk loading
- Getting and inserting data
- HBase filters
- Hive–HBase integration
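A few HBase shell commands illustrating the data model above (table creation with column families, inserts, reads and a filter; the table, family and row names are hypothetical):

```
create 'patients', 'info', 'vitals'          # table with two column families
put 'patients', 'row1', 'info:name', 'Asha'  # cell = row key + family:qualifier
get 'patients', 'row1'                       # read one row
scan 'patients', {FILTER => "PrefixFilter('row')"}   # filtered scan
```

Note that the row key is the only indexed access path; everything else is addressed as family:qualifier within a row, which is why row-key design gets so much attention in HBase schema design.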

Oozie

Objectives: In this module, you will understand how multiple Hadoop ecosystem components work together to solve Big Data problems. This module will also cover Apache Oozie Workflow Scheduler for Hadoop Jobs.

- Oozie introduction
- Oozie architecture
- Oozie workflow
- Oozie job submission:
  - workflow.xml
  - coordinator.xml
  - job.coordinator.properties
- Scheduling an Oozie workflow
- Use case
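A minimal sketch of a workflow.xml with a single Hive action (the workflow name, script name and transitions are hypothetical):

```xml
<!-- Minimal sketch: start -> one Hive action -> end, with an error path -->
<workflow-app name="daily-report" xmlns="uri:oozie:workflow:0.5">
  <start to="run-hive"/>
  <action name="run-hive">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>report.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Hive action failed</message></kill>
  <end name="end"/>
</workflow-app>
```

Such a workflow is submitted with the Oozie CLI (for example `oozie job -config job.properties -run`), and a coordinator.xml can then trigger it on a time or data-availability schedule.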