engineering big data with hadoop

ENGINEERING BIG DATA WITH HADOOP BY International School of Engineering {We Are Applied Engineering} Disclaimer: Some of the Images and content have been taken from multiple online sources and this presentation is intended only for knowledge sharing but not for any commercial business intention

Upload: international-school-of-engineering

Post on 11-Aug-2014

Category:

Data & Analytics


DESCRIPTION

This presentation introduces Big Data and Hadoop.

TRANSCRIPT

Page 1: Engineering Big Data with Hadoop

ENGINEERING BIG DATA WITH HADOOP

BY International School of Engineering {We Are Applied Engineering}

Disclaimer: Some of the Images and content have been taken from multiple online sources and this presentation is intended only for knowledge sharing but not for any commercial business intention

Page 2: Engineering Big Data with Hadoop

OVERVIEW

• WHAT IS BIG DATA?

• EXPLOSION OF DATA

• DATA CONTRIBUTIONS

• DATA EXPLOSION

• WHO ARE THE PLAYERS?

• BIG DATA – BIG PICTURE – LANDSCAPE

• BIG DATA – ENTERPRISE ROLES

• WHAT IS HADOOP?

• EVOLUTION OF HADOOP

• HADOOP ECOSYSTEM

• HADOOP ECOSYSTEM MAP

• HADOOP: 30,000 FEET VIEW

• BIG DATA & ANALYTICS CASE STUDIES

• VIDEO OF HADOOP ECOSYSTEM

Page 3: Engineering Big Data with Hadoop

WHAT IS BIG DATA?

• High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

- Gartner

HIGH VOLUME

HIGH VELOCITY

HIGH VARIETY

Page 4: Engineering Big Data with Hadoop

EXPLOSION OF DATA

Page 5: Engineering Big Data with Hadoop

Source: http://www.emc.com/leadership/digital-universe/iview/index.htm

Page 6: Engineering Big Data with Hadoop

DATA CONTRIBUTIONS

Page 7: Engineering Big Data with Hadoop

DATA EXPLOSION

Bing ingests more than 7 petabytes a month

The Twitter community generates over 1 terabyte of tweets every day

Cisco predicts that annual internet traffic will reach 667 exabytes by 2013
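
To compare the figures on this slide, it helps to put them on one scale. The short sketch below converts them to daily rates. It assumes decimal (SI) units and a 30-day month, neither of which the slide states, and the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes and so on.
TB, PB, EB = 10**12, 10**15, 10**18

def per_day(total_bytes, days):
    """Average number of bytes per day over the given period."""
    return total_bytes / days

# Bing: more than 7 PB a month (30-day month assumed).
bing_tb_per_day = per_day(7 * PB, 30) / TB

# Twitter: over 1 TB of tweets every day (already a daily rate).
twitter_tb_per_day = 1.0

# Cisco forecast: 667 EB of internet traffic a year.
internet_pb_per_day = per_day(667 * EB, 365) / PB

print(f"Bing:     about {bing_tb_per_day:.0f} TB/day")
print(f"Twitter:  about {twitter_tb_per_day:.0f} TB/day")
print(f"Internet: about {internet_pb_per_day:.0f} PB/day")
```

On these assumptions, Bing's intake works out to a few hundred terabytes a day, and the forecast internet traffic to well over a thousand petabytes a day.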

Page 8: Engineering Big Data with Hadoop

Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf

Page 9: Engineering Big Data with Hadoop

Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf

Page 10: Engineering Big Data with Hadoop

WHO ARE THE PLAYERS?

Page 11: Engineering Big Data with Hadoop
Page 12: Engineering Big Data with Hadoop

BIG DATA – BIG PICTURE – LANDSCAPE

Page 13: Engineering Big Data with Hadoop

BIG DATA – ENTERPRISE ROLES

Page 14: Engineering Big Data with Hadoop

INTRODUCTION TO HADOOP

Page 15: Engineering Big Data with Hadoop

WHAT IS HADOOP?

• Flexible

– Structured/Unstructured

– Text/Binary

– Schema/Schema-less

• 100% Open Source

• Scalable

– Petabytes of Data

– Thousands of Nodes

Source: http://cloudtimes.org/2013/06/25/hadoop-as-a-service-market-growing/

Page 16: Engineering Big Data with Hadoop

How does an Elephant Sneak up on you?

EVOLUTION OF HADOOP

Page 17: Engineering Big Data with Hadoop

HADOOP ECOSYSTEM

Chukwa Sqoop ZooKeeper Pig

HBase Avro Mahout Flume

Whirr Hama Hive

MapReduce Engine

Hadoop Distributed File System

Hadoop Common

Page 18: Engineering Big Data with Hadoop

Source: http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/

HADOOP ECOSYSTEM MAP

Page 19: Engineering Big Data with Hadoop

Hadoop Evolution – Map Explained!

• How did it all start? Huge data on the web!

• Nutch was built to crawl this web data

• Huge data had to be saved: HDFS was born!

• How to use this data? The MapReduce framework was built for coding and running analytics, in Java or in any other language via Hadoop Streaming

• How to get unstructured data in (web logs, click streams, Apache logs, server logs)? FUSE, WebDAV, Chukwa, Flume, Scribe

• HIHO and Sqoop for loading data into HDFS: RDBMS can join the Hadoop bandwagon!
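
The MapReduce bullet above can be made concrete with a small local simulation. The sketch below is not Hadoop code. It imitates, in plain Python, the three phases a Hadoop Streaming job goes through: a mapper emits key/value pairs, the framework sorts them by key, and a reducer aggregates each key. The function names and the sample log lines are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (token, 1) pair for every token in every line."""
    for line in lines:
        for token in line.lower().split():
            yield token, 1

def reducer(sorted_pairs):
    """Reduce phase: sum the counts of each token.

    Expects pairs sorted by key, which the shuffle step guarantees.
    """
    for token, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield token, sum(count for _, count in group)

def run_job(lines):
    """Run map, shuffle (sort by key) and reduce in a single process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))

# Hypothetical web-log lines, invented for illustration.
sample_log = ["GET /index.html", "GET /about.html", "POST /index.html"]
counts = run_job(sample_log)
print(counts)
```

In a real Hadoop Streaming job the mapper and reducer would be separate scripts reading standard input and writing standard output, and the sort between them would be done by the framework across many nodes.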

Page 20: Engineering Big Data with Hadoop

Continued

• High-level interfaces required over low-level MapReduce programming: Pig, Hive, Jaql

• BI tools with advanced UI reporting (drilldown etc.): Intellicus

• Workflow tools over MapReduce processes and high-level languages: Oozie

• Monitor and manage Hadoop, run jobs/Hive, view HDFS (high-level view): Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia

• Support frameworks: Avro (serialization), ZooKeeper (coordination)

• More high-level interfaces/uses: Mahout, Elastic MapReduce

• OLTP also possible: HBase

Page 21: Engineering Big Data with Hadoop

HADOOP: 30,000 FEET VIEW

• Distribute data initially

– Let processors / nodes work on local data

– Minimize data transfer over network

– Replicate data multiple times for increased availability

• Write applications at a high level

– Programmers should not have to worry about network programming, temporal dependencies, low-level infrastructure, etc.

• Minimize talking between nodes (shared-nothing)
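
The principles on this slide (process data where it lives, replicate it, keep cross-node traffic low) can be illustrated with a toy model. The sketch below is a simplification and not how HDFS or the Hadoop scheduler is implemented. It assumes round-robin block placement (real HDFS placement is rack-aware) and three copies per block, which is the usual HDFS default. The node names and both functions are hypothetical.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Give every block `replication` distinct nodes, chosen round-robin.

    Assumption: a simplified stand-in for real block placement.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

def schedule(placement):
    """Run each block's task on the least-loaded node holding a replica.

    Every task therefore reads local data and nothing crosses the network.
    """
    load = {}
    assignment = {}
    for block, replicas in placement.items():
        node = min(replicas, key=lambda n: load.get(n, 0))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment

nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
placement = place_blocks(8, nodes)
assignment = schedule(placement)

for block, node in assignment.items():
    print(f"block {block}: replicas on {placement[block]}, task runs on {node}")
```

Because every block has copies on three different nodes, losing any single node still leaves each block readable, and the scheduler can always find a node that already holds the data it needs.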

Page 22: Engineering Big Data with Hadoop

BIG DATA & ANALYTICS

Case Studies

Page 23: Engineering Big Data with Hadoop

YAHOO - PERSONALIZATION

Page 24: Engineering Big Data with Hadoop

YAHOO SEARCH ASSIST

Page 25: Engineering Big Data with Hadoop

For a detailed description of the HADOOP ECOSYSTEM components, check out our video on

Page 26: Engineering Big Data with Hadoop

Plot no 63/A, 1st Floor, Road No 13, Film Nagar, Jubilee Hills, Hyderabad-500033

For Individuals: (+91) 9502334561/62

For Corporates: (+91) 9618 483 483

Facebook: www.facebook.com/insofe

SlideShare: www.slideshare.net/INSOFE

International School of Engineering