age of big data - hfu furtwangenheindl/ebte-2014ws...• dean & sanjay (2004)> mapreduce:...

Age of Big data Presented by: Mohammad Iqbal BCM -2014

Agenda

What is Big Data?

Big Data Attributes

Big data Sources

Getting Value from Big data

New Tools for Big Data

Hadoop's Architecture

Hadoop evolution from Google

The future is here!

What is Big Data?

Name        Symbol   Value (bytes)
Kilobyte    KB       10^3
Megabyte    MB       10^6
Gigabyte    GB       10^9
Terabyte    TB       10^12
Petabyte    PB       10^15
Exabyte     EB       10^18
Zettabyte   ZB       10^21
Yottabyte   YB       10^24

Big data: data so large that it becomes difficult to process using traditional systems.

Difficult to Process with Traditional Systems

Example documents: 100 MB, 100 GB, 100 TB

Possible problems: unable to send, unable to edit, unable to view

Depends on the capability of the system

Organization/Context Specific

500 TB of text, audio, and video data per day

Company A: big data

Company B: not big data

Depends on the capabilities of the organization

Areas of Challenges

Capture

Curation

Storage

Analysis

Visualization

Transfer

Sharing

Search

Big Data Attributes

Big data: • Large and growing files • At high speed • In various formats

The three Vs (V^3):

• Volume: the data results in large files

• Velocity: the data arrives at high speed

• Variety: the files come in various formats

Structured / Unstructured

• Unstructured data (90%): mostly wasted

• Structured data (10%): used in decision making

• Challenge/opportunity: to analyze the data and extract meaningful information

Big data Sources

Users

Applications

Systems

Sensors

Large & growing files (big data files)

Data Generation Point Examples

Mobile devices

Microphones

Readers/Scanners

Software/programs

Social media

Cameras

Machine sensors

Science facilities

Sample Events Generating Data

• Every day, we create 2.5 exabytes of data, i.e. 2.5 billion GB; so much that 90% of the data in the world today has been created in the last few years alone.

• The CERN atomic facility generates 40 TB of data per second.

• Twitter generates 12 TB of data every day.

• An Airbus A380 generates 10 TB every 30 minutes of flight; about 650 TB is generated in one flight.

• In 2009, the total data in the world was estimated at 1 ZB; by 2020 it is estimated to reach 35 ZB.

(Source: IBM.com)
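
The unit arithmetic in the first bullet can be checked with a short sketch. It assumes the decimal definitions from the units table earlier in the talk (1 GB = 10^9 bytes, 1 EB = 10^18 bytes); the names `UNITS` and `convert` are illustrative and do not come from the slides.

```python
# Decimal byte units, as listed on the "What is Big Data?" slide.
UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "PB": 10**15, "EB": 10**18, "ZB": 10**21, "YB": 10**24,
}

def convert(value, from_unit, to_unit):
    """Convert a quantity between two byte units (hypothetical helper)."""
    return value * UNITS[from_unit] / UNITS[to_unit]

# 2.5 EB per day expressed in GB: 2.5 * 10^18 / 10^9 = 2.5 * 10^9,
# i.e. the "2.5 billion GB" quoted in the first bullet.
daily_gb = convert(2.5, "EB", "GB")
print(f"{daily_gb:.3e} GB per day")
```

The same helper can restate any other figure on the slide in a different unit, for example `convert(12, "TB", "GB")` for the Twitter bullet.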

Getting Value from Big data

Collect, Understand, Analyze

Big data Applications

• Companies gain an edge by collecting, analyzing, and understanding information.

• Governments forecast events and take proactive action.

New Tools for Big Data

Over time:

• Traditional systems (e.g. RDBMS, SQL): not able to handle big data

• Big data tools (e.g. Hadoop, NoSQL): created to handle big data

Traditional Enterprise Approach

Big data is fed to a powerful computer, which has a processing limit.

Only so much data can be processed.

Modern Hadoop's Approach

Big data is split across several computations (four in the diagram), whose outputs are merged into a combined result.
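
The idea on this slide (split the data into pieces, compute on each piece separately, then combine the partial results) can be sketched in a few lines. This is only a single-machine illustration using the thread pool in Python's standard `concurrent.futures` module, not Hadoop itself; the word-count task, the sample sentences, and the names `split`, `count_words`, and `combine` are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split(lines, parts):
    """Divide the input into `parts` chunks, one per worker."""
    return [lines[i::parts] for i in range(parts)]

def count_words(chunk):
    """The per-chunk computation: count the words in one piece of the data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def combine(partials):
    """Merge the partial counts into one combined result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

data = [
    "big data needs new tools",
    "hadoop splits big data",
    "each node computes on its own data",
    "results are combined at the end",
]

# Four workers stand in for the four "Computation" boxes on the slide.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_words, split(data, 4)))

result = combine(partial_counts)
print(result.most_common(3))
```

Each worker only ever sees its own chunk, so adding more data means adding more workers rather than buying a bigger single machine, which is the contrast the slide draws with the traditional approach.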

Hadoop's Architecture

Source: hortonworks.com/hadoop/hdfs/

MapReduce

File system: HDFS

Projects:

• HBase

• Mahout

• Pig

• Oozie

• Flume

• Sqoop

• Hive

[Diagram: Hadoop cluster. An Application talks to the MASTER, which runs the Job Tracker and the Name Node. Each of the Slaves runs a Task Tracker and a Data Node holding the DATA. Annotations: "Knows where data is residing"; "Data can be taken directly".]
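
The two annotations on the diagram can be illustrated with a toy model. In HDFS, the Name Node keeps track of which Data Nodes hold each block, and a client then reads the blocks from those Data Nodes directly. The sketch below is not the HDFS API: the classes `NameNode` and `DataNode`, the block identifiers, the tiny block size, and the file name are all hypothetical, and replication is left out entirely.

```python
class DataNode:
    """A slave node that stores blocks of data (toy model)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """The master: it records where each block lives, not the data itself."""

    def __init__(self):
        self.block_map = {}  # file name -> list of (block id, DataNode)

    def add_file(self, filename, data, nodes, block_size=4):
        """Cut data into blocks and spread them round-robin over the nodes.

        Simplification: a real client writes blocks to the data nodes itself;
        here the placement and the write are folded into one call.
        """
        placements = []
        for offset in range(0, len(data), block_size):
            index = offset // block_size
            block_id = f"{filename}#{index}"
            node = nodes[index % len(nodes)]
            node.blocks[block_id] = data[offset:offset + block_size]
            placements.append((block_id, node))
        self.block_map[filename] = placements

    def locate(self, filename):
        """Answer 'where does the data reside?' without touching the data."""
        return self.block_map[filename]


def read_file(name_node, filename):
    """Client side: ask the master for locations, then read from the slaves."""
    return b"".join(node.read(block_id)
                    for block_id, node in name_node.locate(filename))


nodes = [DataNode(f"dn{i}") for i in range(3)]
master = NameNode()
master.add_file("log.txt", b"hello big data", nodes)
print(read_file(master, "log.txt"))
```

Because the master only answers location queries, the bulk of the data never passes through it, so it does not become the bottleneck for reads.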

HDFS vs GFS

• Similar to the Google File System (GFS) and MapReduce

• Back in the 1990s, web search was served by engines such as:

Excite

Altavista

Lycos

Infoseek

Google Victory

1995

2000

Excite

Altavista

Lycos

Google

Hadoop Evolution from Google

• 2003: GFS paper released by Google

• 2004: Google released its paper on MapReduce

• 2005: Hadoop created by Doug Cutting & Mike Cafarella at Yahoo! (Nutch search engine)

• 2006: Yahoo donated the project to Apache

Source: Google & Nutch white papers

The future is here!

• Big data scientists with just two years' experience can earn between $200,000 and $300,000 a year (Wall Street Journal).

• Anyone with "data science" in his or her job title on a LinkedIn page is going to get "100 recruiter emails a day" (Wall Street Journal).

• Hadoop is a super hot up-and-coming "big data" technology (businessinsider.com).

• Many other data scientists, especially at data-driven companies such as Google, Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter, have added to the Hadoop tool kit and are looking to develop it further (Harvard Business Review).

• "People are slapping buzzwords such as 'Hadoop' on résumés and looking to get 50 or 100 percent more, and they're getting it," said Scott Gnau, president of Teradata Data Lab.

References

• Dean & Ghemawat (2004): MapReduce: Simplified Data Processing on Large Clusters. google.com

• Cutting (2005): Nutch: A Flexible and Scalable Open-Source Web Search Engine. yahoo.com

• Ghemawat, Gobioff & Leung (2003): The Google File System. google.com

• https://www.ibm.com/developerworks/vn/library/contest/dw-freebooks/Tim_Hieu_Big_Data/Understanding_BigData.PDF [Accessed 27 Nov 2014]

• http://www.businessinsider.com/10-tech-skills-that-will-instantly-net-you-100000-salary-2012-8?op=1 [Accessed 27 Nov 2014]

• Big Data's High-Priests of Algorithms, http://online.wsj.com/articles/academic-researchers-find-lucrative-work-as-big-data-scientists-1407543088 [Accessed 27 Nov 2014]

Thank you for your attention

Q/A