re:Introduce Big Data and Hadoop Eco-system

Upload: shakir-ali

Post on 24-Jan-2017

Category:

Data & Analytics


re:Introduce Big Data and Hadoop Eco-system

Presented By:

Mohammed Shakir Ali

Oct 21st 2015

What is Big Data?

Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. [Ref : www.sas.com]

Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. [Ref: www.wikipedia.com]

Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. (10^18 bytes = 1000 petabytes).

2.5 Quintillion bytes = 2500 petabytes. [Ref: www.ibm.com/software/au/data/bigdata/]
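The unit conversion above can be checked with a few lines of Python. This is only a sketch: it assumes decimal (SI) units, where one petabyte is 10^15 bytes, and the names `PETABYTE`, `EXABYTE`, and `bytes_to_petabytes` are illustrative, not taken from the slides.

```python
# Decimal (SI) storage units, expressed in bytes (assumed convention).
PETABYTE = 10**15
EXABYTE = 10**18

def bytes_to_petabytes(n_bytes: int) -> float:
    """Convert a byte count to decimal petabytes."""
    return n_bytes / PETABYTE

# 2.5 quintillion bytes per day, the figure quoted above.
daily_bytes = 25 * 10**17

print(bytes_to_petabytes(EXABYTE))      # 1000.0
print(bytes_to_petabytes(daily_bytes))  # 2500.0
```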

Characteristics of Big Data.

Volume

Variety

Velocity

Veracity

Is Big Data really new?

Let's check: Google search terms for Big Data vs. Data Analysis and Business Intelligence (BI).

https://www.google.com/trends/explore#q=Big%20Data%2C%20Data%20Analysis%2C%20Business%20Intelligence&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10

Big Data Management Challenges.

Big Data just keeps growing and growing. According to Forrester Research:

The average organization will grow their data by 50 percent in the coming year.

Overall corporate data will grow by a staggering 94 percent.

Database systems will grow by 97 percent.

Server backups for disaster recovery and continuity will expand by 89 percent.

Big Data Management Challenges.

Use case of a Leading Medical Research Facility:

- Generates 100 terabytes of data from various instruments.

- Data is copied by 10 different research departments.

- Departments further process the data and add 5 terabytes of additional synthesized data each.

- Now they must manage a total of over a petabyte of data, of which less than 150 terabytes is unique.

- The entire petabyte of data is backed up and moved to a disaster recovery site, consuming additional power and space to store it all.

Now the medical center has used over 10 petabytes of storage to manage less than 150 terabytes of real unique data.
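The primary-storage figures in this use case can be reproduced with a small calculation. The sketch below is one reading of the numbers: it assumes every department holds a full copy of the raw data and that all synthesized data is unique, so 150 TB is an upper bound on the unique data. Backup and disaster-recovery copies are not modeled, because the slides do not say how many there are. The function name `storage_footprint` is hypothetical.

```python
def storage_footprint(raw_tb, departments, synthesized_tb_each):
    """Return (total_tb, unique_tb) for primary storage, before backups."""
    copied_tb = raw_tb * departments                  # one full copy per department
    synthesized_tb = synthesized_tb_each * departments
    total_tb = copied_tb + synthesized_tb
    # Assumption: the raw data counts once and all synthesized data is unique.
    unique_tb = raw_tb + synthesized_tb
    return total_tb, unique_tb

total_tb, unique_tb = storage_footprint(100, 10, 5)
print(total_tb)              # 1050 (just over a petabyte)
print(unique_tb)             # 150
print(total_tb / unique_tb)  # 7.0
```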

Big Data Management Challenges.

Three basic challenges:

Storing,

Processing and

Managing it efficiently.

Reference:

http://www.forbes.com/sites/ciocentral/2012/07/05/best-practices-for-managing-big-data/

Possible Solutions:

Scale-out architectures to manage large data sets

- Reduce the data to a unique set of data.

Data virtualization to provide centralized management of data sets.

- Reuse the same data footprint to reduce data duplication.
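Reducing data to a unique set is often done by keying each item on a hash of its content. The sketch below illustrates that idea under stated assumptions: items are small in-memory byte strings, SHA-256 is the chosen digest, and `deduplicate` is a hypothetical helper, not part of any product named in the slides.

```python
import hashlib

def deduplicate(blobs):
    """Keep one copy of each distinct byte string, keyed by its SHA-256 digest."""
    unique = {}
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        # setdefault stores the blob only the first time a digest is seen.
        unique.setdefault(digest, blob)
    return unique

# Four stored copies, but only two distinct payloads.
copies = [b"scan-001", b"scan-002", b"scan-001", b"scan-001"]
store = deduplicate(copies)
print(len(copies), "copies ->", len(store), "unique")
```

A real system would hash fixed-size blocks or whole files on disk rather than in-memory strings, but the lookup-by-digest step is the same.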

Project Open Data

Several governments around the world are making data available to the public.

Data is a valuable national resource and a strategic asset to the U.S. Government, its partners, and the public.

Managing this data as an asset and making it available, discoverable, and usable (in a word, open) not only strengthens our democracy and promotes efficiency and effectiveness in government, but also has the potential to create economic opportunity and improve citizens' quality of life.

For example, when the U.S. Government released weather and GPS data to the public, it fueled an industry that today is valued at tens of billions of dollars per year. Reference: https://project-open-data.cio.gov/

Benefits of Big Data.

Cost reduction: Big data technologies like Hadoop and cloud-based analytics can provide substantial cost advantages.

Faster, better decision making: Analytics has always involved attempts to improve decision making. With the high speed of Hadoop and in-memory analytics, several organizations have sped up their decision processes.

New products and services: Big data analytics can be used to create new products and services for customers. Several organizations have come up with new products and services with the help of Big Data.

Reference : https://www.sas.com/fr_fr/news/sascom/2014q3/Big-data-davenport.html

Conclusion

Interest in Big Data and the Hadoop eco-system has increased in recent years.

Recent trends in data growth have created new challenges for data management, along with new opportunities.

Several software products/solutions are available to manage Big Data effectively.

Hadoop architecture Eco-system

What is Apache Hadoop

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets.

- It runs on computer clusters built from commodity hardware.

- All the modules in Hadoop are designed to withstand hardware failures.

Apache Hadoop Framework.

Apache Hadoop framework is composed of the following modules:

1) Hadoop Distributed File System (HDFS): a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.

2) Hadoop MapReduce: a programming model for large scale data processing.

3) Hadoop YARN: a resource-management platform responsible for managing computing resources in clusters and using them for scheduling of users' applications.

4) Hadoop Common: contains libraries and utilities needed by other Hadoop modules.
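The MapReduce programming model can be illustrated with the classic word count. The sketch below runs in a single Python process and does not use Hadoop at all; it only mimics the map, shuffle, and reduce steps that the framework would distribute across a cluster. All function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data", "Big Data and Hadoop"])
print(counts)  # {'big': 2, 'data': 2, 'and': 1, 'hadoop': 1}
```

In a real Hadoop job the map and reduce functions run on different machines, and the framework performs the shuffle between them.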

Apache Hadoop Adoption

On February 19, 2008, Yahoo! Inc. launched a large Hadoop deployment running on a Linux cluster with more than 10,000 cores, which produced data used in every Yahoo! web search query.

In 2010, Facebook claimed that they had the largest Hadoop cluster in the world with 21 PB of storage.

As of 2013, Hadoop adoption is widespread.

For example, more than half of the Fortune 50 use Hadoop.

Search trends about Big Data.

HPC vs Hadoop search trends:

https://www.google.com/trends/explore#q=HPC%2C%20Hadoop&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10

Big Data and Hadoop Architecture

Apache Hadoop Architecture

Hadoop Cluster Setup

Apache Hadoop Projects

Apache Pig: a high-level platform for creating MapReduce programs used with Hadoop.

Apache Hive: a data warehouse infrastructure built on top of Hadoop.

Apache Spark: an open source cluster computing framework originally developed in the AMPLab at UC Berkeley.

Apache Storm: a distributed computation framework written predominantly in the Clojure programming language.

Apache HBase: an open source, non-relational, distributed database modeled after Google's BigTable and written in Java.

Others: Apache ZooKeeper, Impala, Flume, Sqoop.

Search trends about Big Data.

Apache Hadoop vs Apache Spark search trends:

https://www.google.com/trends/explore#q=Hadoop%2C%20Apache%20Spark&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10

Prominent Hadoop Distributors

Cloudera

Hortonworks

MapR

Hadoop preview:

Cloudera Quickstart VM:

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cloudera_quickstart_vm.html

Big Data workflow:

http://insightdataengineering.com/blog/pipeline_map.html
