hadoop & distributed cloud computing


Upload: rajan-kumar

Post on 27-May-2015


Category:

Technology



DESCRIPTION

Hadoop and Distributed Cloud Computing

TRANSCRIPT

Page 1: Hadoop & distributed cloud computing

HADOOP & DISTRIBUTED CLOUD COMPUTING

DATA PROCESSING IN CLOUD

Presentation By : Rajan Kumar Upadhyay || [email protected]

Page 2: Hadoop & distributed cloud computing

CLOUD COMPUTING ?

Cloud computing is a virtual setup that includes the following:

- Delivery of computing as a service rather than a product

- Shared resources (software, utilities, hardware) provided over a network (typically the Internet)

Delivery of computing

Public Utilities

Shared Resources

Page 3: Hadoop & distributed cloud computing

DISTRIBUTED CLOUD COMPUTING

As the name suggests: distributed computing in the cloud.

Examples:

• Distributed computing means using many networked computers to partition a question or problem (split it into many smaller pieces) and letting the network solve it piecemeal.

• Software like Hadoop. Written in Java, Hadoop is a scalable, efficient, distributed software platform designed to process enormous amounts of data. Hadoop can scale to thousands of computers across many clusters.

• Another instance of distributed computing, for storage instead of processing power, is BitTorrent. A torrent is a file that is split into many pieces and stored on many computers around the internet. When a local machine wants to access that file, the small pieces are retrieved and reassembled.

• P2P networks, which split communication or data packages into multiple pieces, send them across multiple network routes, and reassemble them at the receiver's end.

Distributed computing on the cloud is a next-generation framework for getting the maximum value out of resources over a distributed architecture.
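The idea of partitioning a problem and solving it piecemeal can be sketched in a few lines of Python. This is only an illustration: local threads stand in for networked computers, and the function names (`partition`, `solve_piece`, `distributed_sum_of_squares`) are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def solve_piece(chunk):
    """Work done independently by one worker: a partial sum of squares."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(items, workers=4):
    """Partition the input, solve each piece in parallel, combine the results."""
    chunks = partition(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(solve_piece, chunks))
    return sum(partials)
```

Each worker sees only its own chunk, and the only coordination is the final combination of partial results. That is the property that lets the same pattern spread across many machines.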

Page 4: Hadoop & distributed cloud computing

WHAT IS HADOOP

Flexible infrastructure for large scale computation and data processing on a network of commodity hardware.

Why Hadoop?

A common infrastructure pattern extracted from building distributed systems:

• Scale
• Incremental growth
• Cost
• Flexibility
• Distributed File System
• Distributed Processing Framework

• Apache.org open source project
• Yahoo!, Facebook, Google, Fox, Amazon, IBM, and the NY Times use it for their core infrastructure
• Widely adopted

A valuable and reusable skill set:

• Taught at major universities
• Easier to hire for
• Easier to train on
• Portable across projects and groups

Page 5: Hadoop & distributed cloud computing

HOW IT WORKS

HDFS: Hadoop Distributed File System

A distributed file system for large data

• Your data in triplicate ( one local and two remote copies)

• Built-in redundancy and resiliency to large-scale failures (automated restart and re-allocation)

• Intelligent distribution, striping across racks

• Accommodates very large data sizes on commodity hardware
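The "one local and two remote copies" idea can be illustrated with a small Python sketch. It is a simplified model, not HDFS code: the block size is tiny on purpose, the rack layout is invented, and `split_into_blocks` and `place_replicas` are hypothetical helper names. Real placement takes more factors into account.

```python
BLOCK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(writer, racks):
    """Pick three nodes for one block: the writer's node plus two remote nodes.

    `racks` maps a rack name to the list of nodes in that rack. The two remote
    copies go to the first other rack, which is assumed to have at least two
    nodes.
    """
    local_rack = next(r for r, nodes in racks.items() if writer in nodes)
    remote_rack = next(r for r in racks if r != local_rack)
    return [writer] + racks[remote_rack][:2]


# Hypothetical two-rack cluster.
racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
placement = {
    index: place_replicas("n1", racks)
    for index, _ in enumerate(split_into_blocks(b"abcdefghij"))
}
```

Because every block has a copy on a second rack, losing a single node or a whole rack still leaves at least one readable replica.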

Page 6: Hadoop & distributed cloud computing

PROGRAMMING MODEL

There are various programming models for Hadoop development. I personally like, and have experience with, Map/Reduce.

Why Map/Reduce:

• Simple programming technique:
  • Map(anything) -> key, value
  • Sort, partition on key
  • Reduce(key, value) -> key, value

• No parallel processing or message passing semantics for the programmer to manage

• Programmable in Java or any other language
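The three stages listed above can be sketched as a word count in plain Python. This runs in a single process and does not use any Hadoop API. The names `map_words`, `reduce_counts`, and `run_job` are invented for the illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map: turn one input line into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce: combine all values that share a key into one (key, value) pair."""
    return key, sum(values)


def run_job(lines):
    """Run map, then sort and group on key, then reduce."""
    mapped = [pair for line in lines for pair in map_words(line)]
    mapped.sort(key=itemgetter(0))  # stands in for the sort/partition step
    return [
        reduce_counts(key, (value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    ]
```

For example, `run_job(["big data", "big cluster"])` returns `[("big", 2), ("cluster", 1), ("data", 1)]`. Neither `map_words` nor `reduce_counts` contains any threading or messaging code, which is the point of the model: the framework owns the parallelism.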

Continued …

Page 7: Hadoop & distributed cloud computing

PROGRAMMING MODEL

• Create/allocate cluster

• Put data into the file system: data is split into blocks, stored in triplicate across your cluster

• Program execution: move computation to data. Your Map code is copied to the allocated nodes, preferring nodes that contain copies of your data

• Gather output of map, sort and partition on key

• Run reduce task

• Results of job stored on HDFS
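The preference for nodes that already hold a copy of the data can be sketched as a small assignment routine. This is a toy model under stated assumptions, not Hadoop's scheduler: `assign_map_tasks` is a hypothetical function, and ties are broken simply by the current task count per node.

```python
def assign_map_tasks(block_locations, allocated_nodes):
    """Assign one map task per block, preferring nodes that hold a replica.

    `block_locations` maps a block id to the nodes storing a copy of it.
    Returns a dict of block id -> (chosen node, whether the data is local).
    """
    load = {node: 0 for node in allocated_nodes}
    assignment = {}
    for block_id, holders in block_locations.items():
        # Allocated nodes that already store this block.
        local = [node for node in holders if node in load]
        # Fall back to any allocated node if no replica is local.
        candidates = local or list(allocated_nodes)
        node = min(candidates, key=lambda n: load[n])  # least-loaded candidate
        load[node] += 1
        assignment[block_id] = (node, bool(local))
    return assignment


# Hypothetical block layout: each block is stored on three nodes.
blocks = {
    "b1": ["n1", "n2", "n3"],
    "b2": ["n1", "n4", "n5"],
    "b3": ["n6", "n7", "n8"],
}
plan = assign_map_tasks(blocks, ["n1", "n2"])
```

In this layout `b1` and `b2` run where their data already lives, while `b3` has no replica on an allocated node, so its data would have to be read over the network.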

Page 8: Hadoop & distributed cloud computing

PRACTICES

Put large data source into HDFS

Perform aggregations, transformations, normalizations on the data

Load into RDBMS
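The last two steps can be sketched with Python's built-in `sqlite3` module standing in for the RDBMS. The record layout, the `region_totals` table, and the function names are assumptions made for this example. In practice the aggregation would run as a job over data in HDFS rather than over an in-memory list.

```python
import sqlite3
from collections import defaultdict


def aggregate(records):
    """Normalize each key (trim, lowercase) and sum the amounts per key."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region.strip().lower()] += amount
    return sorted(totals.items())


def load_into_rdbms(rows, conn):
    """Load the aggregated rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_totals "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO region_totals VALUES (?, ?)", rows)
    conn.commit()


# Hypothetical raw records, as they might look after extraction from HDFS.
raw = [(" US ", 10.0), ("us", 5.0), ("IN", 2.5)]
conn = sqlite3.connect(":memory:")
rows = aggregate(raw)
load_into_rdbms(rows, conn)
```

Only the small aggregated result reaches the database. The large raw data set stays in the distributed file system.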

Page 9: Hadoop & distributed cloud computing

THANK YOU

Thank you for reading this. I hope you find it useful. Please contact me at [email protected] if you have any queries or feedback. My name is Rajan Kumar Upadhyay, and I have more than 10 years of collective IT experience as a techie.

If you have anything to share, or are looking for consulting, please feel free to contact me.