Co-existence or Competition? - RDBMS and Hadoop

RDBMS and Hadoop - Co- existence or competition 05/26/2022 Copyright © 2011 Flytxt B.V. All rights reserved Ram Mohan

Upload: flytxt

Post on 22-May-2015

Category:

Technology


DESCRIPTION

RDBMS and Hadoop - Co-existence or Competition?

TRANSCRIPT

Page 1: Co existence or Competition ? - RDBMS and Hadoop

04/12/2023 Copyright © 2011 Flytxt B.V. All rights reserved

RDBMS and Hadoop - Co-existence or competition

Ram Mohan

Page 2: Co existence or Competition ? - RDBMS and Hadoop

Session Agenda

Introduction to RDBMS
What is Hadoop and Map-Reduce
Hadoop and RDBMS – A comparison
Co-Existence – Practical Example - Master Website
Q&A

Page 3: Co existence or Competition ? - RDBMS and Hadoop

Relational DBMS

Based on the mathematical principles of the relational model
Data is represented as rows and columns of a table
Relational Terminology
◦ Tuple (Row)
◦ Attribute (Column)
◦ Relation (Table)
Integrity Constraints
◦ Primary Key
◦ Foreign Key
◦ Alternate Key
ACID Test
◦ Atomicity
◦ Consistency
◦ Isolation
◦ Durability
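Atomicity, the first of the ACID properties, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in RDBMS; the `account` table, the `transfer` helper and the balances are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical two-row accounts table, used only to illustrate atomicity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit
        # applied to dst in the same transaction has been rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so it is undone
print(ok, failed, balances(conn))
```

The second transfer credits B before the debit on A fails, yet B's balance is unchanged afterwards, which is the all-or-nothing behaviour that atomicity promises.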

Page 4: Co existence or Competition ? - RDBMS and Hadoop

Normalization

Normalization: the process of removing data redundancy by decomposing relations in a database.
Denormalization: carefully introduced redundancy to improve query performance.

Page 5: Co existence or Competition ? - RDBMS and Hadoop

Relational DBMS

Page 6: Co existence or Competition ? - RDBMS and Hadoop

Example Data

S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris

P#  PNAME  COLOR  WEIGHT  CITY
P1  Nut    Red    12      London
P2  Bolt   Green  17      Paris
P3  Screw  Blue   17      Rome
P4  Screw  Red    14      London

S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S2  P1  300
S2  P2  400
S3  P2  200
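The example data can be loaded into a relational engine and queried with a join. The sketch below uses Python's built-in sqlite3 module; the table names `supplier`, `part` and `shipment` and the column names `sno` and `pno` are assumptions, since the slide shows only the column headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names are assumed; the slide gives only S#, P#, etc.
conn.executescript("""
CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
CREATE TABLE part (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER, city TEXT);
CREATE TABLE shipment (
    sno TEXT REFERENCES supplier(sno),
    pno TEXT REFERENCES part(pno),
    qty INTEGER,
    PRIMARY KEY (sno, pno)
);
""")
conn.executemany("INSERT INTO supplier VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
])
conn.executemany("INSERT INTO part VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"),
    ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"),
    ("P4", "Screw", "Red", 14, "London"),
])
conn.executemany("INSERT INTO shipment VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
])
conn.commit()

# Join suppliers to their shipments and total the quantity per supplier.
rows = conn.execute("""
    SELECT s.sname, SUM(sh.qty) AS total_qty
    FROM supplier AS s
    JOIN shipment AS sh ON sh.sno = s.sno
    GROUP BY s.sno, s.sname
    ORDER BY s.sno
""").fetchall()
print(rows)
```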

Page 7: Co existence or Competition ? - RDBMS and Hadoop

Five computers & a 640k ;-)

Moore’s Law

"I think there is a world market for about five computers"
Thomas Watson, Chairman of the board of IBM, 1943

"640k ought to be enough for anybody"
Attributed to Bill Gates in 1981

Page 8: Co existence or Competition ? - RDBMS and Hadoop

The Big Data Challenges

Sources of data and the amount of data to analyze are growing exponentially
Stale data exists because data warehouse (DW) solutions cannot ingest the vast amounts of data fast enough
Lack of performance for advanced analytics and complex queries
The number of users and their concurrency are increasing rapidly

Page 9: Co existence or Competition ? - RDBMS and Hadoop

Hadoop Architecture

Page 10: Co existence or Competition ? - RDBMS and Hadoop

Hadoop – HDFS (Hadoop Distributed File System)

Reliably stores petabytes of replicated data across thousands of nodes
◦ Data is divided into 64 MB blocks, each block replicated three times
Master/Slave architecture
◦ Master NameNode holds the block locations
◦ Slave DataNode manages blocks on its local FS
Built on local commodity hardware
◦ No RAID required
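The block and replication figures translate into simple arithmetic. The sketch below is not HDFS code; it only computes how many blocks and replicas one file produces under the 64 MB block size and three-way replication stated on the slide. The `hdfs_footprint` helper is a hypothetical name.

```python
import math

BLOCK_SIZE_MB = 64  # block size stated on the slide
REPLICATION = 3     # replication factor stated on the slide


def hdfs_footprint(file_size_mb):
    """Return (blocks, block replicas, raw MB stored) for a single file."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    replicas = blocks * REPLICATION
    # The final block occupies only its actual size on disk, so the raw
    # storage is the file size times the replication factor.
    raw_mb = file_size_mb * REPLICATION
    return blocks, replicas, raw_mb


blocks, replicas, raw_mb = hdfs_footprint(1000)
print(blocks, replicas, raw_mb)
```

For a 1000 MB file this gives 16 blocks, 48 block replicas for the DataNodes to hold, and 3000 MB of raw disk, which is the price paid for doing without RAID.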

Page 12: Co existence or Competition ? - RDBMS and Hadoop

Map-Reduce Model
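The model can be sketched in a few lines of plain Python. This is a single-process simulation of the map, shuffle and reduce phases using the classic word-count example; it does not use the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["hadoop stores data", "rdbms stores data too"])
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks and the shuffle moves data across the network; here everything runs in one process purely to show the data flow.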

Page 13: Co existence or Competition ? - RDBMS and Hadoop

Hadoop – Limitations

Is not intended for real-time querying
Does not support random access
Significant learning curve
Provides bare-bones functionality out of the box, but scaling is built in and inexpensive

Page 14: Co existence or Competition ? - RDBMS and Hadoop

Where SQL Makes Life Easy

Joining
◦ In a single query, get all products in an order with their product information
Secondary Indexing
◦ Get CustomerId by e-mail
Referential Integrity
Real-time Analysis
Millions are trained in SQL and relational data modelling
RDBMS provides tremendous functionality, but is extremely difficult and costly to scale
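The first three points can be shown together in one sketch using Python's built-in sqlite3 module. The schema (`customer`, `product`, `order_item`), the index name and the sample rows are all hypothetical and chosen only to mirror the examples on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE UNIQUE INDEX idx_customer_email ON customer(email);  -- secondary index
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE order_item (
    order_id INTEGER,
    product_id INTEGER REFERENCES product(product_id),
    quantity INTEGER
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "alice@example.com"), (2, "bob@example.com")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(10, "Nut", 12), (11, "Bolt", 17)])
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)",
                 [(1001, 10, 3), (1001, 11, 1)])
conn.commit()

# Secondary indexing: get CustomerId by e-mail.
customer_id = conn.execute(
    "SELECT customer_id FROM customer WHERE email = ?", ("bob@example.com",)
).fetchone()[0]

# Joining: all products in an order with their product information.
items = conn.execute("""
    SELECT p.name, p.price, oi.quantity
    FROM order_item AS oi
    JOIN product AS p ON p.product_id = oi.product_id
    WHERE oi.order_id = ?
    ORDER BY p.product_id
""", (1001,)).fetchall()

# Referential integrity: an order line for an unknown product is rejected.
try:
    conn.execute("INSERT INTO order_item VALUES (?, ?, ?)", (1002, 999, 1))
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
conn.rollback()

print(customer_id, items, fk_rejected)
```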

Page 15: Co existence or Competition ? - RDBMS and Hadoop

Master Website – A Practical Example

Page 16: Co existence or Competition ? - RDBMS and Hadoop

Master Website – RDBMS Use Cases

Profile information that is provided during sign-up
Intelligence generated, i.e. the output of the analytic jobs
Online purchase records and account management
Reporting tools

Page 17: Co existence or Competition ? - RDBMS and Hadoop

Master Website – Hadoop Use Cases

Generating intelligence from the continuous stream of data
◦ Wall posts on Facebook
Adding new tags by reprocessing the old logs available when new requirements arise
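The division of labour between the two systems can be sketched as a batch step followed by a load step. In the sketch below a plain Python function stands in for the Hadoop job and sqlite3 stands in for the RDBMS; the posts, the tagging rules, the `user_tag` table and the function names are all hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical raw log of wall posts: (user_id, text).
raw_posts = [
    ("u1", "loving my new phone"),
    ("u2", "great football match tonight"),
    ("u1", "phone battery is great"),
]
# Hypothetical tagging rules: keyword -> tag.
TAG_KEYWORDS = {"phone": "gadgets", "football": "sports"}


def tag_posts(posts, rules):
    """Batch step (the Hadoop role): count tag hits per user over all logs."""
    counts = Counter()
    for user_id, text in posts:
        for word in text.split():
            if word in rules:
                counts[(user_id, rules[word])] += 1
    return counts


def load_results(conn, counts):
    """Load step (the RDBMS role): store the job output for reporting."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_tag ("
        "user_id TEXT, tag TEXT, hits INTEGER, PRIMARY KEY (user_id, tag))"
    )
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO user_tag VALUES (?, ?, ?)",
            [(user, tag, hits) for (user, tag), hits in counts.items()],
        )


QUERY = "SELECT user_id, tag, hits FROM user_tag ORDER BY user_id, tag"
conn = sqlite3.connect(":memory:")
load_results(conn, tag_posts(raw_posts, TAG_KEYWORDS))
report = conn.execute(QUERY).fetchall()

# A new requirement adds a tag; the old logs are reprocessed in full.
new_rules = {**TAG_KEYWORDS, "great": "positive"}
load_results(conn, tag_posts(raw_posts, new_rules))
report_v2 = conn.execute(QUERY).fetchall()
print(report, report_v2)
```

The raw logs stay in the batch layer so they can be reprocessed whenever the rules change, while the small, structured output lands in the database where the reporting tools can query it.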

Page 18: Co existence or Competition ? - RDBMS and Hadoop

A Practical Example – Facebook Architecture

Page 19: Co existence or Competition ? - RDBMS and Hadoop

THANK YOU