Emulex and the Evaluator Group Present Why I/O Is Strategic for Big Data

Upload: emulex-corporation

Post on 17-Jan-2015

DESCRIPTION

This webcast is the fourth in a series on why I/O is strategic for the data center. John Webster, senior partner at the Evaluator Group, will discuss why I/O is critical to meeting the bandwidth demands of big data deployments. As data center infrastructure scales upward, I/O must scale dynamically to keep pace.

TRANSCRIPT

Page 1: Emulex and the Evaluator Group Present Why I/O is Strategic for Big Data

Presented by: Emulex and Evaluator Group

Why I/O Is Strategic for Big Data

Page 2

Webcast Housekeeping

1. All attendees will be on mute during the presentation

2. Please submit your questions via the text/chat feature

3. We will do all Q&A at the end of the presentation

Page 3

Katherine Lane, Director of Corporate Communications

Why I/O Is Strategic

Page 4

Why I/O Is Strategic?

Building a Virtual Panel of Experts!

Page 5

Topics for the Virtual Panel

Server Virtualization

Cloud Computing

Network Convergence

Big Data

Page 6

© 2012 Evaluator Group, Inc.

Moving the Elephant Through the Pipes

John Webster, Senior Partner, Evaluator Group

Page 7

Overview

"Big data" can mean two different things:

- Storage for large amounts of data
- Analytics against very large amounts of data
- I/O is critical for both

Big Data Apps

- Personalized healthcare
- Online-style shopping for bricks-and-mortar retailers
- Fraud detection

Marketing Needs It Now

- Correlate customer data with social media data feeds
- Understand the buyer as an individual

Page 8

Data Analytics Model for Individualized Marketing

[Diagram] Labels: Logs, Tweets; Location; HDFS; NoSQL DB; Customer Profiles; High Scale Data Reductions; BI and Analytics; Expert System; NoSQL DB; Low Latency; Predictions on Buying Behavior.

1) Identify User

2a) Lookup User Profile

2b) Lookup Location

3) Input Into

4) Real-time: Determine Best Offer For This Customer
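
The numbered flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionaries stand in for the low-latency profile and location stores, and `score` is a trivial stand-in for the analytics model. The names `PROFILES`, `LOCATIONS`, `OFFERS`, `score`, and `best_offer`, and all the sample data, are hypothetical and do not come from the presentation.

```python
# Hypothetical stand-ins for the low-latency stores shown in the diagram.
PROFILES = {"u1": {"segment": "outdoor"}}
LOCATIONS = {"u1": "store-12"}

# Hypothetical candidate offers.
OFFERS = [
    {"name": "tent discount", "segment": "outdoor", "store": "store-12"},
    {"name": "coffee coupon", "segment": "foodie", "store": "store-12"},
    {"name": "boots sale", "segment": "outdoor", "store": "store-7"},
]


def score(offer, profile, location):
    """Toy model: one point for a segment match, one for a store match."""
    segment_match = offer["segment"] == profile.get("segment")
    store_match = offer["store"] == location
    return int(segment_match) + int(store_match)


def best_offer(user_id):
    """Steps 1 to 4: identify user, look up profile and location, score, pick."""
    profile = PROFILES.get(user_id, {})   # 2a) look up user profile
    location = LOCATIONS.get(user_id)     # 2b) look up location
    # 3) feed both into the model, 4) return the highest-scoring offer
    return max(OFFERS, key=lambda o: score(o, profile, location))["name"]


print(best_offer("u1"))
```

Both lookups sit on the request path, so their latency adds directly to the time it takes to return an offer.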

Page 9

Distributed, Shared-Nothing Architectures for Big Data Analytics

[Diagram] Network layer: switch. Compute layer: control node plus nodes 1 through n. Storage layer: direct-attached storage (DAS) on each node.
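
As a rough illustration of the shared-nothing idea, the sketch below hash-partitions records across nodes, lets each node count keys using only its own local data, and then combines the small partial results. It is a minimal sketch, not how Hadoop is implemented; the function names (`owner`, `partition`, `local_count`, `combine`) and the sample records are hypothetical.

```python
import zlib
from collections import Counter


def owner(key, num_nodes):
    """Pick the node that owns a key, using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(records, num_nodes):
    """Place each (key, value) record on exactly one node's local storage."""
    local = [[] for _ in range(num_nodes)]
    for key, value in records:
        local[owner(key, num_nodes)].append((key, value))
    return local


def local_count(node_records):
    """Work a node does on its own data, with no access to other nodes."""
    return Counter(key for key, _ in node_records)


def combine(partials):
    """Merge per-node partial results; only these cross the network."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = partition(records, num_nodes=3)
print(combine(local_count(p) for p in parts))
```

In this model the raw records never leave the node that stores them. The network carries the initial load and the partial results, which is where I/O bandwidth enters the picture.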

Page 10

CAP theorem

It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

A distributed system can satisfy any two of these guarantees at the same time, but not all three.
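
The trade-off can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not a real database: a key-value store with two replicas and a switchable network partition. In "CP" mode it rejects writes during a partition (keeping consistency, giving up availability); in "AP" mode it accepts them on one replica (keeping availability, letting the replicas diverge). The class and attribute names are hypothetical.

```python
class TwoReplicaStore:
    """Toy key-value store with two replicas and a simulated partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica_a = {}
        self.replica_b = {}
        self.partitioned = False  # True means A and B cannot communicate

    def write(self, key, value):
        """Write through replica A. Returns True if the write is accepted."""
        if not self.partitioned:
            self.replica_a[key] = value
            self.replica_b[key] = value
            return True
        if self.mode == "CP":
            return False  # refuse the write: consistent but unavailable
        self.replica_a[key] = value  # accept locally: available but divergent
        return True

    def consistent(self):
        """True if both replicas hold the same data."""
        return self.replica_a == self.replica_b


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("k", 1)
    store.partitioned = True
    accepted = store.write("k", 2)
    print(mode, "accepted:", accepted, "consistent:", store.consistent())
```

Once the partition is in place, neither mode can both accept the write and keep the replicas identical, which is the choice the theorem describes.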

Page 11

The Impact of Network and I/O Performance

Analytics application users directly experience the effects, both positive and negative, of network performance inside the analytics system.

The rate at which data flows between storage and processors within a Hadoop cluster has a direct effect on cluster performance and scalability.

Getting data into and out of distributed computing clusters impacts how quickly query results are delivered to users.
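
A back-of-the-envelope calculation shows why link speed matters for loading and unloading a cluster. The sketch below computes the ideal time to move a data set over a single link at line rate. The 1 TB data set size is a hypothetical example, and the `efficiency` parameter is an assumption added to let the reader model protocol overhead; real throughput will be lower than the ideal figure.

```python
def transfer_seconds(data_bytes, link_gbps, efficiency=1.0):
    """Ideal time in seconds to move data_bytes over one link.

    link_gbps is the nominal link rate in gigabits per second.
    efficiency is the usable fraction of that rate (1.0 means line rate).
    """
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


ONE_TB = 1e12  # hypothetical data set size, in bytes

for gbps in (1, 10):
    seconds = transfer_seconds(ONE_TB, gbps)
    print(f"{gbps}GbE: {seconds:.0f} s")
```

At line rate, 1 TB takes 8,000 seconds (about 2.2 hours) over 1GbE and 800 seconds (about 13 minutes) over 10GbE.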

Page 12

Internal Network Throughput 1GbE

Page 13

Internal Network Throughput 10GbE

Page 14

Load/Unload Throughput

Page 15

Why Enterprise IT Is Now Involved

Distributed computing for analytics (Hadoop, for example) is moving from science experiment to mission-critical

Emerging enterprise Hadoop use cases include:

- Hadoop for very large data sets that can’t be analyzed economically by the data warehouse
- Hadoop on the front end of the data warehouse
- Hadoop as data convergence engine: combine new unstructured data sources with structured data warehouse data
- Hadoop as the back end to the data warehouse

Also emerging is the need to bring Hadoop under the data governance umbrella:

- Use case for NAS/SAN attached to Hadoop clusters?
- At what cost?

Page 16

Is Hadoop Ready for Prime Time?

Hadoop was not born and raised in the highly risk-averse enterprise data center

Hadoop puts forward a different and inefficient operational model from the standpoint of enterprise IT

Hadoop introduces enterprise security and data governance issues

Page 17

Shared Storage as Secondary Storage

[Diagram] Network layer: switch. Compute layer: control node plus nodes 1 through n. Storage layer: SAN/NAS.

Page 18

Shared Storage as Primary Storage

[Diagram] Network layer: switch. Compute layer: control node plus nodes 1 through n. Storage layer: SAN and scale-out NAS.

Page 19

Evaluating Hadoop as a Storage Device

- Single points of failure eliminated?
- SSD and automated tiering?
- Dedupe?
- Snapshots?
- Insert your hot-button storage feature here: __________

Page 20

Enterprise IT and Big Data Analytics

There will be Big Data: Storage and Apps

The traditional data warehouse will continue to evolve

Distributed computing clusters (XxSQL, Hadoop) will achieve prominence in enterprise data centers

Shared storage, while controversial within some circles, can be applied

Communications bandwidth is as important a resource as compute and storage

Page 21

© 2011 Emulex Corporation