Managing Big Data With the Microsoft Parallel Data Warehouse Appliance

Page 1: Managing Big Data - PASS… · Managing Big Data With the Microsoft Parallel Data Warehouse Appliance. Chris Campbell Analytics Platform Practice Lead About Me. Business Insights

Managing Big Data With the Microsoft Parallel Data Warehouse Appliance

About Me

Chris Campbell, Analytics Platform Practice Lead

Business Insights. Delivered.

BlueGranite provides end-to-end business analytics solutions.

Enable the organization to store and analyze large volumes of structured and non-structured data with optimized systems that can scale to meet demand.

Help your team understand past performance and prescribe actions through interactive dashboards, reports and predictive analysis.

Keep data in the hands of your decision makers wherever they are with interactive solutions on today’s mobile devices.

What’s all this about Big Data?

The Data Explosion

“Every day, the amount of data eBay processes adds up to an astonishing 50 petabytes”

“Walmart handles more than 1 million customer transactions every hour”

“Facebook handles 50 billion photos from its user base”

“The volume of business data worldwide, across all companies, doubles every 1.2 years”

“the data flow from all four LHC experiments represents 25 petabytes annual rate”

“as of 2012, every day 2.5 quintillion (2.5×10^18) bytes of data were created”

– Wikipedia

Big Data Defined

Transactional / Relational (Megabytes)
• Line of Business

User / Customer Generated (Gigabytes)
• Spreadsheets
• Documents
• Text Files

External / Public (Terabytes)
• Demographics
• Weather
• Government
• Marketing

Streaming / Social / Machine Generated (Petabytes)
• Server logs
• Clickstreams
• Sensor Output
• Manufacturing
• Images and Video
• Medical Equipment
• Test Results
• Social, Social, Social!

The four “V”s

• Volume
• Variety
• Value
• Velocity

Walmart handles over 1 million customer transactions every hour

eBay processes up to 50 PB a day

LHC experiments generate 25 PB annually

Why Big Data Matters

“Most firms estimate they are only analyzing 12% of the data they already have” … “In addition, it’s often impossible to judge what data is valuable and what isn’t”

– Forrester Research, The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014

Traditional Data Warehouse Architecture

• Source System Layer
• Data Integration Layer
• Warehouse Layer
• Analytics Layer

Which products sell better when it rains?

What demographic makes up our product’s primary customer base?

Can I prevent failure modes?

Are my employees engaging in fraud?

WHAT WILL A PATIENT’S OUTCOME LIKELY BE?

What is being said about our customer service?

• Which products sell better when it rains?
• WHAT WILL A PATIENT’S OUTCOME LIKELY BE?
• Can I prevent failure modes?
• What demographic makes up our product’s primary customer base?
• Are my employees engaging in fraud?
• What is being said about our customer service?

Unstructured Data – The New Problem

Social
• Twitter
• Facebook
• Instagram
• Vine
• Pinterest
• Blogs
• Comments
• Likes
• Surveys

Streaming
• Server Logs
• Manufacturing Equipment
• Alerts
• Sensor Data
• Medical Instruments
• Test Results
• Diagnostics
• Search

Semi-Structured
• Spreadsheets
• Documents
• Drawings
• Text
• XML
• Images and Video
• Gene Sequences
• Drug Interactions

A Modern Approach

The Data Lake

Parallel Data Warehouse

Scale Up or Scale Out?

Two types of architectures

• SMP – Scale Up: Scalability Decreases as Cost Increases
• MPP – Scale Out: Capacity and Performance Scale Linearly with Cost

Scale Up (SMP) vs. Scale Out (MPP)

1 x HP DL360 = $17,430.00 MSRP
• 16 Cores (2 x Intel Xeon E5-2690 @ 2.9 GHz, 20 MB)
• 256 GB Memory (16 x 16GB PC3-12800R)

Scale Up (SMP) vs. Scale Out (MPP)

Scale Up: 1 x HP DL560 = $36,487.00 MSRP
• 32 Cores (4 x Intel Xeon E5-4650 @ 2.7 GHz, 20 MB)
• 256 GB Memory (16 x 16GB PC3-12800R)

Scale Out: 2 x HP DL360 = $34,860.00 MSRP
• 32 Cores (2 x 2 x Intel Xeon E5-2690 @ 2.9 GHz, 20 MB)
• 512 GB Memory (2 x 16 x 16GB PC3-12800R)

Scale Up (SMP) vs. Scale Out (MPP)

Scale Up: 1 x HP DL980 = $121,353.00 MSRP
• 64 Cores (8 x Intel Xeon E7-2380 @ 2.13 GHz, 24 MB)
• 1 TB Memory (64 x 16GB PC3-10600R LV)

Scale Out: 4 x HP DL360 = $69,720.00 MSRP
• 64 Cores (4 x 2 x Intel Xeon E5-2690 @ 2.9 GHz, 20 MB)
• 1 TB Memory (4 x 16 x 16GB PC3-12800R)
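The price comparison on these three slides can be reduced to cost per core. The sketch below is only arithmetic on the MSRP figures quoted in the slides; the helper name `cost_per_core` is made up for illustration, and the numbers ignore software licensing, networking, and storage.

```python
# (cores, scale-up MSRP, scale-out MSRP), copied from the slides.
CONFIGS = [
    (16, 17_430.00, 17_430.00),   # 1 x DL360 in both cases
    (32, 36_487.00, 34_860.00),   # 1 x DL560 vs. 2 x DL360
    (64, 121_353.00, 69_720.00),  # 1 x DL980 vs. 4 x DL360
]


def cost_per_core(price: float, cores: int) -> float:
    """Hardware list price divided by core count (illustrative helper)."""
    return price / cores


for cores, smp_price, mpp_price in CONFIGS:
    smp = cost_per_core(smp_price, cores)
    mpp = cost_per_core(mpp_price, cores)
    print(f"{cores:>2} cores: scale up ${smp:,.2f}/core, scale out ${mpp:,.2f}/core")
```

With these list prices the scale-out cost per core stays flat because every step adds another identical DL360, while the scale-up cost per core rises with each larger server.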

Scale Up (SMP) vs. Scale Out (MPP)

[Chart: Scale Out and Scale Up compared at 16 Cores, 32 Cores, and 64 Cores]

SMP vs. MPP ROI

[Chart: COST plotted against PERFORMANCE, with one curve for SMP and one for MPP]

Appliance Architecture

HP PDW Architecture

• Designed to be fault tolerant from the ground up

[Diagram: seven racks. Each rack holds 2 Ethernet Switches, 2 Infiniband Switches, an HA Server, and 4 scale units, where a scale unit is 2 Scale Unit Servers plus 1 Scale Unit Storage. One rack also holds the Control Server.]

Each Node
• 16 Cores
• 256 GB Memory
• 7.5 TB Uncompressed Storage

Quarter Rack
• 2 Active Compute Servers
• 32 Cores
• 512 GB Memory
• 15 TB Uncompressed Storage

Full Rack
• 8 Active Compute Servers
• 128 Cores
• 2 TB Memory
• 60 TB Uncompressed Storage

7 Full Racks
• 56 Active Compute Servers
• 896 Cores
• 14.3 TB Memory
• 1.2 PB Uncompressed Storage

Dell/Quanta PDW Architecture

• Designed to be fault tolerant from the ground up

[Diagram: six racks. Each rack is drawn with 2 Ethernet Switches, 2 Infiniband Switches, a Control Node, an HA Server, 9 Scale Unit Servers, and 3 Scale Unit Storage units.]

Each Node
• 16 Cores
• 256 GB Memory
• 7.5 TB Uncompressed Storage

Third Rack
• 3 Active Compute Servers
• 48 Cores
• 768 GB Memory
• 15 TB Uncompressed Storage

Full Rack
• 9 Active Compute Servers
• 144 Cores
• 2.3 TB Memory
• 67.5 TB Uncompressed Storage

6 Full Racks
• 54 Active Compute Servers
• 864 Cores
• 13.8 TB Memory
• 1.2 PB Uncompressed Storage

[Images: HP App System for PDW; Dell Parallel Data Warehouse Appliance]

Virtualized Architecture Overview

• General Details
  • Hosts and guests run Windows Server 2012 Standard
  • Fabric and workload contained in Hyper-V virtual machines
  • PDW Agent runs on all hosts and all VMs
  • Windows Storage Spaces handles mirroring and spares
• PDW Workload Details
  • SQL Server 2012 Enterprise Edition (PDW build)

[Diagram: Host 1 to Host 4 and a JBOD, with links labeled "IB & Ethernet" and "Direct attached SAS"; the VMs shown are CTL, MAD, AD, VMM, Compute 1, and Compute 2]

Failover Functionality

• Cluster Shared Volumes:
  • CSV allows all nodes to access the LUNs on the JBOD as long as at least one of the hosts attached to the JBOD is active
  • Leverages SMB3 protocol
• Failover Details:
  • One cluster across the whole appliance
  • VMs are automatically migrated on host failure
  • Affinity and anti-affinity maps enforce rules
  • Failback continues to be through CSS
  • Leverages Windows Failover Cluster Manager
• Adding Passive Unit increases HA capacity:
  • Allows another VM to fail without disabling the appliance
  • All hosts connected to a single JBOD cannot fail over

[Diagram: Host 1 to Host 4 and a JBOD, with links labeled "IB & Ethernet" and "Direct attached SAS"; the VMs shown are CTL, MAD, AD, VMM, Compute 1, and Compute 2]

Data Storage

Built for Star Schemas

[Diagram: star schema with Fact Sales at the center, joined to Dim Date, Dim Customer, Dim Product, and Dim Store]

Two Kinds of Tables in a Data Warehouse
• Dimensions – What we report “by”
• Facts – What we report “on”
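As a rough illustration of reporting "on" facts "by" dimensions, here is a minimal star-schema query. It uses Python's built-in `sqlite3` module purely as a stand-in engine, not PDW, and the table names, column names, and sample rows are all invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, amount REAL);

INSERT INTO dim_date    VALUES (1, 2013), (2, 2014);
INSERT INTO dim_product VALUES (10, 'Umbrella'), (20, 'Sunscreen');
INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 20, 7.5), (2, 10, 4.0), (2, 10, 6.0);
""")

# Report ON the fact measure (amount) BY two dimensions (year, product name).
rows = conn.execute("""
SELECT d.year, p.name, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_date    AS d ON d.date_key = f.date_key
JOIN dim_product AS p ON p.product_key = f.product_key
GROUP BY d.year, p.name
ORDER BY d.year, p.name
""").fetchall()

for year, name, total in rows:
    print(year, name, total)
```

The fact table carries only keys and measures; every descriptive attribute used in `GROUP BY` comes from a dimension table.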

Replicated Tables

[Diagram: the control node (CTL) and compute Nodes 1 to 4, with a full Table Copy stored on each compute node]

Distributed Tables

[Diagram: the control node (CTL) and compute Nodes 1 to 4; the table's rows are split into 16 slices (Records 0-100, Records 100-200, and so on through Records 1500-1600) spread across the compute nodes]
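The slide draws a distributed table as slices of records spread over the nodes. One common way to decide which slice a row belongs to is to hash a chosen key column. The sketch below shows that idea under stated assumptions: `zlib.crc32` is only a stand-in for whatever hash the appliance actually uses, and `distribution_for`, `distribute`, and the `customer_id` column are hypothetical names.

```python
import zlib
from collections import defaultdict


def distribution_for(key: str, num_distributions: int) -> int:
    """Map a key value to a distribution number using a stable hash (CRC32 stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_distributions


def distribute(rows, key_column, num_distributions):
    """Group rows by the distribution their key hashes to."""
    placed = defaultdict(list)
    for row in rows:
        target = distribution_for(str(row[key_column]), num_distributions)
        placed[target].append(row)
    return placed


# 1,600 made-up rows spread over 16 distributions, matching the slice count in the diagram.
rows = [{"customer_id": i, "amount": 10.0 * i} for i in range(1600)]
placed = distribute(rows, "customer_id", num_distributions=16)

for dist in sorted(placed):
    print(f"distribution {dist:>2}: {len(placed[dist])} rows")
```

Because the hash is deterministic, rows with the same key always land in the same distribution, which is what makes node-local joins on that key possible.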

Join Compatibility

Node 3
• Replicated Dim Date: Years 1990-2015
• Distributed Dim Customer: Customers A-I
• Distributed Fact Sales: Sales For 2012, Customers A-Z

Node 4
• Replicated Dim Date: Years 1990-2015
• Distributed Dim Customer: Customers J-S
• Distributed Fact Sales: Sales For 2013, Customers A-Z

Node 5
• Replicated Dim Date: Years 1990-2015
• Distributed Dim Customer: Customers T-Z
• Distributed Fact Sales: Sales For 2014, Customers A-Z

Each node is also shown with: Distributed Dim Customer, Customers A-Z
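The slide's point can be phrased as a rule of thumb: a join runs node-locally when one side is replicated, or when both sides are distributed on the join column; otherwise rows have to move between nodes first. The sketch below encodes only that rule. The `Table` class, `join_is_local`, and the key names are invented for illustration, and the rule assumes the join column has the same name on both sides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    distribution_key: Optional[str]  # None means the table is replicated to every node

    @property
    def replicated(self) -> bool:
        return self.distribution_key is None


def join_is_local(left: Table, right: Table, join_column: str) -> bool:
    """True when every node can join its own rows without moving data."""
    if left.replicated or right.replicated:
        return True
    # Both distributed: compatible only if both are distributed on the join column.
    return left.distribution_key == join_column == right.distribution_key


dim_date = Table("DimDate", distribution_key=None)                 # replicated
dim_customer = Table("DimCustomer", distribution_key="customer_key")
fact_by_date = Table("FactSales", distribution_key="date_key")      # as drawn: split by year
fact_by_customer = Table("FactSales", distribution_key="customer_key")

print(join_is_local(fact_by_date, dim_date, "date_key"))              # replicated side
print(join_is_local(fact_by_date, dim_customer, "customer_key"))      # keys differ
print(join_is_local(fact_by_customer, dim_customer, "customer_key"))  # keys match
```

In the slide's layout, Fact Sales is split by year while Dim Customer is split by customer name, so the second case applies and each node needs Customers A-Z before it can join.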

Skew

[Diagram, shown on two consecutive slides: the control node (CTL) and compute Nodes 1 to 4; sales data is placed one month per distribution, from Jan 2013 Sales through Apr 2014 Sales]
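Skew can be put into a number by comparing the largest distribution with the average one. The sketch below is an illustration with invented row counts, not measurements from any appliance, and `skew_ratio` is a hypothetical helper.

```python
def skew_ratio(rows_per_distribution):
    """Largest distribution divided by the average; 1.0 means perfectly even."""
    counts = list(rows_per_distribution)
    average = sum(counts) / len(counts)
    return max(counts) / average


# Made-up example: 16 monthly distributions, one of them a busy month.
by_month = [100] * 15 + [500]
# The same 2,000 rows spread evenly over 16 distributions.
evenly_spread = [125] * 16

print(skew_ratio(by_month))       # 4.0
print(skew_ratio(evenly_spread))  # 1.0
```

A parallel query finishes only when its slowest distribution finishes, so in this made-up case the busy month does roughly four times the average work while the other distributions sit idle.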

xVelocity Clustered Columnstore

xVelocity Clustered Columnstore

[Diagram: a table stored column by column: Customer, Sales, Country, Supplier, Products]

• Updateable and clustered xVelocity columnstore

• Clustered Columnstore can save up to 91% in storage usage

• Memory-optimized for next-generation performance

• Updateable to support bulk and/or trickle loading

• Reduced maintenance by minimizing indexes

• All PDW data types are supported

xVelocity Clustered Columnstore

• Table consists of a column store and a row store
• “Tuple mover” converts data into columnar format once a segment is full (1M rows)
• INSERT
  • Always lands in the delta store
• DELETE
  • Logical; does not physically remove the row until REBUILD is performed
• UPDATE
  • Logical DELETE followed by INSERT
• BULK INSERT
  • If the batch is larger than 100k rows, loads directly to the columnstore
• SELECT
  • Unifies data from the column and row stores

[Diagram: a ColumnStore (C1 to C6) and a Delta (row) store (C1 to C6), connected by the tuple mover]
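The behaviours listed on this slide can be mimicked with a toy in-memory model. This is a sketch for intuition only, not how the engine is implemented: the class and method names are invented, the tiny thresholds stand in for the slide's 1M-row segment and 100k-row bulk cutoff, and deletes are treated as logical everywhere for simplicity.

```python
class ToyColumnstoreTable:
    """Toy model: a row-oriented delta store plus compressed column segments."""

    def __init__(self, segment_rows=4, bulk_threshold=3):
        self.segment_rows = segment_rows      # stand-in for the 1M-row segment size
        self.bulk_threshold = bulk_threshold  # stand-in for the 100k bulk-load cutoff
        self.delta_store = []                 # list of (row_id, row_dict) pairs
        self.segments = []                    # each: {"ids": [...], "columns": {name: [...]}}
        self.deleted = set()                  # ids of logically deleted rows
        self._next_id = 0

    def _new_pairs(self, rows):
        pairs = []
        for row in rows:
            pairs.append((self._next_id, dict(row)))
            self._next_id += 1
        return pairs

    def _compress(self, pairs):
        # Pivot row dicts into one list per column.
        names = list(pairs[0][1])
        columns = {n: [row[n] for _, row in pairs] for n in names}
        self.segments.append({"ids": [rid for rid, _ in pairs], "columns": columns})

    def insert(self, row):
        (pair,) = self._new_pairs([row])
        self.delta_store.append(pair)         # INSERT always lands in the delta store
        return pair[0]

    def bulk_insert(self, rows):
        pairs = self._new_pairs(rows)
        if len(pairs) > self.bulk_threshold:
            self._compress(pairs)             # large batch goes straight to columnstore
        else:
            self.delta_store.extend(pairs)

    def tuple_mover(self):
        # Convert full groups of delta rows into column segments.
        while len(self.delta_store) >= self.segment_rows:
            full = self.delta_store[:self.segment_rows]
            self.delta_store = self.delta_store[self.segment_rows:]
            self._compress(full)

    def delete(self, row_id):
        self.deleted.add(row_id)              # logical delete only

    def update(self, row_id, new_row):
        self.delete(row_id)                   # UPDATE = logical DELETE + INSERT
        return self.insert(new_row)

    def select(self):
        rows = []
        for seg in self.segments:
            for i, rid in enumerate(seg["ids"]):
                if rid not in self.deleted:
                    rows.append({n: col[i] for n, col in seg["columns"].items()})
        rows += [row for rid, row in self.delta_store if rid not in self.deleted]
        return rows

    def rebuild(self):
        # Physically drop deleted rows and recompress everything into one segment.
        live = self.select()
        self.segments, self.delta_store, self.deleted = [], [], set()
        if live:
            self._compress(self._new_pairs(live))


t = ToyColumnstoreTable(segment_rows=4, bulk_threshold=3)
ids = [t.insert({"c1": i, "c2": i * 10}) for i in range(5)]
t.tuple_mover()                                   # 4 rows compressed, 1 stays in delta
t.update(ids[0], {"c1": 100, "c2": 1000})         # old row marked deleted, new row in delta
t.bulk_insert([{"c1": i, "c2": 0} for i in range(200, 204)])  # 4 > 3: direct to columnstore
print(len(t.segments), len(t.delta_store), len(t.select()))
```

The `select` method reads both stores and filters out logically deleted rows, which mirrors the slide's statement that SELECT unifies column and row data.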

Polybase

Query Across PDW and Hadoop with Polybase

[Diagram]
• Users: Data Scientists, BI Users, DB Admins (send T-SQL, receive Results)
• PDW V2 with enhanced PDW query engine
• Relational data: Traditional schema-based DW applications
• Non-relational data in Hadoop: Social Apps, Sensor & RFID, Mobile Apps, Web Apps

Polybase

• Allows for T-SQL queries against HDFS data
• Parallelization affinity between PDW and Hadoop
• Supports multiple flavors of Hadoop
  • HDInsight
  • Hortonworks
  • Cloudera

[Diagram: a PDW rack (2 Ethernet Switches, 2 Infiniband Switches, Control Node, HA Server, and 4 scale units of 2 Scale Unit Servers plus 1 Scale Unit Storage) next to a Hadoop rack (2 Ethernet Switches, a Hadoop Name Node, and 16 Hadoop Nodes)]

PDW In Action