Pivotal HAWQ - High Availability (2014)
A New Platform for a New Era — SK Krishnamurthy
Uploaded by saravana-krishnamurthy, 17-Feb-2017

TRANSCRIPT

Page 1: Pivotal HAWQ - High Availability (2014)

A NEW PLATFORM FOR A NEW ERA

SK Krishnamurthy

Page 2: Pivotal HAWQ - High Availability (2014)

© Copyright 2013 Pivotal. All rights reserved.

Agenda
• HAWQ failover and HA now
• HAWQ HA upcoming release
• What's new in PHD 1.1
• Pivotal Command Center new features
• Discuss roadmap in conjunction with AMEX requirements
• Open discussion: SAW, PHD 1.1 upgrade, …

Page 3: Pivotal HAWQ - High Availability (2014)


HAWQ - Availability

Nov 25, 2013

Page 4: Pivotal HAWQ - High Availability (2014)

Deployment Model – Sample HAWQ Cluster

[Diagram: HAWQ primary master (HAWQ PM) and standby master (HAWQ SM) alongside the primary and secondary NameNodes (PNN, SNN); each DataNode (DN) host also runs multiple HAWQ segment servers (SS)]

Page 5: Pivotal HAWQ - High Availability (2014)

HAWQ Master Fails

Action                     | Availability        | Notes
HAWQ cluster               | Yes (with downtime) | HAWQ cluster available. How do clients connect to the SM? Manual process to connect to the standby master, similar to GPDB.
Current "SELECT" queries   | Aborted             | Users need to restart the query.
Current transaction        | Aborted             | Dirty data & temp files will be removed.
New "SELECT" & transaction | Yes                 | SM will continue to process queries.

Page 6: Pivotal HAWQ - High Availability (2014)

HAWQ Master Fails

• Execution coordinator resides on the master
• Distributed transaction master resides on the master
• Log copied up to the last committed transaction
• Run gpactivatestandby on the secondary (standby) master
• Either VIP or DNS hostname change to re-route client connections

Page 7: Pivotal HAWQ - High Availability (2014)

HAWQ Master & Standby Master Fail

Action                     | Availability | Notes
HAWQ cluster               | Unavailable  | Cluster is considered to be down.
Current "SELECT" queries   | Aborted      | Can't restart the query.
Current transaction        | Aborted      | Dirty data & temp files will be removed.
New "SELECT" & transaction | Not possible |

Page 8: Pivotal HAWQ - High Availability (2014)

HAWQ Master & Standby Master Fail

Configure RAID 10 for HAWQ master so primary segment data directory is never lost

Page 9: Pivotal HAWQ - High Availability (2014)

PNN Fails

Action                     | Availability        | Notes
HAWQ cluster               | Yes (with downtime) | Metadata queries can be carried out, but no other queries. No DDL or DML.
Current "SELECT" queries   | Aborted             | Users need to restart the query.
Current transaction        | Aborted             | After the PNN is up, dirty data & temp files will be removed.
New "SELECT" & transaction | Not possible        |

• PHD 1.1:
  – Option 1: Manually bring up the PNN. HAWQ cannot switch to the secondary NameNode.
  – Option 2: The HDFS admin changes the FQDN or IP address of the secondary NN to that of the PNN.
  – The HAWQ master keeps trying to connect to the PNN; once it finds one, the cluster becomes operational.
• PHD 1.1.1 (Dec '13):
  – QA-verified testing of the above two options.

Page 10: Pivotal HAWQ - High Availability (2014)

PNN Fails

• Normal HDFS failover process
• Change the DNS name of the secondary NN to the current NN
• NameNode service will be supported in PHD 1.2 (February)

Page 11: Pivotal HAWQ - High Availability (2014)

PNN & Secondary NN Fail

Action                     | Availability | Notes
HAWQ cluster               | No           | Metadata queries can be carried out, but no other queries. No DDL or DML.
Current "SELECT" queries   | Aborted      | Users need to restart the query.
Current transaction        | Aborted      | After the PNN is up, dirty data & temp files will be removed.
New "SELECT" & transaction | Not possible |


Page 12: Pivotal HAWQ - High Availability (2014)

PNN & Secondary NN Fail

No split information

No transactions

Page 13: Pivotal HAWQ - High Availability (2014)

Secondary NN Fails

Action                     | Availability | Notes
HAWQ cluster               | Yes          | Fully available
Current "SELECT" queries   | Yes          |
Current transaction        | Yes          |
New "SELECT" & transaction | Yes          |

Page 14: Pivotal HAWQ - High Availability (2014)

A Segment Fails

Action                     | Availability | Notes
HAWQ cluster               | Yes          | HAWQ cluster available.
Current "SELECT" queries   | Aborted      | Users need to restart the query.
Current transaction        | Aborted      | Dirty data & temp files will be removed.
New "SELECT" & transaction | Yes          | Remaining segments will handle the query.

Page 15: Pivotal HAWQ - High Availability (2014)

A Segment Fails

• Segment QEs (Query Executors) are killed
• HAWQ does not materialize intermediate results
• Local actions by QEs are not committed
• Segment QEs are started on other segments in subsequent queries
• QE substitution is random
• A future release will add an option to materialize work files

Page 16: Pivotal HAWQ - High Availability (2014)

Multiple Segments Fail

Action                     | Availability | Notes
HAWQ cluster               | Yes          | HAWQ cluster available.
Current "SELECT" queries   | Aborted      | Users need to restart the query.
Current transaction        | Aborted      | Dirty data & temp files will be removed.
New "SELECT" & transaction | Yes          | Remaining segments will handle the query.

Page 17: Pivotal HAWQ - High Availability (2014)

DN Fails

Action                     | Availability | Notes
HAWQ cluster               | Yes          | HAWQ cluster available.
Current "SELECT" queries   | Yes          | SS will automatically connect to a remote DN in the middle of the currently executing query.
Current transaction        | Yes          | Transaction will finish successfully.
New "SELECT" & transaction | Yes          |

• PHD 1.1:
  – No impact. SS will continue to work with a remote DN.
  – Loss of data locality might introduce a slight performance impact. On a 10G network the impact is measured to be around 10% for large queries; simple queries might experience a 50% performance impact.

Page 18: Pivotal HAWQ - High Availability (2014)

DN Fails

• libhdfs fails over to read from an HDFS replica
• Short-term performance loss until the NN marks the DN as dead
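The replica fallback can be sketched as a toy model (not HAWQ's actual libhdfs code — the node names, the `fetch` callback, and the "dead node" set are invented for illustration; the real client asks the NameNode for replica locations):

```python
def read_block(replicas, dead_nodes, fetch):
    """Try each replica in order, skipping nodes already known dead."""
    for node in replicas:
        if node in dead_nodes:
            continue
        try:
            return fetch(node)
        except IOError:
            dead_nodes.add(node)   # remember the failure, try the next replica
    raise IOError("all replicas failed")

dead = {"dn1"}                     # the failed DataNode

def fetch(node):
    # stand-in for an actual block read; only dn2 answers in this toy setup
    if node == "dn2":
        return b"block-data"
    raise IOError(node)

print(read_block(["dn1", "dn2", "dn3"], dead, fetch))   # b'block-data'
```

The "short-term performance loss" on the slide corresponds to the wasted probes before a failed node lands in the dead set.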

Page 19: Pivotal HAWQ - High Availability (2014)

Segment Host Dies

Action                     | Availability | Notes
HAWQ cluster               | Yes          | HAWQ cluster available.
Current "SELECT" queries   | Aborted      | Users need to restart the query.
Current transaction        | Aborted      | Dirty data & temp files will be removed.
New "SELECT" & transaction | Yes          | Remaining segments will handle the query.

Page 20: Pivotal HAWQ - High Availability (2014)

Single Disk Failure in DN JBOD

• If tempdata is not on the failed disk: no impact on the cluster or query.
• If tempdata is configured to be on the failed disk:
  ▪ Small queries will run, but large queries with too much temporary data will be impacted.
  ▪ Transactions will be aborted; new transactions will continue if multiple disks are configured to contain tempdata.
• RAID 5: no impact; possible performance loss.
• RAID 10: no impact & no performance loss.

Page 21: Pivotal HAWQ - High Availability (2014)

HAWQ HA on Roadmap

• Automatic NameNode HA supported on PHD now
• Automatic NameNode HA (name service) supported by HAWQ in the February release
• PXF to also support the NN service
• No interruption in query execution during NN failure
• HAWQ HA unchanged

Page 22: Pivotal HAWQ - High Availability (2014)

What's New in Pivotal HD 1.1
November 7th, 2013

Page 23: Pivotal HAWQ - High Availability (2014)

Key Themes of Pivotal HD 1.1 Release

• Leverage more data, in real time, more easily to gain competitive advantage
• Richer services and tools to create a broader set of applications
• Deeper, streamlined administrative capabilities for enterprise deployments

Page 24: Pivotal HAWQ - High Availability (2014)

Pivotal HD Architecture

[Architecture diagram — Pivotal HD Enterprise:
• Apache components: HDFS, HBase, Pig/Hive/Mahout, MapReduce, Sqoop, Flume, YARN (resource management & workflow), Zookeeper, Oozie, Vaidya
• Pivotal components: Command Center (configure, deploy, monitor, manage), Data Loader, Spring, Hadoop Virtualization Extension, Unified Storage Service
• HAWQ – Advanced Database Services: ANSI SQL + analytics, dynamic pipelining, query optimizer, catalog services, Xtension Framework, MADlib algorithms (beta)
• GemFire XD – Real-Time Database Services: ANSI SQL + in-memory, distributed in-memory store, query transactions, ingestion processing, Hadoop driver – parallel with compaction]

Page 25: Pivotal HAWQ - High Availability (2014)

GemFire XD delivers an enterprise real-time data processing platform for SLA-critical applications; it enables users to rapidly and reliably analyze & react to high volumes of events while leveraging 10s of TBs of in-memory reference data.

Cloud Scale Real-Time Platform
• Very low & predictable latencies at high & variable loads
• 10s of TBs in-memory (Memscale)
• Multi-tiered caching
• Efficient in-memory M-R
• Real-time event processing
• Continuous querying

Optimized for Real-Time Analytics
• SQL-based queries
• Support for structured and semi-structured* data
• Java stored procedures
• Deep Spring Data integration
• Native support for JSON and objects (Java, C++, C#)*

Seamless Pivotal HD Integration
• Scale to HDFS with policy-driven in-memory data retention
• Online and offline querying of HDFS data
• ETL-less bi-directional integration with other Pivotal HD services

Enterprise-Class Reliability
• JTA distributed transactions
• HA through in-memory redundancy
• Reliable event propagation
• Active-active deployments across WAN

* EA / Not in 1.0

Page 26: Pivotal HAWQ - High Availability (2014)

What's New in Pivotal HD 1.1

Feature                                              | Benefit
Command Center: Install Wizard                       | Faster, easier set-up and configuration of an HD cluster
Command Center: Start/Stop Services                  | Point/click control of multiple services through a central interface
HAWQ: UDF (partial) – C, PL/pgSQL; pgcrypto, orafce  | Enable richer data processing and analytics functionality leveraging existing SQL skill sets
HAWQ: Kerberos support                               | Tightly integrated security with HDFS
HAWQ: PXF writable HDFS table support                | Easily export HAWQ data to HDFS for external consumption
HAWQ: Input Format reader                            | Directly leverage HAWQ data in MapReduce, Pig and Hive
HAWQ: Diagnostic tools                               | Lower administration costs
HAWQ: Improved query planner "Orca"                  | Enabled to provide more efficient query plans

Page 27: Pivotal HAWQ - High Availability (2014)

What's New in Pivotal HD 1.1

Feature                                                  | Benefit
Install/Config (ICM) CLI: Add/Remove Services            | Faster, easier set-up and administration of services (e.g. HBase, GemFire XD)
Install/Config (ICM) CLI: Upgrade                        | Streamlined, low-risk upgrade from 1.0.1 to 1.1
Apache Hadoop: Hadoop 2.0.5 plus select 2.0.6 patches    | Greater stability and lower risk based on critical defect fixes incorporated
Oozie 3.3.2                                              | Orchestrate data processing (e.g. MR, Pig) job pipelines with dependencies
Hive 0.11 (incl. HCatalog and HiveServer2)               | Significant improvements in functionality, scalability and security
HBase 0.94.8                                             | Enables snapshots of tables without overhead to the Region Servers
RHEL 6.4 certification                                   | Enhanced performance optimizations and security improvements

Page 28: Pivotal HAWQ - High Availability (2014)

What's New in Pivotal HD 1.1

Feature                                                                                             | Benefit
Platform & security: Kerberos support (HDFS, HAWQ, Unified Storage Service; PXF in Dec 2013)        | Tighter governance, risk and compliance
JRE 1.7.0_15 support                                                                                | Supported platform; JRE 1.6 is end-of-life
RHEL 6.4 (FIPS) certification                                                                       | Federal standard for cryptography modules
pgcrypto for HAWQ                                                                                   | Flexible and robust encryption of sensitive data
Tools: Unified Storage Service – CDH4 as a data source                                              | Stream data from CDH4
Data Loader: Push Stream API; Spring XD front end for Twitter                                       | Integration support for a wider variety of data sources

Page 29: Pivotal HAWQ - High Availability (2014)

Command Center Cluster Deployment Wizard

• Performs "host verification" to determine host eligibility to be added to the cluster

Page 30: Pivotal HAWQ - High Availability (2014)

Command Center Cluster Deployment Wizard

• Easily add eligible nodes to roles
• Basic validation of layout
• Checkbox add/remove services
• Ability to download configuration locally
• Recorded demo available

Page 31: Pivotal HAWQ - High Availability (2014)

Orca – Improved Optimizer

• Pluggable architecture, allowing faster innovation and quicker iteration on quality improvements

Subset of improved functionality:
• Parity with Planner
• Improved join ordering
• Join-aggregate reordering
• Sub-query de-correlation
• Optimal sort orders
• Full integration of data (re-)distribution
• Contradiction detection
• Elimination of redundant joins
• Smarter partition scan
• Star-join optimization
• Skew awareness

Page 32: Pivotal HAWQ - High Availability (2014)

What's New in PXF

• Profiles
• Writable external tables
• Hive partition pruning, HBase filtering
• Additional connectors & CSV support
• Complete extensibility
• Roadmap:
  – Security & authentication
  – Multi-FS support & other distributions via OS
  – Stand-alone service

Page 33: Pivotal HAWQ - High Availability (2014)

Why Pivotal HD? Big Data + Fast Data

• The first enterprise-grade platform that provides OLAP and OLTP with HDFS as the common data substrate
• Enables closed-loop analytics, real-time event processing and high-speed data ingest

Page 34: Pivotal HAWQ - High Availability (2014)

HAWQ Format Reader

[Diagram: a Java program (e.g. a MapReduce job) uses the HAWQ Reader (jar file) to read HAWQ-format files from HDFS:]
1. A request is made for where the files for a specific "table" exist.
2. The location of the files is returned.
3. The HDFS files in HAWQ format are streamed to the reader.

Recorded demo available

Page 35: Pivotal HAWQ - High Availability (2014)

Oozie Now Included and Supported with PHD

• Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
• Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
• Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
• Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, streaming map-reduce, Pig, Hive, Sqoop and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).
• Oozie is a scalable, reliable and extensible system.

Page 36: Pivotal HAWQ - High Availability (2014)


Matrix of what is supported via Install method

Page 37: Pivotal HAWQ - High Availability (2014)

Security Dashboard (items in bold tested; rest are scheduled)

Component     | Secure cluster        | Kerberos authentication | LDAP authentication
HDFS          | Yes                   | Yes                     | Linux OS supports
MapReduce/Pig | Yes                   | N/A                     |
Hive          | Yes (standalone mode) | N/A                     |
Hiveserver    | No                    | No                      |
HiveServer2   | Yes                   | Yes                     | Yes
HBase         | Yes                   | Yes                     | Yes
HAWQ*         | Yes                   | Yes                     | Yes
GemFire XD    | Yes                   | Yes                     | Yes

* Except PXF; scheduled for Dec (PHD 1.1.1 release)

Page 38: Pivotal HAWQ - High Availability (2014)


Vaidya

Page 39: Pivotal HAWQ - High Availability (2014)

Roadmap
Open Discussion

Nov 25, 2013

Page 40: Pivotal HAWQ - High Availability (2014)

Roadmap – Action Items

• Error tables released in PHD 1.2 (February)
  – Current workaround
• PCC new features?!
• SAW integration
• PHD 1.1 upgrade planning

Page 41: Pivotal HAWQ - High Availability (2014)


Appendix

Nov 25, 2013

Page 42: Pivotal HAWQ - High Availability (2014)


HAWQ

Nov 25, 2013

Page 43: Pivotal HAWQ - High Availability (2014)

History

HAWQ 1.0 (March release)
– True SQL engine in Hadoop
  ▪ SQL-92, SQL:1999 & SQL:2003 OLAP extensions
  ▪ JDBC/ODBC
– Basic SQL functionality
  ▪ DDL and DML
– High availability
– Transaction support

HAWQ 1.1 (June release)
– JBOD support

HAWQ 1.1.1 (August release)
– HDFS access layer read fault tolerance
– HAWQ diagnosis tool
– ORCA enabled

HAWQ 1.1.2 (September release)
– HAWQ MR InputFormat for AO tables
– HDFS access layer write fault tolerance
– HDFS 2.0.5 support

HAWQ 1.1.3 (Oct release)
– HAWQ Kerberos support
– HAWQ on secure HDFS
– UDF

HAWQ 1.1.4 (Dec release)
– gptoolkit
– UDF enhancements
– Manual failover for HDFS HA

HAWQ 1.2 (Feb release)
– Parquet storage support
– HAWQ MR InputFormat
– Automatic failover for HDFS HA
– …

Page 44: Pivotal HAWQ - High Availability (2014)

[Diagram: HAWQ & HDFS master servers handle planning & dispatch; segment servers handle query execution; the storage layer is HDFS, HBase, …; all tiers communicate over the network interconnect]

Page 45: Pivotal HAWQ - High Availability (2014)

[Diagram: the master host issues meta ops to the NameNode; segment hosts each run several segments co-located with a DataNode, spread across racks (Rack1, Rack2); segments read/write blocks with HDFS replication; the GPDB interconnect links the master host and segment hosts]

Page 46: Pivotal HAWQ - High Availability (2014)


Query execution flow

Page 47: Pivotal HAWQ - High Availability (2014)

Parallel Query Optimizer

• Converts SQL into a physical execution plan
  – Cost-based optimization looks for the most efficient plan
  – Physical plan contains scans, joins, sorts, aggregations, etc.
  – Global planning avoids sub-optimal 'SQL pushing' to segments
  – Directly inserts 'motion' nodes for inter-segment communication
• 'Motion' nodes for efficient non-local join processing (assume table A is distributed across all segments, i.e. each segment k holds a slice AK)
  – Broadcast Motion (N:N): every segment sends AK to all other segments
  – Redistribute Motion (N:N): every segment rehashes AK (by join column) and redistributes each row
  – Gather Motion (N:1): every segment sends its AK to a single node (usually the master)
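The redistribute motion can be sketched as hashing each row on the join column and routing it to the segment that owns that hash bucket (a simplified model with invented names; HAWQ's actual hashing and interconnect are more involved):

```python
def redistribute(segment_rows, join_col, num_segments):
    """segment_rows: one list of dict-rows per sending segment."""
    outboxes = [[] for _ in range(num_segments)]       # one inbox per receiver
    for rows in segment_rows:
        for row in rows:
            target = hash(row[join_col]) % num_segments  # owner of this key
            outboxes[target].append(row)
    return outboxes

segments = [
    [{"k": 1, "v": "a"}, {"k": 2, "v": "b"}],   # rows currently on segment 0
    [{"k": 1, "v": "c"}, {"k": 3, "v": "d"}],   # rows currently on segment 1
]
out = redistribute(segments, "k", 4)
# After the motion, both k == 1 rows sit on the same receiving segment,
# so the join for that key can proceed locally.
```

A broadcast motion would instead append every row to every outbox; a gather motion collapses `num_segments` to 1.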

Page 48: Pivotal HAWQ - High Availability (2014)

Example of Parallel Query Optimization

select c_custkey, c_name,
       sum(l_extendedprice * (1 - l_discount)) as revenue,
       c_acctbal, n_name, c_address, c_phone, c_comment
from customer, orders, lineitem, nation
where c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate >= date '1994-08-01'
  and o_orderdate < date '1994-08-01' + interval '3 month'
  and l_returnflag = 'R'
  and c_nationkey = n_nationkey
group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
order by revenue desc

Resulting plan:

Gather Motion 4:1 (slice 3)
  Sort
    HashAggregate
      HashJoin
        Redistribute Motion 4:4 (slice 1)
          HashJoin
            Seq Scan on lineitem
            Hash
              Seq Scan on orders
        Hash
          HashJoin
            Seq Scan on customer
            Hash
              Broadcast Motion 4:4 (slice 2)
                Seq Scan on nation

Page 49: Pivotal HAWQ - High Availability (2014)

Interconnect

• UDP based
• Flow control

Page 50: Pivotal HAWQ - High Availability (2014)

Metadata Dispatch

• Metadata dispatch
• Stateless segments
  – Read-only metadata on segments

Page 51: Pivotal HAWQ - High Availability (2014)

Transaction

• Full transaction support for tables on HDFS
  – When a load transaction is aborted, some garbage data is left at the end of the file. For HDFS-like systems, data cannot be truncated or overwritten.
• Methods to process the partial data to support transactions:
  – Option 1: Load data into a separate HDFS file. Unlimited number of files.
  – Option 2: Use metadata to record the boundary of the garbage data, and implement a kind of vacuum mechanism.
  – Option 3: Implement HDFS truncation.
• HDFS truncate is added to support transactions.
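Option 2 can be sketched with a tiny model: metadata records the last committed end-of-file offset, and readers simply ignore bytes past that boundary (invented names; HAWQ's real catalog bookkeeping is more involved):

```python
# Sketch of transaction support on an append-only file system:
# metadata tracks the committed logical EOF, so garbage bytes from an
# aborted load are never seen by readers.

class AppendOnlyFile:
    def __init__(self):
        self.data = bytearray()
        self.committed_eof = 0        # metadata: boundary of good data

    def load(self, payload, commit=True):
        self.data += payload          # HDFS-style append; no truncate
        if commit:
            self.committed_eof = len(self.data)
        # on abort, committed_eof is unchanged -> payload is garbage
        # that a vacuum pass (or HDFS truncate) can later reclaim

    def read(self):
        return bytes(self.data[: self.committed_eof])

f = AppendOnlyFile()
f.load(b"good-rows;")
f.load(b"partial-rows", commit=False)   # aborted load transaction
print(f.read())                          # b'good-rows;'
```

Option 3 (true HDFS truncate) would instead physically cut `data` back to `committed_eof` on abort.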

Page 52: Pivotal HAWQ - High Availability (2014)

Transaction

• Snapshot isolation
• Simplified transaction model support
  – Simplified two-phase commit
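A simplified two-phase commit can be sketched as: the coordinator asks every participant to prepare, and commits only if all of them vote yes (a toy coordinator with invented names, not HAWQ's distributed transaction manager):

```python
# Toy two-phase commit: phase 1 collects prepare votes,
# phase 2 commits everywhere only if every vote was "yes".

class Segment:
    def __init__(self, ok=True):
        self.ok, self.state = ok, "idle"
    def prepare(self):
        return self.ok                 # vote: can this segment commit?
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1
    decision = all(votes)
    for p in participants:                        # phase 2
        p.commit() if decision else p.abort()
    return decision

segs = [Segment(), Segment(), Segment(ok=False)]
print(two_phase_commit(segs))        # False: one participant voted no
print([s.state for s in segs])       # ['aborted', 'aborted', 'aborted']
```

The "simplified" model on the slide presumably trims parts of full 2PC (e.g. durable prepare records); this sketch only shows the vote-then-decide shape.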

Page 53: Pivotal HAWQ - High Availability (2014)


Page 54: Pivotal HAWQ - High Availability (2014)

Pluggable Storage

• Read-optimized/append-only storage
• Column store
  – Compression: quicklz, zlib, RLE
  – Partitioned tables hit HDFS limitations
• Parquet
  – Open-source format
  – PAX-like column store
  – Snappy, gzip
• MR Input/Output format
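As a toy illustration of why the RLE codec listed above suits column stores (long runs of repeated values collapse to (value, count) pairs — generic run-length encoding, not HAWQ's actual codec):

```python
# Minimal run-length encoding over a column of values.

def rle_encode(col):
    runs = []
    for v in col:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return runs

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

col = ["US", "US", "US", "DE", "DE", "US"]
runs = rle_encode(col)
print(runs)                           # [['US', 3], ['DE', 2], ['US', 1]]
assert rle_decode(runs) == col        # lossless round trip
```

Columnar layouts make such runs common (e.g. a sorted country column), which is why RLE sits alongside quicklz and zlib in the codec list.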

Page 55: Pivotal HAWQ - High Availability (2014)

HDFS C Client: Why

• libhdfs (the current HDFS C client) is based on JNI, which makes it difficult for HAWQ to support a large number of concurrent queries.
• Example:
  – 4 segments on each segment host
  – 50 concurrent queries
  – Each query has 16 QE processes that do scans
  – There will be about 800 processes starting 800 JVMs to access HDFS.
  – If each JVM uses 500 MB of memory, the JVMs will consume 800 × 500 MB = 400 GB of memory.
  – Thus naïve usage of libhdfs is not suitable for HAWQ. There are currently three options to solve this problem.
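The arithmetic on this slide checks out directly (using the slide's round numbers and its 1 GB = 1000 MB convention):

```python
# Back-of-envelope JVM memory cost of naive libhdfs (JNI) usage.

concurrent_queries = 50
qe_per_query = 16        # QE processes doing scans, per query
jvm_mb = 500             # assumed memory per JVM

processes = concurrent_queries * qe_per_query
total_gb = processes * jvm_mb / 1000   # slide rounds 1 GB = 1000 MB

print(processes)   # 800
print(total_gb)    # 400.0
```

So even before the 4-segments-per-host factor, one JVM per scanning process is untenable, which motivates the three client options on the next slide.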

Page 56: Pivotal HAWQ - High Availability (2014)

HDFS Client: Three Options

• Option 1: Use HDFS FUSE. HDFS FUSE introduces some performance overhead, and its scalability is not verified yet.
• Option 2 (libhdfs2): Implement a webhdfs-based C client. webhdfs is based on HTTP, which also introduces some cost; performance should be benchmarked. The webhdfs-based approach has several benefits, such as ease of implementation and low maintenance cost.
• Option 3 (libhdfs3): Implement a C RPC interface that communicates directly with the NameNode and DataNodes. This requires many changes whenever the RPC protocol changes.

Page 57: Pivotal HAWQ - High Availability (2014)


PXF

Nov 25, 2013

Page 58: Pivotal HAWQ - High Availability (2014)

PXF is...

A fast, extensible framework connecting Hawq to a data store of choice that exposes a parallel API.

Page 59: Pivotal HAWQ - High Availability (2014)

Hawq External Tables

• gpfdist – remote delimited text (or CSV) files
• file – text files on the segment filesystem
• execute – script execution and the data it produces
• pxf – text and binary data from available PXF connectors (mostly HD based)

Page 60: Pivotal HAWQ - High Availability (2014)

Steps

• Step 1: GRANT ON PROTOCOL pxf
• Step 2: Define a PXF table
  – Pick the built-in plugins right for the job
  – Specify the data source of choice
  – Map remote data fields to Hawq db attributes (plugin dependent)
• Step 3: Query the PXF table
  – Directly
  – Or copy to a Hawq table first

CREATE EXTERNAL TABLE foo (<col list>)
LOCATION ('pxf://<host:port>/<data source>?<plugin options>')
FORMAT '<type>' (<params>);


Page 64: Pivotal HAWQ - High Availability (2014)

New Features
Main additions since PHD 1.0

Page 65: Pivotal HAWQ - High Availability (2014)


User Experience

Page 66: Pivotal HAWQ - High Availability (2014)

User Experience

• Improved/informative error messages
• Profiles — instead of:

LOCATION('pxf://<host:port>/sales?fragmenter=HiveFragmenter&accessor=HiveAccessor&resolver=HiveResolver')

simply write:

LOCATION('pxf://<host:port>/sales?profile=Hive')

Page 67: Pivotal HAWQ - High Availability (2014)

profiles.xml

<profile>
  <name>HBase</name>
  <description>Used for connecting to an HBase data store engine</description>
  <plugins>
    <fragmenter>HBaseDataFragmenter</fragmenter>
    <accessor>HBaseAccessor</accessor>
    <resolver>HBaseResolver</resolver>
    <myidentifier>MyValue</myidentifier>
  </plugins>
</profile>

Page 68: Pivotal HAWQ - High Availability (2014)

profiles.xml

<profile>
  <name>HdfsTextSimple</name>
  <description>Used when reading delimited single line records from plain text files on HDFS</description>
  <plugins>
    <fragmenter>HdfsDataFragmenter</fragmenter>
    <accessor>LineBreakAccessor</accessor>
    <resolver>StringPassResolver</resolver>
    <analyzer>HdfsAnalyzer</analyzer> <!-- (soon to be added) -->
  </plugins>
</profile>

Page 69: Pivotal HAWQ - High Availability (2014)

profiles.xml

<profile>
  <name>MyCustomProfile</name>
  <description>Used with a new set of plugins I wrote</description>
  <plugins>
    <fragmenter>MyFragmenter</fragmenter>
    <accessor>MyAccessor</accessor>
    <resolver>MyResolver</resolver>
    <analyzer>MyAnalyzer</analyzer>
  </plugins>
</profile>

Add your own profiles.

Page 70: Pivotal HAWQ - High Availability (2014)


Export to HDFS

Page 71: Pivotal HAWQ - High Availability (2014)

Writable PXF

• gphdfs-like functionality
  – but extensible…
  – currently supports text, CSV, SequenceFile
  – supports various Hadoop compression codecs

CREATE WRITABLE EXTERNAL TABLE ...
LOCATION ('pxf://<host:port>/sales?profile=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
FORMAT 'text' (delimiter ',');

You can create a new profile "HdfsTextSimpleGZipped" that includes the compression_codec, and then simply write:

LOCATION ('pxf://<host:port>/sales?profile=HdfsTextSimpleGZipped')

Page 72: Pivotal HAWQ - High Availability (2014)


New Connectors

Page 73: Pivotal HAWQ - High Availability (2014)

New Connectors

• GemFire XD (released; GA February)
• JSON (on GitHub; GA February (r+w))
• Accumulo (on GitHub; GA version being coded by Clearedge; GA February)
• Cassandra (on GitHub; alpha)

None of them was written by the PXF dev team… a testament to extensibility.

Page 74: Pivotal HAWQ - High Availability (2014)

Feature Summary

★ HBase (w/ filter pushdown)
★ Hive (w/ partition exclusion; various storage file types)
★ HDFS files: read (delimited text, CSV, Sequence, Avro)
★ HDFS files: write (delimited text, CSV, Sequence, various compression codecs and options)
★ GemFire XD, JSON format, Cassandra, Accumulo (currently beta)
★ Stats collection
★ Automatic data locality optimizations
★ Extensibility!

Page 75: Pivotal HAWQ - High Availability (2014)

Coming Up Very Soon...

★ Isilon integration
★ Kerberized HDFS support
★ NameNode High Availability

Page 76: Pivotal HAWQ - High Availability (2014)

Limitations

• Local metadata of external data
  – Will be made more transparent when UCS exists
• Authentication and authorization of external systems
  – Will be made simpler when centralized user mgmt exists
• Currently supporting local PHD only
• Error tables not yet supported
• Sharing space with NameNode/DataNode

Page 77: Pivotal HAWQ - High Availability (2014)

Writing a Plugin
Steps and guidelines

Page 78: Pivotal HAWQ - High Availability (2014)

Main Steps

1. Verify P-HD is running and PXF is installed
   a. SingleCluster, AllInAll, SingleNode VM
2. Implement the PXF plugin API for your connector (Java)
   a. Use the PXF API doc as a reference
3. Compile your connector classes and add them to the Hadoop classpath on all nodes
4. Restart PHD (won't be necessary in the future)
5. Add a profile (optional)

Page 79: Pivotal HAWQ - High Availability (2014)

Plugins

• Fragmenter – returns a list of source data fragments and their locations
• Accessor – accesses a given list of fragments, reads them and returns records
• Resolver – deserializes each record according to a given schema or technique
• Analyzer – returns statistics about the source data
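The four roles compose into a read pipeline roughly like this (a Python sketch with invented class names and a toy in-memory "data source"; the real PXF plugin API is Java — see the PXF API doc):

```python
# Sketch of how PXF plugin roles cooperate on a read:
# Fragmenter -> Accessor -> Resolver (the Analyzer only supplies stats).

class CsvFragmenter:
    def __init__(self, files):
        self.files = files
    def fragments(self):
        # a fragment = (source, host holding it) in this toy model
        return [(f, "localhost") for f in self.files]

class CsvAccessor:
    def records(self, fragment):
        source, _host = fragment
        return source["lines"]           # raw records from one fragment

class CsvResolver:
    def resolve(self, record):
        name, qty = record.split(",")    # deserialize per a known schema
        return {"name": name, "qty": int(qty)}

def scan(fragmenter, accessor, resolver):
    for frag in fragmenter.fragments():
        for rec in accessor.records(frag):
            yield resolver.resolve(rec)

files = [{"lines": ["apple,3", "pear,5"]}, {"lines": ["plum,2"]}]
rows = list(scan(CsvFragmenter(files), CsvAccessor(), CsvResolver()))
print(rows)   # three dict rows with name/qty fields
```

In real PXF the fragment-to-host mapping drives the data locality optimization: Hawq schedules each fragment's accessor on (or near) the host that stores it.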

Page 80: Pivotal HAWQ - High Availability (2014)


Thanks!

Nov 25, 2013