Page 1: Native SQL-on-Hadoop Engine: Apache HAWQ 2.X Latest Technology Revealed

Lili Ma (马丽丽), 2017.5.13

Page 2: Outline

● Apache HAWQ history
● System architecture
● Latest features
● Outlook and future work

Page 3: What is HAWQ?

© 2015 Pivotal Software, Inc. All rights reserved.

A Hadoop-native SQL query engine and advanced-analytics MPP database that offers high-performance interactive query execution and machine learning to data analysts and data scientists who want to find insights in large or complex datasets.

Products: Pivotal HDB and Hortonworks HDB, both powered by Apache HAWQ.

Page 4: History

[Timeline figure; the axis runs from 1986, then 1994 through 2017. Events shown:]

● Postgres developed at UC Berkeley
● Postgres adds support for SQL
● Open Source PostgreSQL
● PostgreSQL 7.0 released
● PostgreSQL 8.0 released
● Greenplum based on PostgreSQL
● Hadoop 1.0 released
● HAWQ + MADlib go open-source
● (Apache) HAWQ project launched
● Hadoop 2.0 released
● MADlib launched
● HAWQ 2.0 release
● HDB/HAWQ 2.2 release
● HAWQ 1.0 release

Page 5: HAWQ Architecture

[Architecture diagram. Master side: HAWQ Master (with catalog service), YARN Resource Manager, and NameNode. Each of the three worker nodes: a Node Manager, a DataNode, and a HAWQ Segment whose containers run Query Executors (QE).]

Page 6: HAWQ Component Diagram

[Component diagram.
HAWQ Master: Postmaster, Metadata, Transaction Mgr., Query Parser, Query Optimizer, Query Dispatch, QE, Resource Mgr. The NameNode and YARN RM are drawn next to it.
HAWQ Standby Master: the same components, kept up to date through WAL replication.
HAWQ Segments (Server 1, Server 2, ... Server N): Postmaster, Virtual Segments (Query Executors), libhdfs3, and a local directory (temp data / logs); each server also shows PXF, a YARN NM, and a Datanode.
The master and the segments are linked by the Interconnect.]

Page 7: HAWQ 2.0 Overview

© 2016 Pivotal Software, Inc. All rights reserved.

Areas of enhancement:
● Elastic & Scalable Architecture
● Hadoop-Native Integrations
● Performance & Optimizations
● Simplified User Experience
● Cloud-Readiness

New features:
● YARN-integrated 3-tier resource management
● Simpler management via Ambari and CLI
● HCatalog integration (read access)
● Block-level storage
● Dynamic cluster expansion (no redistribution)
● Per-table directory storage (user friendly)
● HDFS catalog cache
● New dispatcher + fault tolerance service
● Elastic runtime for query execution

Page 8: Layered Resource Management

Cluster level: Global (YARN)
‒ Cluster-admin defined
‒ Hardware efficiency
‒ Shared with MR/Hive/+
‒ Defined in XML

HAWQ internal: HAWQ (resource queues)
‒ HAWQ-admin defined
‒ Multi-tenancy
‒ Workload prioritization
‒ Defined in DDL

Query level: Query (internal)
‒ System defined
‒ Query optimization
‒ Operator prioritization
‒ Dynamic

Page 9: Resource Manager

• Responsibility
– Acquiring CPU/memory resources from YARN and returning them to YARN
– Allocating resources among HAWQ users and queries
• Master resource manager process
– Negotiates resources with YARN and allocates them
– Manages and maintains the resources in the resource pool
– Handles resource allocation/return RPC requests from the QD (query dispatcher)
– The fault tolerance service runs in the same process
• Segment resource manager process
– One HAWQ RM on each segment
– Negotiates with the master resource manager (for resource enforcement)
– Fault tolerance service: heartbeat sender

Page 10: Hierarchical Resource Queues

[Tree diagram of resource queues (RQs) rooted at pg_root.
Branch RQs: dept1, dept2, dept3.
Leaf RQs: adhoc1, daily_batch, monthly_report, adhoc2, daily_report, ceo_report.
Default RQ: pg_default.]

Page 11: Example: Creating a Resource Queue

CREATE RESOURCE QUEUE name WITH (queue_attribute=value [, ... ])

where queue_attribute can be:

    PARENT='queue_name'
    ACTIVE_STATEMENTS=integer
    MEMORY_LIMIT_CLUSTER=percentage
    CORE_LIMIT_CLUSTER=percentage
    SEGMENT_RESOURCE_QUOTA='mem:memory_units'
    RESOURCE_OVERCOMMIT_FACTOR=factor

    memory_units ::= {64mb|128mb|256mb|1gb|2gb}
    percentage ::= integer %

Example:

    create resource queue test_queue_1 with (parent='pg_root', memory_limit_cluster=50%, core_limit_cluster=50%);
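As a rough illustration of the syntax on this slide, the following Python sketch assembles a CREATE RESOURCE QUEUE statement from keyword arguments. The function name, the argument names, and the 1 to 100 range check on percentages are assumptions made for this example; the helper only builds text and does not connect to HAWQ.

```python
# Hypothetical helper: it only assembles DDL text and never connects to HAWQ.
ALLOWED_MEMORY_UNITS = {"64mb", "128mb", "256mb", "1gb", "2gb"}  # from the slide


def build_create_resource_queue(name, parent, memory_limit_cluster,
                                core_limit_cluster, active_statements=None,
                                segment_resource_quota=None):
    """Return a CREATE RESOURCE QUEUE statement as a string."""
    for label, pct in (("memory_limit_cluster", memory_limit_cluster),
                       ("core_limit_cluster", core_limit_cluster)):
        # Assumption: a percentage is a whole number between 1 and 100.
        if not isinstance(pct, int) or not 1 <= pct <= 100:
            raise ValueError(f"{label} must be an integer between 1 and 100")
    attrs = [f"parent='{parent}'"]
    if active_statements is not None:
        attrs.append(f"active_statements={int(active_statements)}")
    attrs.append(f"memory_limit_cluster={memory_limit_cluster}%")
    attrs.append(f"core_limit_cluster={core_limit_cluster}%")
    if segment_resource_quota is not None:
        if segment_resource_quota not in ALLOWED_MEMORY_UNITS:
            raise ValueError(f"unsupported memory unit: {segment_resource_quota}")
        attrs.append(f"segment_resource_quota='mem:{segment_resource_quota}'")
    return f"create resource queue {name} with ({', '.join(attrs)});"


# Reproduces the example statement shown on the slide.
print(build_create_resource_queue("test_queue_1", "pg_root", 50, 50))
```

Validating attribute values before the statement is sent keeps obviously malformed DDL away from the database; the actual limits enforced by HAWQ may differ from the checks assumed here.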

Page 12

[Architecture diagram. Clients connect to the HAWQ master, which contains the Parser/Analyzer, Optimizer, Dispatcher, Catalog Service, HDFS Catalog Cache, Resource Manager, Fault Tolerance Service, and a Resource Broker that uses libYARN. YARN, the NameNode, and an External System are drawn outside the master. Each of the three worker nodes runs a YARN Node Manager, an HDFS DataNode, and a HAWQ Segment hosting several Virtual Segments.]

Page 13: Interaction between the HAWQ RM and YARN

[Diagram: the HAWQ Resource Manager, acting as a YARN Application Master, uses LibYarn to talk to the YARN Resource Manager and the YARN Node Managers. The diagram also carries the labels "resource track and schedule" and "active".]

Operations exchanged with YARN:
‒ register/unregister
‒ allocate/release resource
‒ get cluster/container reports
‒ get queue information

Page 14: Elastic Query Execution

● Query execution is dynamic and flexible
○ Allows scale-up/down
○ Allows scale-in/out
○ Smart and efficient use of resources
○ Better suited to shared or cloud environments
● How it works: "block-level storage" and "virtual segments"
○ Block-level storage support
■ AO and Parquet
■ Scanners read granular blocks (rather than whole files)
■ More control over task granularity
○ Plan/task scheduling
■ Choose nodes that are close to the data
■ Dispatch the query to nodes with available resources
■ Start virtual segments on demand

[Diagram: three virtual segments, each reading its own set of blocks.]
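The scheduling idea on this slide (prefer nodes that hold the data, balance work across virtual segments) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name, the input shapes, and the greedy "least loaded local virtual segment" rule are invented for the example and are not taken from HAWQ's scheduler.

```python
def assign_blocks(blocks, vsegs_per_host):
    """Greedy, locality-aware assignment of blocks to virtual segments.

    blocks: list of (block_id, [hosts holding a replica of the block])
    vsegs_per_host: dict mapping host -> number of virtual segments started there
    Returns a dict mapping (host, vseg_index) -> list of block ids.
    """
    load = {(host, i): [] for host, n in vsegs_per_host.items() for i in range(n)}
    if not load:
        raise ValueError("no virtual segments available")
    for block_id, replica_hosts in blocks:
        # Prefer virtual segments on a host that stores the block.
        local = [key for key in load if key[0] in replica_hosts]
        candidates = local or list(load)  # otherwise fall back to a remote read
        # Among the candidates, pick the least loaded one (ties broken by name).
        target = min(candidates, key=lambda key: (len(load[key]), key))
        load[target].append(block_id)
    return load


blocks = [("b1", ["node1", "node2"]), ("b2", ["node1"]),
          ("b3", ["node3"]), ("b4", ["node2"])]
print(assign_blocks(blocks, {"node1": 1, "node2": 1}))
```

In the sample input, block b3 has no replica on a host with a virtual segment, so it is read remotely by whichever virtual segment currently has the least work.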

Page 15: Virtual Segments

[Diagram with three scenarios, each showing a HAWQ Master dispatching to virtual segments on the HAWQ nodes of a HAWQ cluster:]

● Simple query: low number of v-segs
● Complex query: high number of v-segs
● Query on a HASH-distributed table, or RQ enforced: pre-defined number of v-segs / buckets

Page 16: Query Execution Flow Chart

[Flow chart. Surviving labels: Resource Manager, YARN, containers, "In YARN mode".]

Page 17: Query Plan

Query plan
• Relational operators: scans, joins, etc.
• Parallel "motion" operators

Parallel motion operators:
• Broadcast: every segment sends its input tuples to all other segments
• Redistribution: every segment rehashes tuples on a column and redistributes them to the appropriate segments
• Gather: every segment sends its input tuples to a single segment (i.e. the master)

SELECT l_orderkey, count(l_quantity) FROM lineitem, orders
WHERE l_orderkey=o_orderkey AND l_tax>0.01
GROUP BY l_orderkey;
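To make the three motion operators concrete, here is a small Python model in which each segment is simply a list of row dictionaries. The function names and the use of Python's built-in hash on integer keys are assumptions for the sketch; a real engine moves tuples over the interconnect rather than between in-memory lists.

```python
def broadcast(segments):
    """Every segment ends up holding the tuples of all segments."""
    everything = [row for seg in segments for row in seg]
    return [list(everything) for _ in segments]


def redistribute(segments, key):
    """Rehash every tuple on `key` and send it to the segment that owns the hash."""
    out = [[] for _ in segments]
    for seg in segments:
        for row in seg:
            out[hash(row[key]) % len(segments)].append(row)
    return out


def gather(segments):
    """Every segment sends its tuples to a single receiver (e.g. the master)."""
    return [row for seg in segments for row in seg]


# Two segments holding lineitem-like rows; only the join key is modelled.
segments = [[{"l_orderkey": 1}, {"l_orderkey": 2}],
            [{"l_orderkey": 3}, {"l_orderkey": 1}]]
redistributed = redistribute(segments, "l_orderkey")
print(redistributed)
print(len(gather(redistributed)))
```

After redistribution on l_orderkey, all rows with the same key sit on the same segment, which is what allows a join or GROUP BY on that column to run locally on each segment before a final gather.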

Page 18: Query Execution Example

[Cluster diagram used throughout the example. HAWQ Master: Postmaster, Metadata, Transaction Mgr., Query Parser, Query Optimizer, Resource Mgr., NN Cache; the NameNode and YARN RM are drawn next to it. Server 1, Server 2, ... Server N each host a HAWQ Segment (Postmaster), an HDFS Datanode, and a local directory.]

Page 19: Query Execution Example: Plan Generation

[Same cluster diagram as on page 18, now with Query Dispatch shown on the master.]

Page 20: Query Execution Example: Resource Request

[Same cluster diagram as on page 18, with five virtual segments (VS) shown.]

Request: "I need 5 containers, each with 1 CPU core and 1 GB RAM."
Reply: Server 1: 2 containers; Server 2: 1 container; Server N: 2 containers.

VS = Virtual Segment (container for Query Executors)
Number of QEs in a v-seg = number of slices in a query
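The request/reply exchange on this slide can be imitated with a toy allocator. The sketch below is an assumption-laden illustration: the function name, the per-host free-container counts, and the "host with the most free containers first" rule are invented for the example and do not describe how YARN or the HAWQ resource manager actually decide.

```python
def allocate_containers(requested, free_per_host):
    """Grant `requested` identical containers, one at a time, always taking
    the host that currently has the most free containers."""
    free = dict(free_per_host)
    granted = {host: 0 for host in free}
    for _ in range(requested):
        # max() returns the first host with the highest free count.
        host = max(free, key=free.get) if free else None
        if host is None or free[host] == 0:
            raise RuntimeError("not enough free containers")
        free[host] -= 1
        granted[host] += 1
    return granted


# Hypothetical free capacity, chosen so the result matches the slide.
free = {"Server 1": 2, "Server 2": 1, "Server N": 2}
print(allocate_containers(5, free))
```

With the assumed capacities, the five containers land as two on Server 1, one on Server 2, and two on Server N, the same split shown in the reply on the slide.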

Page 21: Query Execution Example: Preparing for Execution

[Same cluster diagram as on page 18, with Query Dispatch on the master and five virtual segments (VS) across Server 1, Server 2, and Server N.]

Page 22: Query Execution Example: Execution

[Same cluster diagram as on page 18, with Query Dispatch on the master and five virtual segments (VS) across Server 1, Server 2, and Server N.]

Page 23: Query Execution Example: Returning Results

[Same cluster diagram as on page 18, with Query Dispatch on the master and five virtual segments (VS) across Server 1, Server 2, and Server N.]

Page 24: Query Execution Example: Cleanup

[Same cluster diagram as on page 18, with Query Dispatch on the master and five virtual segments (VS) across Server 1, Server 2, and Server N.]

Request: "Free query resources": Server 1: 2 containers; Server 2: 1 container; Server N: 2 containers.
Reply: OK

Page 25: Query Execution Example: Cleanup (continued)

[Same cluster diagram as on page 18; the virtual segments are no longer shown.]

Page 26: New Features in HDB/HAWQ 2.2.0.0

● HAWQ Register
● HAWQ Ranger integration
● PXF ORC Profile
● RHEL-7 support

Page 27: HAWQ Extract / HAWQ Register

● HAWQ Extract
‒ Extracts a table's metadata and HDFS file locations into a YAML configuration file
‒ The YAML configuration can be used by HAWQInputFormat
‒ Usage
  ‒ hawq extract [-h hostname] [-p port] [-U username] [-d database] [-o output_file] [-W] <tablename>
● HAWQ Register
‒ Registers existing files on HDFS directly into a HAWQ internal table
‒ Scenarios
  ‒ Registering data generated by other systems
  ‒ HAWQ cluster migration
‒ Usage
  ‒ hawq register [-h <hostname>] [-p <port>] [-U <username>] -d <databasename> -f <hdfspath> <tablename>
  ‒ hawq register [-h <hostname>] [-p <port>] [-U <username>] -d <databasename> -c <configFile> <tablename>
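For scripting, the two hawq register forms shown on this slide can be assembled as an argument list. The Python helper below is a sketch: its name and parameters are invented for the example, it uses only the flags that appear in the usage lines on the slide, and it builds the argument list without running anything. The database, path, and table names in the sample call are placeholders.

```python
def hawq_register_args(database, table, hdfs_path=None, config_file=None,
                       host=None, port=None, user=None):
    """Build the argument list for `hawq register` (nothing is executed)."""
    # The slide shows two forms: one with -f <hdfspath>, one with -c <configFile>.
    if (hdfs_path is None) == (config_file is None):
        raise ValueError("pass exactly one of hdfs_path or config_file")
    args = ["hawq", "register"]
    if host is not None:
        args += ["-h", host]
    if port is not None:
        args += ["-p", str(port)]
    if user is not None:
        args += ["-U", user]
    args += ["-d", database]
    if hdfs_path is not None:
        args += ["-f", hdfs_path]
    else:
        args += ["-c", config_file]
    args.append(table)
    return args


# Placeholder names, for illustration only.
print(hawq_register_args("postgres", "my_table", hdfs_path="/data/my_table"))
```

The returned list could be handed to subprocess.run on a host where the hawq utility is installed; keeping construction separate from execution makes the command easy to log and test first.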

Page 28: HAWQ Integration with Ranger

● Ranger: a global user authorization tool in the Hadoop ecosystem
‒ Supports multiple systems such as HDFS, Hive, HBase, Knox, etc.
‒ Provides a central UI for users to define policies for different systems
‒ Provides a base Java plugin, so other products can define their own plugins to be controlled by Ranger
● HAWQ's current ACL
‒ Implemented through the GRANT/REVOKE SQL commands
‒ Controlled by catalog tables, which are stored on the HAWQ master
● HAWQ needs to stay aligned with the Hadoop ecosystem, so it needs to integrate with Ranger ACL
‒ Provide a GUC that specifies whether Ranger is used for ACL checks
‒ Once Ranger is configured, move all ACL checks to the Ranger side
‒ Define all policies in Ranger
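The switch described here, one setting that decides whether ACL checks go to the native catalog or to Ranger, can be modelled in a few lines of Python. Everything in this sketch is a stand-in: the class names, the tuple-based policy representation, and the `ranger_enabled` flag (which plays the role of the GUC, whose real name is not given on the slide) are assumptions for illustration.

```python
class CatalogAcl:
    """Stand-in for GRANT/REVOKE state kept in catalog tables on the master."""

    def __init__(self):
        self._grants = set()

    def grant(self, user, obj, privilege):
        self._grants.add((user, obj, privilege))

    def revoke(self, user, obj, privilege):
        self._grants.discard((user, obj, privilege))

    def is_allowed(self, user, obj, privilege):
        return (user, obj, privilege) in self._grants


class RangerAcl:
    """Stand-in for policies defined centrally in Ranger."""

    def __init__(self, policies):
        self._policies = set(policies)

    def is_allowed(self, user, obj, privilege):
        return (user, obj, privilege) in self._policies


def check_acl(user, obj, privilege, ranger_enabled, catalog_acl, ranger_acl):
    """Route the check to exactly one backend, depending on the setting."""
    backend = ranger_acl if ranger_enabled else catalog_acl
    return backend.is_allowed(user, obj, privilege)


catalog = CatalogAcl()
catalog.grant("alice", "sales", "select")
ranger = RangerAcl([("bob", "sales", "select")])
print(check_acl("alice", "sales", "select", False, catalog, ranger))
print(check_acl("alice", "sales", "select", True, catalog, ranger))
```

The second call shows the consequence of moving all checks to the Ranger side: a privilege that exists only in the native catalog no longer applies once Ranger is the active backend, so policies have to be defined in Ranger.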

Page 29: Apache Ranger: A Centralized Authorization Management Tool

Page 30: Apache Ranger Architecture

[Architecture diagram.
Ranger services: Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server, Ranger UgSync (fed from LDAP/AD/OS), and Ranger TagSync (fed from Atlas).
Hadoop components, each with a Ranger Plugin: HDFS, YARN, HiveServer 2, HBase, Knox, Storm, Solr, NiFi, Atlas, and HAWQ.
Also shown: Enterprise Users, Legacy Tools and Data Governance, and HDFS and Solr next to the audit server.]

Page 31: HAWQ Integration with Ranger

[Deployment diagram.
HAWQ Master host: HAWQ and the Ranger Plugin Service (RPS), which includes the Ranger Base Plugin (JAR).
Ranger Admin host: Ranger Policy Manager with a policy store and an audit store, REST APIs, and user sync.
Actors: USERS and ADMIN.]

Page 32: Typical User Management Scenario

[Diagram. Participants: LDAP Server, HAWQ with the HAWQ Ranger Plugin, and the active Ranger Admin Server (Ranger REST Service, Ranger Policy DB).]

● Create user in LDAP Server
● User synced to HAWQ
● User synced to Ranger
● Policy sync
● Send ACL check request

Page 33: HAWQ Ranger Sequence Diagram

[Sequence diagram. Participants: LDAP Server, HAWQ, Ranger Admin Server, Ranger REST Service, Ranger Policy DB.]

1. create user
2. sync user information
3. sync user info
4. define policy
5. store policy
6. refresh policy
7. send query
8. send ACL check
9. return result
10. return

Page 34: Future Work

Page 35: TDE (Transparent Data Encryption) Support

● TDE: HDFS implements transparent, end-to-end encryption
‒ Data is transparently encrypted and decrypted without requiring changes to user application code
‒ Data can only be encrypted and decrypted by the client
‒ HDFS never stores or has access to unencrypted data or unencrypted data encryption keys
● HAWQ enhancement
‒ Modify libhdfs3 to add support for TDE

Page 36: Parquet Format Upgrade

● Parquet 2.0 enhancements
‒ More converted types: Enum, Decimal, Date, Timestamp
‒ More statistics in DataPageHeader, including max/min/null count and distinct count
‒ Dictionary page
‒ Sorting column information in row group metadata
‒ …
● HAWQ upgrade to Parquet 2.0 support
‒ Brings performance improvements by leveraging the statistics
‒ Becomes more compatible with other systems that already support Parquet 2.0
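One common way statistics translate into performance is by skipping chunks of data whose min/max range cannot match a filter. The Python sketch below illustrates that idea only; the `ChunkStats` class and the function name are invented, the statistics are modelled per chunk rather than per data page for brevity, and nothing here reads a real Parquet file or reflects HAWQ's implementation.

```python
from dataclasses import dataclass


@dataclass
class ChunkStats:
    """Hypothetical min/max/null-count statistics for one chunk of a column."""
    name: str
    min_value: int
    max_value: int
    null_count: int


def chunks_to_scan(stats, low, high):
    """Keep only the chunks whose [min, max] range overlaps [low, high]."""
    return [s.name for s in stats
            if s.max_value >= low and s.min_value <= high]


stats = [ChunkStats("chunk0", 1, 100, 0),
         ChunkStats("chunk1", 101, 200, 3),
         ChunkStats("chunk2", 201, 300, 0)]
# Filter: value BETWEEN 150 AND 220
print(chunks_to_scan(stats, 150, 220))
```

For the sample filter, chunk0 is skipped without being read because its maximum (100) is below the lower bound; the benefit grows when the data is sorted or clustered on the filtered column.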

Page 37: Welcome to the Apache HAWQ Community

● Ways to contribute
‒ Enrich the documentation/wiki
‒ Report and fix bugs
‒ Develop new features
● Contact us
‒ Website: http://hawq.incubator.apache.org/
‒ Wiki: https://cwiki.apache.org/confluence/display/HAWQ
‒ Repo: https://github.com/apache/incubator-hawq.git
‒ JIRA: https://issues.apache.org/jira/browse/HAWQ
‒ Mailing lists: dev/[email protected]

Page 38: Thanks

Page 39: Manage HDB Alongside Hadoop Services

Hadoop-Native Administration via Ambari

● Installation & Configuration
○ Use standard Ambari interface
○ Install HDB with just a few mouse clicks
○ Wizard-based experience
○ Stack Advisor enhancements
○ Proactive user warnings
○ Service Checks
● Kerberos & High Availability Support
● HAWQ Master > Standby Failover
● Cluster Expansion Support
● Visual Widgets on System Resources
● Service & Component Alerts

Page 40: PXF Framework

[Diagram. HAWQ segments read native tables through libhdfs3 (written in C) and external tables through PXF over HTTP, port 51200. PXF is a webapp running in Apache Tomcat that exposes a REST API; behind it, the diagram shows connections labeled Java API and Java/Thrift.]

Page 41: Architecture - Read Data Flow

[Data flow diagram. Participants: the HAWQ master node, the NameNode (NN) with PXF, and data nodes (DN), each with PXF and a HAWQ segment. PXF components: Fragmenter, Accessor, Resolver. The steps are numbered 1 to 8 on the slide; the labels that survive are listed in their likely order.]

‒ Query: select * from ext_table0
‒ getFragments() API call to pxf://<location>:<port>/<path>
‒ Fragments (JSON) returned
‒ Assign fragments to segments
‒ Query dispatched to segments 1, 2, 3, … (Interconnect)
‒ Read() REST call
‒ Records returned
‒ Records (stream)
‒ Query result
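The division of labor named on this slide (a Fragmenter that splits the source, an Accessor that reads records from a fragment, and a Resolver that turns a record into fields) can be mimicked in plain Python. In the sketch below the class and function names, the line-based fragments, and the round-robin assignment of fragments to segments are all assumptions; the classes borrow only the role names and are not the PXF Java plugin interfaces.

```python
class LineFragmenter:
    """Splits a source into fragments (here: fixed-size groups of lines)."""

    def __init__(self, lines, fragment_size):
        self.lines = lines
        self.fragment_size = fragment_size

    def get_fragments(self):
        size = self.fragment_size
        return [self.lines[i:i + size] for i in range(0, len(self.lines), size)]


class LineAccessor:
    """Reads the raw records of one fragment."""

    def read(self, fragment):
        yield from fragment


class CsvResolver:
    """Turns one raw record into a list of fields."""

    def resolve(self, record):
        return record.split(",")


def run_query(lines, fragment_size, n_segments):
    # The master asks for the fragment list, then hands fragments to segments.
    fragments = LineFragmenter(lines, fragment_size).get_fragments()
    assignment = {seg: fragments[seg::n_segments] for seg in range(n_segments)}
    accessor, resolver = LineAccessor(), CsvResolver()
    rows = []
    # Each segment reads and resolves its own fragments (sequentially here).
    for seg in range(n_segments):
        for fragment in assignment[seg]:
            for record in accessor.read(fragment):
                rows.append(resolver.resolve(record))
    return rows


print(run_query(["1,a", "2,b", "3,c", "4,d", "5,e"], fragment_size=2, n_segments=2))
```

Because each segment pulls only its assigned fragments, the read work is spread over the cluster; the rows therefore come back grouped by segment rather than in source order.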

Page 42: Simplified interoperability on external data

New HCatalog Integration

SELECT * FROM hcatalog.default.weblogs
WHERE ts between '2015-09-01' and '2015-09-30';

[Diagram. The Hive table weblogs (id double, ts timestamp, ...) is reached through PXF (HAWQ Extension Framework) and HCatalog. Catalog entries are labeled "disk heap: pg_class ..." and "in-memory: pg_exttable, pg_class ...".]

Now, HAWQ can read the schema automatically from HCatalog.