born to be parallel, and beyond part ii · teradata fastexport utility built on the original...
Post on 13-Aug-2020
#TDPARTNERS16 GEORGIA WORLD CONGRESS CENTER
Born To Be Parallel, and Beyond
Part II
Carrie Ballinger, Performance Engineer, Teradata Labs
Rich Charucki, Teradata Fellow, Teradata Labs
At Teradata, we believe…
Analytics and data unleash the potential of great companies
Agenda
• Part I
  • The Considerable Contribution of the BYNET®
  • A Very Fluid File System
• Part II
  • Multiple Dimensions of Parallelism
  • Gracefully Managing the Flow of Work
  • The Roots of Prioritization
  • Conclusion – Emerging Opportunities
Multiple Dimensions of Parallelism
Parallel Execution Across All AMPs: Each AMP is One Unit of Parallelism
1) Query Execution Parallelism
• Teradata was designed to maximize throughput of each individual request and remove single points of control
• Architected so queries can benefit from multiple dimensions of parallelism
[Figure: AMP 1, AMP 2 and AMP 3 each work only on their own data, and each performs the same activities: reading, writing, sorting, aggregating, loading, building indexes, row locking, transaction journalling, statistics, and backup & recovery.]
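As a rough illustration of "each AMP is one unit of parallelism", the sketch below spreads rows over a few workers by hashing a key, lets each worker aggregate only its own slice, and then combines the partial results. It is a toy model, not Teradata code: `NUM_AMPS`, `amp_for`, the CRC32 hash and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_AMPS = 3  # hypothetical; chosen to match the three AMPs in the figure


def amp_for(key: str, num_amps: int = NUM_AMPS) -> int:
    """Map a row key to an AMP with a stable hash (a stand-in, not the real row hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_amps


def distribute(rows, num_amps: int = NUM_AMPS):
    """Give every AMP its own slice of the rows."""
    slices = defaultdict(list)
    for key, amount in rows:
        slices[amp_for(key, num_amps)].append((key, amount))
    return [slices[amp] for amp in range(num_amps)]


def local_sum(amp_rows):
    """Work done independently on one AMP: aggregate only its own data."""
    return sum(amount for _, amount in amp_rows)


def parallel_total(rows, num_amps: int = NUM_AMPS) -> int:
    """Run the same aggregation on every AMP at once, then combine the partials."""
    slices = distribute(rows, num_amps)
    with ThreadPoolExecutor(max_workers=num_amps) as pool:
        partials = list(pool.map(local_sum, slices))
    return sum(partials)


rows = [(f"order-{i}", i) for i in range(1, 101)]  # made-up sample data
total = parallel_total(rows)
```

No worker needs to coordinate with another until the final combine, which is the property the slide attributes to the AMPs.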
Parallelism Across Multiple Query Steps
2) Multi-Step Parallelism
• The optimizer can choose to execute multiple steps within the same query at the same time
• A technique to speed up query completion
[Figure: steps 1.1 and 1.2 run at the same time, followed by steps 2.1 and 2.2. The steps shown are Join Product & Inventory, Scan Stores, Join Items & Orders, and Join Spool, each followed by a redistribute.]
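The idea of running independent steps of one query at the same time can be sketched with a small thread pool. The step names loosely follow the figure, but the tables, their contents and the final combining step are invented for the example and say nothing about how the optimizer actually schedules steps.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up sample tables.
PRODUCT = {1: "widget", 2: "gadget"}
INVENTORY = [(1, "store-A", 5), (2, "store-B", 7)]  # (product_id, store, qty)
STORES = {"store-A": "Atlanta", "store-B": "Boston"}


def step_1_1():
    """Join Product and Inventory; does not depend on step 1.2."""
    return [(PRODUCT[pid], store, qty) for pid, store, qty in INVENTORY]


def step_1_2():
    """Scan Stores; does not depend on step 1.1."""
    return dict(STORES)


def step_2(joined, stores):
    """Needs both earlier results, so it starts only after they finish."""
    return [(name, stores[store], qty) for name, store, qty in joined]


with ThreadPoolExecutor(max_workers=2) as pool:
    first = pool.submit(step_1_1)   # both steps are in flight together
    second = pool.submit(step_1_2)
    result = step_2(first.result(), second.result())
```

Only the dependency between step 2 and its two inputs forces any waiting; the two first-level steps overlap freely.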
Parallelism of Activities Within a Query Step
3) Within-a-Step Parallelism
• Pipelining of different operations within a single step
• Overlapping of activities inside a step provides an additional dimension of parallelism
[Figure: timeline of one step running on AMP 1, AMP 2, AMP 3, AMP 4, ... Between "Time 1 - Start Step" and "Step Done" (Times 1 to 4), four activities overlap: select & project the Product table, select & project the Inventory table, join the Product and Inventory tables, and send joined rows to other AMPs (redistribute).]
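Pipelining inside a single step can be approximated with Python generators, where each stage hands a row to the next as soon as it is produced instead of materializing a full intermediate result. This is only a sketch: the function names, the sample rows, and the use of `product_id % num_amps` as a redistribution rule are assumptions for illustration.

```python
def select_project(table, columns, predicate):
    """Stream qualifying rows, keeping only the requested columns."""
    for row in table:
        if predicate(row):
            yield {c: row[c] for c in columns}


def hash_join(left_rows, right_rows, key):
    """Build a lookup on the right input, then stream the left input through it."""
    index = {}
    for r in right_rows:  # build phase consumes the right input first
        index.setdefault(r[key], []).append(r)
    for left in left_rows:  # probe phase emits joined rows one at a time
        for r in index.get(left[key], []):
            yield {**left, **r}


def redistribute(rows, key, num_amps):
    """Tag each joined row with a target AMP (assumes an integer key)."""
    for row in rows:
        yield row[key] % num_amps, row


# Made-up sample data.
product = [
    {"product_id": 1, "name": "widget", "discontinued": False},
    {"product_id": 2, "name": "gadget", "discontinued": True},
    {"product_id": 3, "name": "gizmo", "discontinued": False},
]
inventory = [
    {"product_id": 1, "qty": 5},
    {"product_id": 3, "qty": 0},
    {"product_id": 3, "qty": 9},
]

products = select_project(product, ["product_id", "name"], lambda r: not r["discontinued"])
stock = select_project(inventory, ["product_id", "qty"], lambda r: r["qty"] > 0)
joined = hash_join(products, stock, "product_id")
routed = list(redistribute(joined, "product_id", num_amps=4))
```

Nothing runs until `list(...)` pulls on the last stage, at which point selection, join and redistribution interleave row by row.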
Optimizer was Designed to Maximize Parallel Opportunities
• Optimizer builds query plans to maximize the throughput of a single request
– “Bushy plans” maximize parallel step opportunities
– Query will usually complete sooner
[Figure: two plans joining Table1 through Table6. "Plan with serial joins" adds one table per join in a single chain; "Plan with parallel joins" joins pairs of tables side by side and then joins the intermediate results.]
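One way to see why a bushy plan can finish sooner is to compare plan depth, meaning the number of join "rounds" that must happen one after another. The sketch below builds both plan shapes for six tables as nested tuples and measures them. Treating depth as a proxy for elapsed time assumes every join costs the same and that joins at the same level really do run in parallel; the optimizer's actual cost model is not described in the slides.

```python
def left_deep(tables):
    """Serial plan: each join adds one more table to a single chain."""
    plan = tables[0]
    for table in tables[1:]:
        plan = (plan, table)
    return plan


def bushy(tables):
    """Parallel-friendly plan: join neighbours pairwise, level by level."""
    level = list(tables)
    while len(level) > 1:
        paired = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd one out waits for the next level
            paired.append(level[-1])
        level = paired
    return level[0]


def depth(plan):
    """Number of sequential join rounds on the longest path."""
    if isinstance(plan, str):
        return 0
    return 1 + max(depth(plan[0]), depth(plan[1]))


def join_count(plan):
    """Total number of joins in the plan."""
    if isinstance(plan, str):
        return 0
    return 1 + join_count(plan[0]) + join_count(plan[1])


tables = [f"Table{i}" for i in range(1, 7)]
serial_plan = left_deep(tables)
bushy_plan = bushy(tables)
```

Both plans perform five joins, but the serial chain needs five sequential rounds while the bushy shape needs three.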
Teradata FastExport Utility Built on the Original Parallel Design
• A high-throughput utility designed to return large volumes of data from the database
• Final spool file is evenly distributed across all AMPs
• Returned to the client in parallel using multiple sessions
• Each AMP is returning the answer set in parallel across multiple sessions
• Each AMP is working on the query in parallel
[Figure: AMP 1, AMP 2 and AMP 3 each hold a spool file and return rows to the CLIENT.]
Teradata QueryGrid Performance Enhanced by Parallel Return of Remote Data
• Teradata parallelism provides more connection points when transferring remote data into the database.
• Multiple AMPs are involved in receiving data from or sending data to remote platforms.
• Dynamic statistics are collected on Hadoop data across each AMP in parallel.
[Figure: Local Teradata (PE and four AMPs) exchanging data with Remote Hadoop (Hive, HCatalog).]
The Parallel Database Extensions (PDE) Layer Enables Virtualization of AMPs and Parsing Engines
The PDE layer virtualizes the hardware and the operating system for the database engine
• Enables the definition of multiple virtual AMPs and Parsing Engines per node
• Shields the database from having to know physical locations or hardware detail when messages are sent
• Makes it possible to maximize processing power as hardware evolves
The PDE layer enables high availability
• Can migrate AMPs to different nodes without database knowledge or involvement
[Figure: software layers: Teradata; PDE – Parallel Database Extensions; Operating System (MP RAS / UNIX, Linux).]
The AMP Before Virtualization
In the early Teradata days the AMP was a hardware component
The addition of the PDE layer allowed multiple “virtual” AMPs to be defined on a single node
The AMP Board
Gracefully Managing the Flow of Work
Work Flow Challenge for Parallel Databases
How much work is too much for an AMP?
Optimizer applies multiple levels of parallelism on each query
• Database engine is good at exploiting parallelism
• Just a few queries can saturate the system
Teradata was designed as a throughput engine
• Able to be productive with high demand, many users, maximum resource levels
How are system health and throughput protected during times of extremely high demand?
Complete Decentralization of Control Over the Flow of Work
Each AMP monitors and manages its own flow of work independently
Only pushes back when the AMP needs a slight pause to complete work already underway
Two forms of pushing back:
1. Queueing up arriving messages
2. Turning away new messages
[Figure: two approaches compared. Non-Scalable Polling Approach: a Central Controller asks AMP1, AMP2 and AMP3 "Can you do more work?". Scalable Decentralized Approach: AMP1, AMP2 and AMP3 each ask themselves "Can I do more work?".]
Work Messages: Vehicle for Bringing New Work Steps From the Parsing Engine to the AMPs
Work messages use AMP worker tasks to accomplish their work
AMP Worker Tasks (AWTs):
• Are stateless, can support a variety of work including
  • User-submitted work (load jobs, queries)
  • Internal software processes (such as space accounting)
• Are allocated at start-up
Work messages are categorized into “Work Types” based on importance of the work (Work00, Work01, Work02)
[Figure: the Parsing Engine sends an optimized query step to the AMPs; on each AMP the message gets an AWT from the pool of available AWTs to do the work within the message; when the step is done the AWT is released and a completion message goes back to the PE.]
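The acquire, work, release cycle described for AWTs resembles a fixed-size resource pool. The sketch below models that cycle on a single AMP. The class and function names, the pool size of 2 and the shape of the completion message are all hypothetical; the slides give no API or sizing details.

```python
class AwtPool:
    """A fixed pool of interchangeable worker tasks, sized once at start-up."""

    def __init__(self, size: int):
        self.size = size      # hypothetical size; fixed for the pool's lifetime
        self.in_use = 0

    def acquire(self) -> bool:
        """Hand out a worker task if one is free; never blocks."""
        if self.in_use < self.size:
            self.in_use += 1
            return True
        return False

    def release(self) -> None:
        """Return a worker task to the pool once its step is done."""
        if self.in_use == 0:
            raise RuntimeError("release() called with no worker task in use")
        self.in_use -= 1


def run_step(pool: AwtPool, step):
    """Run one work message; return a completion message, or None if no AWT is free."""
    if not pool.acquire():
        return None  # the caller decides whether to queue or retry
    try:
        outcome = step()
    finally:
        pool.release()  # the worker task is stateless, so it is simply handed back
    return {"status": "done", "result": outcome}


pool = AwtPool(size=2)
message = run_step(pool, lambda: 42)
```

Because the worker tasks carry no state between steps, any free one can serve any arriving message, whether it is a query step or internal work.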
When All AMP Worker Tasks are Busy, Arriving Messages are Queued
Each AMP has its own local work message queue in memory
Queued messages are sequenced by:
• Work type in descending sequence
• Priority within the work type
Some AMPs may de-queue a work message and begin processing a query step sooner than other AMPs
[Figure: one AMP's queue holding Work00, Work01 and Work02 messages.]
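The sequencing rule above (work type descending, then priority within the work type) maps naturally onto a heap with a composite key. In this sketch work types are plain integers (0 for Work00, 1 for Work01, 2 for Work02), a larger priority number is assumed to mean more urgent, and ties are broken by arrival order; those encodings are assumptions for the example, not documented behaviour.

```python
import heapq
import itertools


class WorkQueue:
    """Per-AMP in-memory queue ordered by work type, then priority, then arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker keeps equal messages FIFO

    def enqueue(self, work_type: int, priority: int, payload) -> None:
        # heapq is a min-heap, so negate the fields that should sort descending.
        entry = (-work_type, -priority, next(self._arrival), payload)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        neg_type, neg_priority, _, payload = heapq.heappop(self._heap)
        return -neg_type, -neg_priority, payload

    def __len__(self) -> int:
        return len(self._heap)


queue = WorkQueue()
queue.enqueue(0, 1, "new query step")
queue.enqueue(2, 1, "Work02 message")
queue.enqueue(1, 5, "Work01 message, lower priority")
queue.enqueue(1, 9, "Work01 message, higher priority")
order = [queue.dequeue()[2] for _ in range(len(queue))]
```

Each AMP would own one such queue, which is why two AMPs can de-queue the same step at different times.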
When Too Many Messages are Queued Up, Arriving Messages are Returned to the Sender
[Figure: an AMP's queue holds 3 Work01 messages and 20 Work00 messages. Queued messages for Work Type Work00 have reached their limit of 20, so newly-arriving Work00 messages will be returned to sender ("Try later...").]
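The second form of pushing back, returning messages once a work type's queue is full, can be modelled as a bounded queue per work type. The limit of 20 for Work00 comes from the slide; the limit used for Work01, the class name and the boolean return convention are assumptions made for this sketch.

```python
from collections import defaultdict, deque


class FlowControlledInbox:
    """Per-AMP inbox that turns messages away once a work type's queue is full."""

    def __init__(self, limits):
        self.limits = limits              # work type -> maximum queued messages
        self.queues = defaultdict(deque)

    def offer(self, work_type: str, message) -> bool:
        """Queue the message, or return False to tell the sender to try later."""
        queue = self.queues[work_type]
        if len(queue) >= self.limits[work_type]:
            return False  # returned to sender; nothing is dropped silently
        queue.append(message)
        return True


# Work00 limit is from the slide; the Work01 limit is a placeholder assumption.
inbox = FlowControlledInbox({"Work00": 20, "Work01": 20})
for i in range(3):
    inbox.offer("Work01", f"work01-{i}")
accepted = [inbox.offer("Work00", f"work00-{i}") for i in range(21)]
```

The decision uses only state local to the AMP, so no other component has to be consulted before a message is accepted or turned away.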
Each AMP Makes Its Own Decisions Independently
Each AMP monitors its own work flow, and pushes back temporarily when it has more work than it can easily process
[Figure: a work message arrives on all AMPs of a 12-AMP system (AMP 0 to AMP 11 on Node 0 to Node 3). AMPs 0, 1, 3, 4, 5, 6, 8, 10 and 11 have AWTs available. Two AMPs are queueing messages: AMP 2 and AMP 9 have exhausted their AWTs. One AMP is sending messages back: AMP 7 is in flow control and its messages are retried.]
Riding the Wave of Full Usage
Flow control mechanisms are embedded deep in the base of the database
• Able to support parallelism and minimize query execution time when just a few queries are active
• And protects overall system health under extreme usage conditions
Getting back to normal processing is simple, immediate, minimal overhead
• No communication layers need traversing
• No messaging or alerting to other components
Provides a highly-efficient, non-intrusive mechanism that performs well with 2 AMPs or 2000 AMPs
It worked great at the beginning, it still works great
Prioritization
Simple Priority Scheme Embedded in the Original Teradata Database
Internal database routines were architected to use different priorities
• In order to support maximum levels of user activity and still get critical internal work and background tasks completed
• Provided a way to boost query performance at critical processing points
• Background tasks may start at a low priority but self-promote their priority if they cannot get their work done
Priorities externalized for customer use
[Figure: four priority buckets: Rush Priority, High Priority, Medium Priority, Low Priority.]
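The self-promotion behaviour of background tasks could look something like the sketch below. The four bucket names come from the slide; everything else is assumed for illustration, including the trigger (a count of consecutive rounds without progress), the `patience` parameter, promotion by one bucket at a time, and Rush being reachable.

```python
PRIORITIES = ["Low", "Medium", "High", "Rush"]  # the four buckets, lowest first


class BackgroundTask:
    """A task that starts at Low and raises its own priority when starved."""

    def __init__(self, name: str, patience: int = 3):
        self.name = name
        self.level = 0            # index into PRIORITIES
        self.patience = patience  # hypothetical: starved rounds tolerated per level
        self.starved_rounds = 0

    @property
    def priority(self) -> str:
        return PRIORITIES[self.level]

    def record_round(self, made_progress: bool) -> None:
        """Call once per scheduling round with whether the task got to run."""
        if made_progress:
            self.starved_rounds = 0
            return
        self.starved_rounds += 1
        can_promote = self.level < len(PRIORITIES) - 1
        if self.starved_rounds >= self.patience and can_promote:
            self.level += 1
            self.starved_rounds = 0


task = BackgroundTask("space accounting")
for _ in range(3):
    task.record_round(made_progress=False)
after_first_starvation = task.priority
```

On a lightly loaded system the task never leaves the Low bucket, so it costs user work nothing; it climbs only when it is persistently unable to run.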
Customers Embrace Workload Management
Users drove changes and enhancements such as:
• A broader set of priority definitions
• Concurrency control mechanisms at multiple levels
• Rules that identify and reject poorly-written queries
• Ability to automatically demote or abort resource-heavy queries
Teradata Active System Management evolved for the Enterprise platforms
Teradata Integrated Workload Management for the non-enterprise platforms
Internal tasks and the database code continue to rely on the original four priority buckets
Conclusion: Emerging Opportunities
Building on a Solid Foundation
Key characteristics architected into the original Teradata Database are still delivering performance advantage:
Internal checks and balances that optimize the flow of work
The restorative action of numerous non-intrusive background tasks
Adaptable and flexible file system
Parallelism and parallel-aware Optimizer
Performance boosts of the BYNET
Building on a Solid Foundation
Virtualization has been emerging slowly over time:
• AMP: from a physical entity to a collection of software processes (relying on PDE)
• Parsing Engine: from a physical entity to a collection of software processes (relying on PDE)
• Interconnect: from the YNET/BYNET as proprietary hardware to BYNET as software that can run on any general interconnect
• File system: from structures tied to physical locations on disk to underlying storage managed by TVS with complete disassociation of data and its location
More Information
Content of this slideware is based on the white paper: Born to be Parallel, and Beyond
http://assets.teradata.com/resourceCenter/downloads/WhitePapers/EB3053_new.pdf
Additional sessions on Teradata Database futures:
Teradata Database 16.0 Overview Part I & Part II
Tom Fastner / Phil Benton
Wednesday 11:00 / 12:00, Room C101
At Teradata…
We empower companies to achieve high-impact business outcomes
through analytics at scale on an agile data foundation
Thank You
Questions/Comments
Email: Carrie.Ballinger@teradata.com, Richard.J.Charucki@teradata.com
Rate This Session #297 with the PARTNERS Mobile App
Remember To Share Your Virtual Passes