Amazon Redshift Insider Series


Page 1: Amazon Redshift Insider Series

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Amazon Redshift Insider Series

Page 2: Amazon Redshift Insider Series


Prasad Varakur, Product Manager, Redshift, AWS

Christian Romming, Founder and CEO, Etleap

May 2020

Accelerating performance with Amazon Redshift materialized views

Page 3: Amazon Redshift Insider Series


Speakers

Prasad Varakur, Product Manager, Amazon Redshift, AWS

Christian Romming, Founder and CEO, Etleap

Page 4: Amazon Redshift Insider Series


Agenda

• Introduction

• Use cases for Amazon Redshift materialized views

• Materialized views: details

• Customer success story, by Etleap

• Demo

Page 5: Amazon Redshift Insider Series


Amazon Redshift benefits

Tens of thousands of customers use Amazon Redshift and process over 2 EB of data per day

3x faster than other cloud data warehouses

Up to 75% less than other cloud data warehouses

Predictable costs

AWS Lake Formation catalog & security

Exabyte querying & AWS integrated (e.g., AWS DMS, Amazon CloudWatch)

AWS-grade security (e.g., VPC, encryption with AWS KMS, AWS CloudTrail)

Certifications such as SOC, PCI DSS, ISO, FedRAMP, HIPAA

Easy to provision & manage, automated backups, AWS support, and 99.9% SLAs

Virtually unlimited elastic linear scaling

Page 6: Amazon Redshift Insider Series


Features Delivered to Meet Customer Needs

200+ new features in the past 18 months

• Robust result set caching

• Large number of tables supported (~20,000)

• COPY command support for ORC and Parquet

• IAM role chaining

• Elastic resize

• Groups

• Amazon Redshift Spectrum: date formats, scalar JSON and ION file formats support, region expansion, predicate filtering

• Auto analyze

• Health and performance monitoring with Amazon CloudWatch

• Automatic table distribution style

• CloudWatch support for WLM queues

• Performance enhancements: hash join, vacuum, window functions, resize ops, aggregations, console, union all, efficient compile code cache

• Cost controls

• Auto WLM

• ~25 query monitoring rules (QMR) support

• AQUA (Advanced Query Accelerator)

• Concurrency scaling

• DC1 migration to DC2

• Resiliency of ROLLBACK processing

• Manage multi-part query in AWS console

• Auto analyze for incremental changes on table

• Spectrum Request Accelerator

• Apply new distribution key

• Amazon Redshift Spectrum: row group filtering in Parquet and ORC, nested data support, enhanced VPC routing, multiple partitions

• Faster classic resize with optimized data transfer protocol

• Performance: Bloom filters in joins, complex queries that create internal table, communication layer

• Amazon Redshift Spectrum: concurrency scaling

• Integration with AWS Lake Formation

• Auto-vacuum sort, auto-analyze, and auto-table sort

• Auto WLM with query priorities

• Snapshot scheduler

• Performance: join pushdowns to subquery, mixed workloads temporary tables, rank functions, null handling in join, single row insert

• Advisor recommendations for distribution keys

• AZ64 compression encoding

• Console redesign

• Stored procedures

• Spatial processing

• Column level access control with AWS Lake Formation

• RA3

• Performance of inter-region snapshot transfers

• Federated Query

• Materialized views

• Pause and resume

Page 7: Amazon Redshift Insider Series


Amazon Redshift Materialized Views

What are Materialized Views?

o New kind of database object, to consider in database modeling

o Combines benefits of tables and views

Designed to get orders of magnitude performance improvement

o Mileage varies depending on multiple factors

Stores pre-computed results of a query AND efficiently maintains it

o Converts a complex SPJA (select-project-join-aggregate) query into a simple select query

Typically, useful for pre-canned workloads

o Predictable and repeated query patterns, for example ETL, BI, dashboards

Page 8: Amazon Redshift Insider Series


Benefits of Materialized Views

1. Speed up queries by orders of magnitude

• For predictable workloads

• Save work with precomputed, materialized views

2. Simplify and accelerate maintenance of precomputed results

3. Easier and faster migration to Amazon Redshift

Page 9: Amazon Redshift Insider Series


Use case: ELT and BI

Amazon Redshift

Simplifies maintenance and boosts performance of pre-aggregated tables and reporting tables

Page 10: Amazon Redshift Insider Series


Speedup: Original query

sales

item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00

store_info

store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF

Query: "What were the total sales in SF?"

Join-Aggregate result

loc  total_sales
SF   12.00
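The original join-aggregate query is not spelled out on the slide. A minimal sketch of it follows, using the sample rows shown above. The column types are assumptions, since the slide gives only column names and values.

```sql
-- Column types are assumptions; the slide shows only names and sample values.
CREATE TABLE sales (
    item_key  VARCHAR(8),
    store_key VARCHAR(8),
    cust_key  VARCHAR(8),
    price     DECIMAL(10,2)
);

CREATE TABLE store_info (
    store_key VARCHAR(8),
    owner     VARCHAR(32),
    loc       VARCHAR(32)
);

-- Sample rows copied from the slide.
INSERT INTO sales VALUES
    ('i1', 's1', 'c1', 12.00),
    ('i2', 's2', 'c1', 3.00),
    ('i3', 's2', 'c2', 7.00);

INSERT INTO store_info VALUES
    ('s1', 'Joe',  'SF'),
    ('s2', 'Ann',  'Chicago'),
    ('s3', 'Lisa', 'SF');

-- Join-aggregate: total sales in SF, computed from the base tables on every run.
SELECT si.loc AS loc, SUM(s.price) AS total_sales
FROM sales s
JOIN store_info si ON s.store_key = si.store_key
WHERE si.loc = 'SF'
GROUP BY si.loc;
```

With the sample rows, only sale i1 belongs to an SF store, which matches the slide's result of 12.00.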

Page 11: Amazon Redshift Insider Series


Speedup: Materialized Views Precomputed Results

loc_sales (materialized view)

loc      total_sales
SF       12.00
Chicago  10.00

sales

item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00

store_info

store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF

Page 12: Amazon Redshift Insider Series


Defining a Materialized View

-- Precompute total sales per location from sales joined to store_info.
CREATE MATERIALIZED VIEW loc_sales DISTKEY (loc) AS (
SELECT si.loc AS loc, SUM(s.price) AS total_sales
FROM sales s, store_info si
WHERE s.store_key = si.store_key
GROUP BY si.loc);

Page 13: Amazon Redshift Insider Series


Materialized Views Speedup Query

Query: "What were the total sales in SF?"

SELECT loc, total_sales
FROM loc_sales
WHERE loc = 'SF';

loc_sales

loc      total_sales
SF       12.00
Chicago  10.00

Result

loc  total_sales
SF   12.00

Use the MV like a table

Page 14: Amazon Redshift Insider Series


Use case: Decouple Read and Write Workloads

Consider a hot table that is frequently updated and read

Create an MV for the read queries; the hot table acts as the base table for writes

REFRESH the MV periodically

[Diagram: Readers and Writers, before and after]
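The read/write split described on this slide might look like the sketch below. The table `page_hits`, its columns, and the view `page_hits_summary` are hypothetical names chosen for illustration; they do not appear in the talk.

```sql
-- Hypothetical hot table that writers update frequently.
CREATE TABLE page_hits (
    page_id  INTEGER,
    hit_time TIMESTAMP,
    bytes    BIGINT
);

-- Readers query this materialized view instead of the hot table.
CREATE MATERIALIZED VIEW page_hits_summary AS
SELECT page_id, COUNT(*) AS hits, SUM(bytes) AS total_bytes
FROM page_hits
GROUP BY page_id;

-- Writers keep writing to the base table.
INSERT INTO page_hits VALUES (1, '2020-05-01 10:00:00', 512);

-- Run periodically, for example from an external scheduler.
REFRESH MATERIALIZED VIEW page_hits_summary;

-- Readers see the data as of the last refresh.
SELECT page_id, hits, total_bytes
FROM page_hits_summary
WHERE page_id = 1;
```

The trade-off is freshness: readers only see rows written before the most recent REFRESH, so the refresh interval sets how stale a read can be.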

Page 15: Amazon Redshift Insider Series


Amazon Redshift Materialized Views

1. Speed up queries by orders of magnitude

2. Simplify and accelerate maintenance of precomputed results

• Fast refresh: Efficient, incremental

• Example: ETL/BI pipelines

Page 16: Amazon Redshift Insider Series


Fast Refresh: Amazon Redshift Incrementally Maintains

sales

item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00

New rows in sales

item_key  store_key  cust_key  price
i1        s3         c3         5.00
i2        s2         c4         8.00

store_info

store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF

loc_sales

loc      total_sales
SF       12.00
Chicago  10.00

db> REFRESH MATERIALIZED VIEW loc_sales;

Page 17: Amazon Redshift Insider Series


Fast Refresh: Amazon Redshift Incrementally Maintains

sales

item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00

New rows in sales

item_key  store_key  cust_key  price
i1        s3         c3         5.00
i2        s2         c4         8.00

store_info

store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF

loc_sales

loc      total_sales
SF       12.00+5.00
Chicago  10.00+8.00

db> REFRESH MATERIALIZED VIEW loc_sales;

Incremental changes!

Page 18: Amazon Redshift Insider Series


Fast Refresh: Amazon Redshift Incrementally Maintains

sales

item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00
i1        s3         c3         5.00
i2        s2         c4         8.00

store_info

store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF

loc_sales

loc      total_sales
SF       17.00
Chicago  18.00

db> REFRESH MATERIALIZED VIEW loc_sales;
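The three refresh slides can be replayed as a short script. This is a sketch that assumes the `sales` and `store_info` tables and the `loc_sales` materialized view from the earlier slides already exist with the sample data shown.

```sql
-- Two new sales arrive after loc_sales was last computed (rows from the slide).
INSERT INTO sales VALUES
    ('i1', 's3', 'c3', 5.00),   -- store s3 is in SF
    ('i2', 's2', 'c4', 8.00);   -- store s2 is in Chicago

-- Until this runs, loc_sales still holds the old totals.
REFRESH MATERIALIZED VIEW loc_sales;

-- The view now reflects the new rows.
SELECT loc, total_sales
FROM loc_sales
ORDER BY loc;
```

Per the slides, the refreshed totals are 17.00 for SF (12.00 + 5.00) and 18.00 for Chicago (10.00 + 8.00).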

Page 19: Amazon Redshift Insider Series


Refresh Types and Limitations

Incremental refresh

• One or more tables: INNER JOIN

• Aggregates: count(), sum()

• Expressions, pure functions, WHERE, GROUP BY, HAVING

Recompute refresh

• Everything else: window functions, set operations, ORDER BY, etc.

• MV fully recomputed (basically, a CTAS)

Unsupported MV

• Table types: external, views, other MVs, temps, system tables

• Function types: volatile, unstable

• Clauses: order, limit, offset
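To make the two refresh types concrete, here is a sketch of one view from each category, classified only by the rules listed on this slide. The view names are hypothetical, and the `sales` and `store_info` tables are assumed to be the ones from the earlier example. Actual behavior may differ by Redshift release.

```sql
-- Inner join, COUNT/SUM, WHERE, GROUP BY:
-- listed under incremental refresh on the slide.
CREATE MATERIALIZED VIEW store_sales_incr AS
SELECT si.loc, COUNT(*) AS num_sales, SUM(s.price) AS total_sales
FROM sales s
INNER JOIN store_info si ON s.store_key = si.store_key
WHERE s.price > 0
GROUP BY si.loc;

-- Window function: falls under "everything else" on the slide,
-- so REFRESH is expected to recompute the whole view.
CREATE MATERIALIZED VIEW sales_ranked_full AS
SELECT item_key, store_key, price,
       RANK() OVER (PARTITION BY store_key ORDER BY price DESC) AS price_rank
FROM sales;

-- The REFRESH statement is the same for both; only the work done differs.
REFRESH MATERIALIZED VIEW store_sales_incr;
REFRESH MATERIALIZED VIEW sales_ranked_full;
```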

Page 20: Amazon Redshift Insider Series


Operations That Force Recompute on REFRESH

Vacuum

Truncate

Alter distkey

Alter sortkey

Dist all -> dist even (small table)

DDL operations on the base tables

Page 21: Amazon Redshift Insider Series


Materialized Views: System Tables

STV_MV_INFO

• The STV_MV_INFO table contains a row for every materialized view, whether the data is stale, and state information.

STL_MV_STATE

• Contains a row for every state transition of a materialized view.

SVL_MV_REFRESH_STATUS

• View contains a row for the refresh activity of materialized views.
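A couple of monitoring queries against these system tables might look like the following sketch. The column names (`name`, `is_stale`, `state`, `mv_name`, `starttime`, `endtime`, `status`) are not given on the slide; they are assumptions to check against the system table reference for your cluster version.

```sql
-- Which materialized views exist, and is their data stale?
-- Column names are assumptions; verify them in the STV_MV_INFO reference.
SELECT name, is_stale, state
FROM stv_mv_info;

-- Most recent refresh activity, newest first.
-- Column names are assumptions; verify them in the SVL_MV_REFRESH_STATUS reference.
SELECT mv_name, starttime, endtime, status
FROM svl_mv_refresh_status
ORDER BY starttime DESC
LIMIT 10;
```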

Page 22: Amazon Redshift Insider Series


Amazon Redshift Materialized Views

1. Speed up queries by orders of magnitude

2. Simplify and accelerate maintenance of precomputed results

3. Easier and faster migration to Amazon Redshift

Page 23: Amazon Redshift Insider Series


Success story

Page 24: Amazon Redshift Insider Series


Page 25: Amazon Redshift Insider Series


Example: Web Store DW

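[Diagram: a pipeline managed by Etleap. Sources: MySQL tables Purchase, User, LineItem, and Item (linked by one-to-many relationships) and event logs in S3. Destination: Redshift schemas and tables Purchase, User, LineItem, Item, and LoginEvents, plus a materialized view, Logins by User.]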

Page 26: Amazon Redshift Insider Series


Demo

Page 27: Amazon Redshift Insider Series


Speeding up Models @ AXS

1. Regular ingest of 3.8 M records per day

2. Before: CTAS (CREATE TABLE AS)

3. After: Materialized Views

Faster ingest times!

Time to ingest new records is constant

                      CTAS  MV   Speedup
Steady-state average  371s  49s  7.9x
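The before/after pattern on this slide can be sketched as follows. The table `raw_events`, its columns, and the model names are hypothetical; the actual AXS models are not shown in the talk.

```sql
-- Hypothetical source table that receives the regular ingest.
CREATE TABLE raw_events (
    user_id    INTEGER,
    event_time TIMESTAMP
);

-- Before: rebuild the model from scratch after every ingest (CTAS).
DROP TABLE IF EXISTS model_ctas;
CREATE TABLE model_ctas AS
SELECT user_id, COUNT(*) AS events
FROM raw_events
GROUP BY user_id;

-- After: define the model once as a materialized view...
CREATE MATERIALIZED VIEW model_mv AS
SELECT user_id, COUNT(*) AS events
FROM raw_events
GROUP BY user_id;

-- ...and refresh it after each ingest.
REFRESH MATERIALIZED VIEW model_mv;
```

The CTAS path rereads all of `raw_events` each time, while an incrementally maintained view only has to account for the new records, which is consistent with the slide's point that ingest time stays constant.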

Page 28: Amazon Redshift Insider Series


Page 29: Amazon Redshift Insider Series


Summary

Combines benefits of tables/CTAS and views

Compute once, use multiple times, efficient maintenance

Typically useful for ETL, BI, and dashboarding workloads

Boost performance of repeated and predictable queries

Simplify maintenance of precomputed results

User creates materialized views that use one or more tables

CREATE MATERIALIZED VIEW mv_name AS (<SELECT_query>);

Speed up queries by accessing materialized views

SELECT * FROM mv_name WHERE ...;

REFRESH to incrementally maintain the materialized views

REFRESH MATERIALIZED VIEW mv_name;

Page 30: Amazon Redshift Insider Series


Resources

• Amazon Redshift documentation: https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html

• Blog: Materialize your Amazon Redshift Views to Speed Up Query Execution: https://aws.amazon.com/blogs/aws/materialize-your-amazon-redshift-views-to-speed-up-query-execution/

• Blog: Speeding up Etleap models at AXS with Amazon Redshift materialized views: https://aws.amazon.com/blogs/big-data/speeding-up-etleap-models-at-axs-with-amazon-redshift-materialized-views/

• Blog: Speed up your ELT and BI queries with Amazon Redshift materialized views: https://aws-preview.aka.amazon.com/blogs/big-data/speed-up-your-elt-and-bi-queries-with-amazon-redshift-materialized-views/