© 2020, Amazon Web Services, Inc. or its Affiliates.
Working With the Data Lake
Session’s Focus – Query the Data Lake
Catalog & Search: AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service
Access & User Interfaces: AWS AppSync, Amazon API Gateway, Amazon Cognito
Manage & Secure: AWS KMS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
Data Ingestion: AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service, AWS DataSync, AWS Transfer for SFTP, Amazon S3 Transfer Acceleration
Central Storage (S3): Scalable, secure, cost-effective
Analytics & Serving: Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS
Table of contents
1. Querying the Data Lake with Amazon Athena
2. Visualizing the Data Lake with Amazon QuickSight
Querying the Data Lake: Working with Amazon Athena
When would you query the Data Lake?
Examine New Data Sources
Interrogate Logs
Perform Analytics
Explore Relationships Across Sets
Check Data Quality
An interactive query service that makes it easy to analyze data directly from Amazon S3 using Standard SQL
Amazon Athena
• Query data in your Amazon S3-based data lake
• Analyze infrastructure, operation, and application logs
• Interactive analytics using popular BI tools
• Self-service data exploration for data scientists
• Embed analytics capabilities into your applications
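As a rough sketch of what embedding Athena into an application can look like, the helper below submits a query and polls until Athena reports a terminal state. It assumes `client` behaves like `boto3.client("athena")`; the SQL, database name, and S3 output location passed in by the caller are hypothetical and not taken from the deck.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit a query and wait until Athena reports a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns (query_execution_id, final_state).
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll for the final state.
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Query {query_id} did not finish in time")
```

With boto3 installed and credentials configured, a call might look like `run_athena_query(boto3.client("athena"), "SELECT 1", "default", "s3://example-bucket/athena-results/")`, where the bucket name is a placeholder.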
What does it look like?
Athena is Serverless
• No infrastructure or administration
• Zero spin-up time
• Transparent upgrades
Use ANSI SQL
• Support for complex joins, nested queries & window functions
• Support for complex data types (arrays, structs)
• Support for partitioning of data by any key
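To make the three bullets concrete, here is a small sketch that holds two Athena-style SQL statements as Python strings and builds a Hive-style partition prefix. The `orders` table, its columns, the `items` array, and the `year`/`month` partition keys are all hypothetical.

```python
def partition_prefix(base_uri, **partition_keys):
    """Build a Hive-style S3 prefix such as .../year=2020/month=01/."""
    parts = [f"{key}={value}" for key, value in partition_keys.items()]
    return base_uri.rstrip("/") + "/" + "/".join(parts) + "/"


# Window function plus a partition filter, so only one month is scanned.
RUNNING_TOTAL_SQL = """
SELECT customer_id,
       order_ts,
       amount,
       SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_ts) AS running_total
FROM orders
WHERE year = '2020' AND month = '01'
"""

# Flattening an array column with UNNEST (a complex data type).
UNNEST_SQL = """
SELECT o.order_id, item
FROM orders AS o
CROSS JOIN UNNEST(o.items) AS t(item)
"""

print(partition_prefix("s3://example-bucket/orders", year="2020", month="01"))
# prints s3://example-bucket/orders/year=2020/month=01/
```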
Familiar Technologies Under the Covers
Presto: used for SQL queries
• In-memory distributed query engine
• ANSI-SQL compatible with extensions
Apache Hive: used for DDL functionality
• Complex data types
• Multitude of formats
• Supports data partitioning
Amazon Athena is Cost Effective
• Pay per query
• $5 per TB scanned from S3
• DDL Queries and failed queries are free
• Save by using compression, columnar formats, partitions
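The pricing bullets translate into simple arithmetic. The sketch below uses the $5 per TB figure from the slide; the binary definition of a terabyte, the 2 TB dataset, and the compression and column-pruning ratios are assumptions chosen only to illustrate the effect.

```python
PRICE_PER_TB_USD = 5.0        # from the slide: $5 per TB scanned
BYTES_PER_TB = 1024 ** 4      # assumption: binary terabyte


def scan_cost_usd(bytes_scanned):
    """Estimated charge for a query that scans `bytes_scanned` bytes."""
    return PRICE_PER_TB_USD * bytes_scanned / BYTES_PER_TB


raw_bytes = 2 * BYTES_PER_TB  # hypothetical: 2 TB of uncompressed text

# Assumed savings: files 4x smaller after compression, and a columnar
# format that lets the query read only 10% of the remaining bytes.
columnar_bytes = raw_bytes * 0.25 * 0.10

print(f"raw text scan:  ${scan_cost_usd(raw_bytes):.2f}")
print(f"columnar scan:  ${scan_cost_usd(columnar_bytes):.2f}")
```

Partitioning reduces the bill the same way: a query that filters on a partition key only scans the matching prefixes.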
Athena Workgroups
Athena Workgroups are used to isolate queries between different teams, workloads or applications, and to set limits on the amount of data each query or the entire workgroup can process
• Workload isolation
• Query metrics
• Cost controls
Workgroups – Workload Isolation
• Unique query output location per Workgroup
• Encrypt results with unique AWS KMS key per Workgroup
• Collect and publish aggregated metrics per Workgroup to Amazon CloudWatch
• Use Workgroup settings, eliminating the need to configure individual users
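A minimal sketch of how these isolation settings map onto Athena's CreateWorkGroup request, assuming the boto3 parameter names. The workgroup name, bucket, and KMS key ARN are placeholders.

```python
def workgroup_request(name, output_location, kms_key_arn):
    """Build keyword arguments for Athena's CreateWorkGroup call."""
    return {
        "Name": name,
        "Configuration": {
            "ResultConfiguration": {
                # Unique query output location for this workgroup.
                "OutputLocation": output_location,
                # Encrypt results with this workgroup's own KMS key.
                "EncryptionConfiguration": {
                    "EncryptionOption": "SSE_KMS",
                    "KmsKey": kms_key_arn,
                },
            },
            # Workgroup settings override per-user client settings.
            "EnforceWorkGroupConfiguration": True,
            # Publish query metrics for this workgroup to CloudWatch.
            "PublishCloudWatchMetricsEnabled": True,
        },
    }


request = workgroup_request(
    name="marketing-analytics",                                # hypothetical
    output_location="s3://example-athena-results/marketing/",  # hypothetical
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/example-key-id",
)
# With boto3 installed and credentials configured:
#   boto3.client("athena").create_work_group(**request)
```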
Workgroups – Metric Reporting
Total bytes scanned per Workgroup
Total failed queries per Workgroup
Total successful queries per Workgroup
Total query execution time per Workgroup
Workgroups – Cost Controls
• Per-query data scanned threshold; a query that exceeds it is cancelled
• Trigger alarms to notify of increasing usage and cost
• Disable Workgroup when all queries exceed a maximum threshold
• Alarms can use any Athena metric: successful/failed & total queries, query run time, etc.
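The per-query threshold can be sketched as an UpdateWorkGroup request, assuming the boto3 parameter names. The 10 MB floor reflects the minimum in the Athena API reference as far as I know; treat it as an assumption and check current documentation. The workgroup name is hypothetical.

```python
MIN_CUTOFF_BYTES = 10_000_000  # assumed API minimum (10 MB)


def per_query_limit_update(workgroup, max_bytes_per_query):
    """Build keyword arguments for Athena's UpdateWorkGroup call."""
    if max_bytes_per_query < MIN_CUTOFF_BYTES:
        raise ValueError("per-query limit must be at least 10 MB")
    return {
        "WorkGroup": workgroup,
        "ConfigurationUpdates": {
            # Queries that scan more than this many bytes are cancelled.
            "BytesScannedCutoffPerQuery": max_bytes_per_query,
        },
    }


update = per_query_limit_update("marketing-analytics", 100 * 1024 ** 3)  # 100 GB
# With boto3: boto3.client("athena").update_work_group(**update)
```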
Workgroups – Usage Notifications
Define a hierarchy of alarms to be alerted as usage increases
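One way to picture a hierarchy of alarms is a list of CloudWatch alarm definitions with rising thresholds on the same workgroup metric. This sketch assumes the `AWS/Athena` namespace, the `ProcessedBytes` metric, and the `WorkGroup` dimension; the workgroup name, thresholds, and SNS topic ARN are placeholders.

```python
def usage_alarm_requests(workgroup, thresholds_bytes, sns_topic_arn,
                         period_seconds=86400):
    """Build one PutMetricAlarm request per usage threshold, lowest first."""
    requests = []
    for level, threshold in enumerate(sorted(thresholds_bytes), start=1):
        requests.append({
            "AlarmName": f"athena-{workgroup}-usage-level-{level}",
            "Namespace": "AWS/Athena",
            "MetricName": "ProcessedBytes",
            "Dimensions": [{"Name": "WorkGroup", "Value": workgroup}],
            "Statistic": "Sum",           # total bytes scanned in the period
            "Period": period_seconds,     # default: one day
            "EvaluationPeriods": 1,
            "Threshold": float(threshold),
            "ComparisonOperator": "GreaterThanThreshold",
            "AlarmActions": [sns_topic_arn],
        })
    return requests


TB = 1024 ** 4
alarms = usage_alarm_requests(
    "marketing-analytics",
    thresholds_bytes=[1 * TB, 5 * TB, 10 * TB],
    sns_topic_arn="arn:aws:sns:us-east-1:111122223333:athena-usage",
)
# With boto3: for a in alarms: boto3.client("cloudwatch").put_metric_alarm(**a)
```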
Athena support for interface endpoints (PrivateLink)
Submit queries securely
• No internet gateway required in your VPC
• Secure communication between your VPC and Athena APIs
• Set VPC endpoint policies
• Example endpoint policy
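The policy shown on the original slide is not preserved in this transcript. As an illustration only, the sketch below builds a VPC endpoint policy that limits the endpoint to a handful of Athena query actions on a single workgroup. The account ID, region, workgroup name, and the choice of actions are assumptions.

```python
import json


def athena_endpoint_policy(workgroup_arn):
    """Build a VPC endpoint policy document scoped to one workgroup."""
    return {
        "Statement": [
            {
                "Principal": "*",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                ],
                "Resource": [workgroup_arn],
            }
        ]
    }


policy = athena_endpoint_policy(
    "arn:aws:athena:us-west-1:111122223333:workgroup/workgroupA"  # hypothetical
)
print(json.dumps(policy, indent=2))
```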
Athena support for INSERT INTO
Inserts new rows into a destination table based on a SELECT query statement that runs on a source table, or based on a set of VALUES provided as part of the statement
• Supported formats: Avro, JSON, ORC, Parquet, text file
• Example
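The slide's own example is not preserved here. As a stand-in, the sketch below builds the two forms of INSERT INTO described above as SQL strings. Table and column names are hypothetical, and the string building is for illustration with trusted identifiers only.

```python
def _literal(value):
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def insert_from_select(destination, source, columns, where=None):
    """INSERT INTO ... SELECT: copy rows from a source table."""
    sql = f"INSERT INTO {destination} SELECT {', '.join(columns)} FROM {source}"
    if where:
        sql += f" WHERE {where}"
    return sql


def insert_values(destination, rows):
    """INSERT INTO ... VALUES: add explicitly listed rows."""
    rendered = ", ".join(
        "(" + ", ".join(_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {destination} VALUES {rendered}"


print(insert_from_select("cities_usa", "cities_world",
                         ["city", "state"], "country = 'usa'"))
print(insert_values("cities", [(1, "Lansing", "MI", 12345)]))
```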
Visualizing the Data Lake: Working with Amazon QuickSight
Why Amazon QuickSight
• Cloud native = No servers = Auto-Scale: No servers or software to manage, maintain, deploy. Start with 10s of users and scale to 10s of 1000s
• Fully integrated with AWS: Build end-to-end analytics in AWS. Secure private VPC access, fine-grained access control, ML integrations
• Secure and global: End-to-end encryption. Native High Availability. 10 global regions. HIPAA, PCI, ISO, SOC and FedRAMP eligibility
• Customize and embed: Embed in applications and enable analytics in hours, not months or years. Use themes to match application/corporate branding
• Easy to develop and maintain: Design with Amazon QuickSight, integrate with APIs. Secure data with row-level security and authenticate seamlessly via single sign-on
• Fast, consistent performance: Fast, predictable performance every time. Concurrent users or increased interactions do not slow down the system
• ML insights: Contextual, relevant insights with ML-powered anomaly detection, forecasting, alerts and customizable narratives
• Insights for everyone: Provide access to all users, pay only for usage. No upfront costs, no charges for inactive users
Connect to your data, wherever it is
QuickSight is natively integrated with AWS data sources, as well as on-premises and hosted databases and third-party business applications
On-premises: Securely connect to on-premises databases and flat files like Excel and CSV
• Excel, CSV, Teradata, MySQL, SQL Server, PostgreSQL
In the cloud: Connect to hosted databases, big data formats, and secure VPCs
• Redshift, RDS, S3, Athena, Aurora, Teradata, MySQL, Presto, Spark, SQL Server, PostgreSQL, MariaDB, Snowflake, IoT Analytics
Applications: Connect directly to third-party business applications
• Salesforce, Square, Adobe Analytics, Jira, ServiceNow, Twitter, GitHub
AWS + Amazon QuickSight
Data sources feeding Amazon QuickSight: Amazon S3, Amazon Athena, Amazon Redshift, Amazon RDS, Amazon EMR
• Create dashboards in minutes
• Native fine-grained permissions for users with AWS Identity and Access Management (IAM)
• Cost allocation by business unit or team
• Private VPC connectivity = no public routing of data
• Direct query to data sources or SPICE for fast access
• Pay for what you use
Data Prep
• Preview data
• Rename, remove fields, change data types
• Create new calculated fields
• Filter rows
• Issue direct query or ingest to SPICE
• Visually join tables from the same relational DB
• Push down custom SQL queries
SPICE
QuickSight is powered by SPICE, a super-fast calculation engine that delivers performance and scale, regardless of how many users are active.
[Diagram: your data source feeding SPICE]
SPICE
SPICE not only provides your users with instant response times, but also scales automatically with user activity, protecting your underlying data sources and saving you time and money.
• Up to 10X faster (millisecond latency)
• Support for high concurrency
• Fault-tolerant, self-healing
• Instant failover with zero impact
• Backed up in S3 (Write Ahead Log)
[Diagram: QuickSight Query Layer over SPICE tables in Availability Zones 1, 2 and 3, each backed by S3]
User Types / User Roles
User roles
• Admin: Manage Users, Manage SPICE Capacity, Manage VPC Connections, Manage Account Settings
• Author: Create Data Sets, Create Analyses, Create Dashboards
• Reader: Consume Dashboards
User types
• QS Admin: Sometimes separate from Business Users, sometimes the same. Usually has AWS Console access
• Business User: Anyone. Can be internal or external users (customers/partners/3rd parties)
• Analyst: Sometimes in IT, sometimes Business Users. ‘Data Analyst’, ‘Data Engineer’, ‘BI Engineer’
Data governance
Create managed datasets that give power users and authors the flexibility to perform self-serve analytics on data that you control.
Create datasets that:
• Can be shared with any user
• Automatically refresh
• Have row level security
• Users cannot modify
• Dynamically update with changes
Differentiate with natural language and ML
• Anomaly detection: Discover unexpected trends and outliers against millions of business metrics
• Auto narratives: Summarize your business metrics in plain language
• Forecasting: Machine learning forecasting with point and click simplicity
• ML predictions: Visualize and build predictive dashboards with Amazon SageMaker models
Anomaly Detection
Forecasting
Natural Language Narratives
ML insights
Embed Amazon QuickSight Dashboards
• Fully interactive with drill down, filtering, & external links
• Personalized views with row-level security
• No servers to manage, no long-term commitments
• Pay for usage with pay-per-session reader pricing
• Seamless authentication
Why Amazon QuickSight for embedded dashboards
[Diagram: users; contents secured by AD]
What's new? Theming
You can now create a collection of themes, and apply a theme to an analysis and all its dashboards
New APIs
Data source APIs
• Create/manage data source connections without sharing credentials
• Audit data sources, manage ownership and access
Dataset APIs
• Create customized datasets for users or groups
• Allows isolation of data, additional row-level security possible
SPICE ingestion APIs
• Trigger SPICE ingestion of data when ETL/data load is complete
• Easily review history and trace SPICE ingestion details
Fine grained access control APIs
• Manage user/group access to Amazon S3/Athena data via IAM policies
Template APIs
• Create templates from analysis
• Metadata for dashboard with placeholder for datasets
• Templates accessible via API only
• Can be copied/referenced across accounts
Dashboard APIs
• Easily create dashboards per user/group with different datasets
• Move dashboards across dev environments
• Version dashboards for easy rollbacks
• Audit dashboards, list by users, manage ownership/sharing
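As one illustration of these APIs, the sketch below triggers a SPICE refresh at the end of an ETL job and reads back its status. It assumes `client` behaves like `boto3.client("quicksight")` and uses the CreateIngestion and DescribeIngestion operations; the account and dataset IDs supplied by the caller are placeholders.

```python
import uuid


def start_spice_refresh(client, account_id, dataset_id):
    """Ask QuickSight to re-ingest a dataset into SPICE; returns the ingestion ID."""
    ingestion_id = str(uuid.uuid4())  # caller-chosen unique ID for this refresh
    client.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
    )
    return ingestion_id


def spice_refresh_status(client, account_id, dataset_id, ingestion_id):
    """Return the ingestion status string, e.g. RUNNING or COMPLETED."""
    response = client.describe_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
    )
    return response["Ingestion"]["IngestionStatus"]
```

Calling `start_spice_refresh` as the last step of a data load keeps dashboards in step with the lake without waiting for a scheduled refresh.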
Using APIs for your embedded deployments
[Diagram: your application, with its identity store and a data source holding data for Customer 1, Customer 2 and Customer 3, calls the Amazon QuickSight Data APIs and User APIs to deliver personalized, embedded Amazon QuickSight dashboards for customers/users]
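The embedding flow can be sketched as a server-side call that exchanges a known QuickSight user for a short-lived dashboard URL. This assumes `client` behaves like `boto3.client("quicksight")` and uses the GetDashboardEmbedUrl operation from the period of this deck; newer embedding operations may be preferable today. The IDs and user ARN supplied by the caller are placeholders.

```python
def dashboard_embed_url(client, account_id, dashboard_id, user_arn,
                        session_minutes=60):
    """Return a short-lived URL the application can load in an iframe."""
    # Assumed API limit: sessions last between 15 and 600 minutes.
    if not 15 <= session_minutes <= 600:
        raise ValueError("session_minutes must be between 15 and 600")
    response = client.get_dashboard_embed_url(
        AwsAccountId=account_id,
        DashboardId=dashboard_id,
        IdentityType="QUICKSIGHT",   # embed on behalf of a registered user
        UserArn=user_arn,
        SessionLifetimeInMinutes=session_minutes,
    )
    return response["EmbedUrl"]
```

Because the URL is tied to one user, row-level security on the underlying dataset determines what each customer sees in the embedded dashboard.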
QuickSight Support for Cross Data Source Join
• Join across all data sources supported by QuickSight, including file-to-file, file-to-database, and database-to-database joins
Amazon QuickSight: Examples – AWS Cost & Usage Reporting
Amazon QuickSight: Examples – Salesforce Analytics
Q & A