TRANSCRIPT
© 2019 Snowflake Inc. All Rights Reserved
INTRODUCTION TO SNOWFLAKE BEST PRACTICES
GRAHAM MOSSMAN, SALES ENGINEER
AGENDA
• Virtual Warehouse Management
• Cost Management
• Business Unit Chargebacks
VIRTUAL WAREHOUSE MANAGEMENT
Considerations
• Key SLAs and challenges with meeting SLAs
• Data load and transformation workloads
• Reporting, ad hoc analysis, and data science workloads
• Cost management
Agenda
• Sizes and approach to right-sizing
• Scaling up vs. scaling out
• Automating suspend/resume, sizing, and multi-cluster scale-out
• Aligning with workload patterns, environments, roles, and chargeback needs
• Monitoring workload patterns
WAREHOUSE SIZES
Size      Servers / Cluster  Credits / Hour  Notes
X-Small   1                  1               Default size when created using CREATE WAREHOUSE.
Small     2                  2
Medium    4                  4
Large     8                  8
X-Large   16                 16              Default size for warehouses created in the web UI.
2X-Large  32                 32
3X-Large  64                 64
4X-Large  128                128
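The table above can be turned into a small cost estimate. This is a minimal sketch, not Snowflake's billing logic: it assumes per-second billing after the 1-minute minimum mentioned on the next slide, and the function name credits_for_run is hypothetical.

```python
# Credits per hour by warehouse size, copied from the table above.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}


def credits_for_run(size, seconds, minimum_seconds=60.0):
    """Estimate credits for one continuous run of a single-cluster warehouse.

    Assumption: usage is billed per second, with a minimum of
    `minimum_seconds` each time the warehouse starts.
    """
    billed_seconds = max(seconds, minimum_seconds)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600.0


if __name__ == "__main__":
    # A Large warehouse running for 30 minutes.
    print(credits_for_run("Large", 30 * 60))
    # A 10-second query on X-Small is still billed for the full minute.
    print(credits_for_run("X-Small", 10))
```

Short bursts on a large warehouse are where the minimum matters most, since the same 60 seconds costs 128 times more on 4X-Large than on X-Small.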
RIGHT-SIZING
• Start with a sizable, single-query workload
• Keep in mind the 1-minute billing minimum
• Linear performance improvements are cost neutral (twice the credits per hour for half the runtime)
• Step back one warehouse size when performance is no longer linear
• Workload patterns will determine best size
• Best to start undersized, increase over time as workload patterns are better understood
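The step-back rule above can be sketched as follows. This is an illustrative sketch under stated assumptions: the runtimes are measurements you collect yourself for the same workload on consecutive sizes, and the function name pick_size and the 0.8 tolerance are hypothetical choices, not Snowflake guidance.

```python
from typing import Dict

SIZES = ["X-Small", "Small", "Medium", "Large",
         "X-Large", "2X-Large", "3X-Large", "4X-Large"]


def pick_size(runtimes: Dict[str, float], tolerance: float = 0.8) -> str:
    """Return the largest measured size reached by near-linear steps.

    `runtimes` maps size name to measured runtime in seconds. Sizes are
    assumed to be consecutive, so each step doubles the credits per hour.
    A step counts as linear when the speedup is at least 2 * tolerance.
    """
    measured = [s for s in SIZES if s in runtimes]
    best = measured[0]
    for smaller, larger in zip(measured, measured[1:]):
        speedup = runtimes[smaller] / runtimes[larger]
        if speedup < 2.0 * tolerance:
            break  # no longer linear: step back to the previous size
        best = larger
    return best


if __name__ == "__main__":
    observed = {"X-Small": 800.0, "Small": 400.0, "Medium": 210.0, "Large": 180.0}
    print(pick_size(observed))
```

In the sample measurements the move from Medium to Large barely shortens the run, so the sketch steps back to Medium.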
SCALING UP VS. OUT
Scaling Up (X-Small → 4X-Large)
• Improves individual query performance
• Improves data load performance and concurrency for numerous files (dozens to thousands) when loading a given table
• Programmatically resize a warehouse throughout the load window as workload patterns change
Scaling Out (multi-cluster warehouse)
• Improves level of session/query concurrency
• Set cluster size based on typical minimal workload; auto-scaling will kick in during periods of increased query activity to meet demand and avoid queueing
• Cluster MUST be large enough for the largest query
[Figure: grid of warehouse sizes and cluster counts. The numbers in the grid are the Snowflake credits consumed for an hour’s worth of compute.]
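A grid like the one described above can be reproduced with a few lines. This is a sketch that assumes each running cluster of a multi-cluster warehouse consumes the per-hour rate of its size, so n clusters cost n times the single-cluster rate; the function name credits_grid is hypothetical.

```python
# Credits per hour for a single cluster, from the WAREHOUSE SIZES slide.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}


def credits_grid(max_clusters):
    """Map each size to credits per hour for 1..max_clusters running clusters."""
    return {
        size: [rate * clusters for clusters in range(1, max_clusters + 1)]
        for size, rate in CREDITS_PER_HOUR.items()
    }


if __name__ == "__main__":
    for size, row in credits_grid(4).items():
        print(f"{size:<9}" + "".join(f"{credits:>6}" for credits in row))
```

Reading across a row shows the cost of scaling out; reading down a column shows the cost of scaling up.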
AUTOMATING SUSPEND/RESUME
Auto Suspend/Resume
• On-demand, end-user workloads
• Suspend idle time setting should take into account data caching
Programmatic Suspend/Resume
• Scheduled jobs where process orchestration is controlled
• Programmatically resume at the start of processing and suspend at the end of processing to avoid idle time costs
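The programmatic pattern above can be sketched as a wrapper around a scheduled job. This is a minimal sketch: the execute callable stands in for whatever client your orchestration tool uses to run SQL, and the names running_warehouse and ETL_WH are hypothetical. The warehouse name is assumed to be a trusted identifier.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def running_warehouse(execute: Callable[[str], object], warehouse: str) -> Iterator[None]:
    """Resume a warehouse before a job and suspend it afterwards.

    `execute` is any callable that runs one SQL statement. The suspend in
    `finally` also runs when the job raises, so a failed job does not
    leave the warehouse idling.
    """
    execute(f"ALTER WAREHOUSE {warehouse} RESUME IF SUSPENDED")
    try:
        yield
    finally:
        # May raise in a real client if the warehouse is already suspended
        # (for example by auto-suspend); handle that as your client requires.
        execute(f"ALTER WAREHOUSE {warehouse} SUSPEND")


if __name__ == "__main__":
    statements = []
    with running_warehouse(statements.append, "ETL_WH"):
        statements.append("-- load and transformation steps run here")
    print("\n".join(statements))
```

Passing the executor in as a callable keeps the sketch independent of any particular driver and makes it easy to dry-run by collecting statements in a list, as the usage shows.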
ALIGNING WITH WORKLOADS
Separation by workload pattern:
• Environments: DEV / TEST / PROD
• Overlapping ELT workflows
• Consumer types: reporting, ad hoc analysis, data science
• Business units for cost tracking: marketing data science, R&D data science, etc.
Additional considerations:
● Data load performance is a function of the number of files and the available threads for concurrency
● Query concurrency is better optimized with multi-cluster warehouses than with a larger single cluster
● Resource monitors should be used to adequately govern credit usage
[Diagram labels: Data science; ETL; Dev/QA; BI/Visualization (auto scaling)]
ALIGNING WITH WORKLOADS - EXAMPLE
● Should reflect units of workload management
○ ETL
○ BI / Dashboards
○ Ad hoc Reporting
○ Data Science
[Diagram labels: Continuous Loading (4TB/day) from S3, <5min SLA; Data Loads & Transformation; Reporting (Segmented); Ad hoc Analysis; virtual warehouses sized Medium, Large, 2X-Large, and X-Large (multi-cluster); Prod DB]
MONITORING WORKLOADS
● The Web UI provides a visual representation of usage activity for a virtual warehouse within the last 14 days
● The WAREHOUSE_LOAD_HISTORY table function in INFORMATION_SCHEMA provides a queryable representation of usage activity for a virtual warehouse within the last 14 days
○ Excessive idle periods can be identified where the AVG_RUNNING column is 0, indicating auto suspend idle time may need to be shortened or handled programmatically
○ Excessive queuing can be identified with the AVG_QUEUED_LOAD column, indicating a possible need to resize or enable multi-clustering
● Create a process to capture daily deltas into a user table for maintaining longer periods of history and to query across all virtual warehouses at once with a single SQL statement for trend analysis
[Figure caption: Warehouse Load Over Time is available in the Web UI by clicking on the warehouse name]
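The two checks described above can be sketched over rows already fetched from WAREHOUSE_LOAD_HISTORY. This is an illustrative sketch: each row is assumed to be a dict keyed by the AVG_RUNNING and AVG_QUEUED_LOAD columns, and the function name flag_intervals and the zero queue threshold are hypothetical.

```python
from typing import Dict, Iterable, List


def flag_intervals(rows: Iterable[Dict[str, float]],
                   queue_threshold: float = 0.0) -> Dict[str, List[int]]:
    """Return the indexes of intervals that look idle or queued.

    idle:   AVG_RUNNING is 0, so the warehouse was up with nothing running.
    queued: AVG_QUEUED_LOAD exceeds the threshold, so queries waited.
    """
    idle: List[int] = []
    queued: List[int] = []
    for index, row in enumerate(rows):
        if row["AVG_RUNNING"] == 0:
            idle.append(index)
        if row["AVG_QUEUED_LOAD"] > queue_threshold:
            queued.append(index)
    return {"idle": idle, "queued": queued}


if __name__ == "__main__":
    sample = [
        {"AVG_RUNNING": 0.0, "AVG_QUEUED_LOAD": 0.0},
        {"AVG_RUNNING": 2.5, "AVG_QUEUED_LOAD": 0.0},
        {"AVG_RUNNING": 8.0, "AVG_QUEUED_LOAD": 1.2},
    ]
    print(flag_intervals(sample))
```

Many idle intervals point toward a shorter auto-suspend setting or programmatic suspend; many queued intervals point toward resizing or multi-cluster scale-out.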
COST MANAGEMENT
Considerations
• Compute Costs
• Storage Costs
• Service Costs
• Data Transfer (Egress) Costs
• Monitoring & Alerting
Agenda
● Resources Incurring Costs
● Compute
○ Viewing Usage
○ Resource Monitors
● Storage
○ Time Travel & Fail-Safe
○ Viewing Usage
● Services
○ Non-warehouse compute
● Data Egress
RESOURCES INCURRING COSTS
[Diagram labels: Account; Virtual Warehouses; Databases; Schemas; Tables (Permanent, Temp/Transient); Materialized Views; Automatic Clustering Service; Stages (Internal); Pipes; Cross-Region Extract Egress. Legend: Compute Costs, Storage Costs, Service Costs, Pass-through Costs]
VIEWING COMPUTE USAGE
● Virtual Warehouses
• Web UI
  • Billing & Usage page (under Account)
• INFORMATION_SCHEMA table function
  • WAREHOUSE_METERING_HISTORY
• ACCOUNT_USAGE share views
  • WAREHOUSE_METERING_HISTORY
RESOURCE MONITORS
• Align with team-by-team warehouse separation for granular cost governance
• Set at account level if team-by-team quotas are not needed
• Leverage tiered triggers with escalating actions (e.g., Notify > Notify > Suspend)
• Enable notifications using ACCOUNTADMIN role and set e-mail address
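The tiered-trigger idea above can be modeled in a few lines. This sketch only illustrates the escalation logic; it is not how Snowflake evaluates resource monitors. The 75/90/100 percent thresholds and the function name fired_actions are hypothetical.

```python
from typing import List, Tuple

# Hypothetical tiers following the Notify > Notify > Suspend pattern.
TRIGGERS: List[Tuple[int, str]] = [(75, "NOTIFY"), (90, "NOTIFY"), (100, "SUSPEND")]


def fired_actions(credits_used: float, credit_quota: float,
                  triggers: List[Tuple[int, str]] = TRIGGERS) -> List[Tuple[int, str]]:
    """Return every (threshold percent, action) reached at the current usage."""
    percent_used = 100.0 * credits_used / credit_quota
    return [(threshold, action)
            for threshold, action in sorted(triggers)
            if percent_used >= threshold]


if __name__ == "__main__":
    # 92 of 100 credits used: both notifications fire, suspend does not.
    print(fired_actions(92, 100))
```

Keeping the early tiers as notifications gives a team time to react before the suspend tier stops its warehouses.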
STORAGE FUNDAMENTALS
TIME TRAVEL STORAGE
• High churn detected with ratio such as:
TIME_TRAVEL_BYTES / ACTIVE_BYTES
from TABLE_STORAGE_METRICS view
• For Enterprise (or higher), retention period can be up to 90 days; verify retention period on all large or high-churn tables
• Reduce retention period if data can be regenerated/reloaded and time/effort to do so is within acceptable boundaries/SLAs
• Use periodic zero-copy-cloning (snapshots) instead of time travel to provide longer retention period at discrete points in time (daily, weekly, etc)
Areas Of Focus
• Dimensional Tables
• Persistent Staging Areas
• Materialized Relationships, Derivations, Other Business Rules
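The churn ratio from this slide can be sketched over rows read from the TABLE_STORAGE_METRICS view. This is a minimal sketch: rows are assumed to be dicts keyed by the TABLE_NAME, TIME_TRAVEL_BYTES, and ACTIVE_BYTES columns, and the function names and the threshold of 1.0 are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def churn_ratio(time_travel_bytes: float, active_bytes: float) -> float:
    """TIME_TRAVEL_BYTES / ACTIVE_BYTES, guarding against empty tables."""
    if active_bytes == 0:
        return float("inf") if time_travel_bytes > 0 else 0.0
    return time_travel_bytes / active_bytes


def high_churn_tables(rows: Iterable[Dict[str, object]],
                      threshold: float = 1.0) -> List[Tuple[str, float]]:
    """Return (table name, ratio) pairs at or above the threshold, highest first."""
    scored = [
        (row["TABLE_NAME"], churn_ratio(row["TIME_TRAVEL_BYTES"], row["ACTIVE_BYTES"]))
        for row in rows
    ]
    flagged = [pair for pair in scored if pair[1] >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = [
        {"TABLE_NAME": "DIM_CUSTOMER", "TIME_TRAVEL_BYTES": 50, "ACTIVE_BYTES": 100},
        {"TABLE_NAME": "STG_ORDERS", "TIME_TRAVEL_BYTES": 900, "ACTIVE_BYTES": 100},
    ]
    print(high_churn_tables(sample))
```

Tables near the top of the result are the first candidates for a shorter retention period or for periodic clones in place of long Time Travel.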
FAIL-SAFE STORAGE
• Permanent tables follow full CDP lifecycle; temp/transient tables NEVER use fail-safe
• Utilize temp tables for session-specific intermediate results in complex data processing workflow
• Temporary tables are dropped (and storage released) as soon as session ends
• Utilize transient tables for staging where frequent truncate/reload operations occur
• Consider designating databases/schemas as transient to simplify table creation
Areas Of Focus
• Staging Tables
• Intermediate Result Tables
• Work Areas for Developers, Analysts & Data Scientists
• Reporting Tool Materialized Results
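The guidance above on choosing a table type can be sketched as a simple decision helper. This is an illustrative sketch only: the two boolean inputs are a simplification of the slide's criteria, and the function names table_kind and create_table_prefix are hypothetical.

```python
def table_kind(session_scoped: bool, reloadable: bool) -> str:
    """Pick a table type following the slide's guidance.

    session_scoped: the data is an intermediate result needed only for
                    the current session.
    reloadable:     the data can be regenerated or reloaded, as with
                    staging tables that are truncated and reloaded often.
    """
    if session_scoped:
        return "TEMPORARY"   # dropped when the session ends, no fail-safe
    if reloadable:
        return "TRANSIENT"   # persists across sessions, no fail-safe
    return "PERMANENT"       # full CDP lifecycle including fail-safe


def create_table_prefix(kind: str) -> str:
    """Return the leading keywords of the matching CREATE TABLE statement."""
    return "CREATE TABLE" if kind == "PERMANENT" else f"CREATE {kind} TABLE"


if __name__ == "__main__":
    kind = table_kind(session_scoped=False, reloadable=True)
    print(create_table_prefix(kind), "stg_orders (...)")
```

The table name stg_orders and the elided column list in the usage line are placeholders.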
VIEWING STORAGE USAGE
● Tables
○ Active/Current Storage
○ Time Travel Storage
○ Fail-Safe Storage
● Materialized Views
● Internal Stage
• Web UI
  • Billing & Usage page (under Account)
  • Tables (under Databases)
• SHOW TABLES / MATERIALIZED VIEWS
• INFORMATION_SCHEMA views
  • TABLES
  • TABLE_STORAGE_METRICS
• INFORMATION_SCHEMA table function
  • STAGE_STORAGE_USAGE_HISTORY for daily storage by internal stage
• ACCOUNT_USAGE share views
  • TABLE_STORAGE_METRICS for active, time travel and fail-safe storage
  • STAGE_STORAGE_USAGE_HISTORY for daily storage by internal stage
VIEWING SERVICES USAGE
● Automatic Clustering
● Materialized Views
● Snowpipe
• Web UI
  • Billing & Usage page (under Account)
  • Special warehouse entry per service:
    ■ AUTOMATIC_CLUSTERING
    ■ MATERIALIZED_VIEW_MAINTENANCE
    ■ SNOWPIPE
• INFORMATION_SCHEMA table function
  • AUTOMATIC_CLUSTERING_HISTORY
  • MATERIALIZED_VIEW_REFRESH_HISTORY
  • PIPE_USAGE_HISTORY for Snowpipe usage
• ACCOUNT_USAGE share views
  • PIPE_USAGE_HISTORY
VIEWING DATA EGRESS
● Data exits cloud provider region
○ To another region within the same cloud provider
○ To different cloud provider
● Data Export via COPY INTO
● Data Replication (in preview)
• Web UI
  • Billing & Usage page (under Account)
• INFORMATION_SCHEMA table function
  • DATA_TRANSFER_HISTORY table function for data transfer events across an entire account
• ACCOUNT_USAGE share views
  • DATA_TRANSFER_HISTORY
• For customers under capacity contracts, this is a pass-through charge; on-demand customers pay a small markup for egress charges.
BUSINESS UNIT CHARGEBACKS
Agenda
• Designing for Cost Allocations
• Snowflake Shared Database
• Allocating Chargebacks
Considerations
• Business Units Supported
• Teams Incurring Costs
• Granularity of Chargebacks
DESIGNING FOR CHARGEBACKS
Conceptual layer defined with naming conventions and governed with RBAC
[Diagram labels: one Account divided into Business Unit 1 and Business Unit 2, each with Virtual Warehouses, Databases, Schemas, Tables (Permanent, Temp/Transient), Materialized Views, Automatic Clustering Service, Stages (Internal), Pipes, and Cross-Region Extract Egress; additional labels: Business Unit 1, Data Science Virtual Warehouses]
SNOWFLAKE SHARED DATABASE
• ACCOUNT_USAGE Schema
• READER_ACCOUNT_USAGE Schema
ACCOUNT_USAGE
• Warehouse, Storage, Transfer, and most Information Schema views
• Includes records for dropped objects
• Retention time of 1 year
• Data latency of 45 min to 3 hours
READER_ACCOUNT_USAGE
• Similar views for Reader Account usage (Warehouse, Query History, Load History)
ALLOCATING CHARGEBACKS
• Separate compute and storage resources between each relevant business unit or cost center
• Use well defined naming conventions to name warehouses and databases according to the owning business units
• Govern resource use with role based access control (RBAC)
• Use the SNOWFLAKE shared database to develop custom reporting to automate tracking
• Business Units & Cost Centers
• Warehouses & Databases
• RBAC
• Reporting
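The naming-convention approach above can be sketched as a small chargeback roll-up. This is a sketch under stated assumptions: rows are dicts with the WAREHOUSE_NAME and CREDITS_USED columns of WAREHOUSE_METERING_HISTORY, warehouses are assumed to be named with the owning business unit as a prefix before the first underscore (for example MKT_DS_WH), and the price per credit is a placeholder you supply.

```python
from collections import defaultdict
from typing import Dict, Iterable


def business_unit(warehouse_name: str) -> str:
    """Extract the business unit from a name like 'MKT_DS_WH' (hypothetical convention)."""
    return warehouse_name.split("_", 1)[0]


def chargeback(rows: Iterable[Dict[str, object]],
               price_per_credit: float) -> Dict[str, float]:
    """Sum compute cost per business unit from metering rows."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        unit = business_unit(row["WAREHOUSE_NAME"])
        totals[unit] += row["CREDITS_USED"] * price_per_credit
    return dict(totals)


if __name__ == "__main__":
    metering = [
        {"WAREHOUSE_NAME": "MKT_DS_WH", "CREDITS_USED": 10},
        {"WAREHOUSE_NAME": "MKT_BI_WH", "CREDITS_USED": 5},
        {"WAREHOUSE_NAME": "RND_DS_WH", "CREDITS_USED": 20},
    ]
    print(chargeback(metering, price_per_credit=3.0))
```

The same grouping key can be applied to database names to allocate storage, which is why a consistent convention across warehouses and databases matters.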
SNOWFLAKE PROFESSIONAL SERVICES
WHY ENGAGE WITH SNOWFLAKE PROFESSIONAL SERVICES
Identify New Use Cases
Reveal additional use cases for modern data analytics & data sharing for even greater benefits
Shorten Time to Value
Achieve project outcomes faster and deliver data-driven insights and ROI sooner than you expected
Efficient Consumption
Guidance and knowledge transfer to help utilize Snowflake fully and efficiently
Package Offerings:
• Best Practices
• Migration Readiness
• Role Based Security
• Snowflake 360
• Custom Packages
• Technical Account Manager
VISIT THE SERVICE HUBS!
Hubs: LODGE COMMUNITY, PROFESSIONAL SERVICES, CUSTOMER SUCCESS, CUSTOMER SUPPORT
Technical Resources
Learn about all the content and ways to get help. Find everything from blogs and articles to ideas and announcements.
Learn About Best Practices
Come learn the tips our team has identified across our customer base.
Optimize Snowflake
Learn about our available service offerings and how we can help optimize your Snowflake implementation.
Discuss Issues You’ve Encountered
Chat with Support Engineers, live, about issues you’re having and get advice on potential resolutions.
Provide Feedback
Already a community member? Tell us what is working and what is not in the Lodge.
Proactive Support
We are using Snowflake on Snowflake to get proactive about helping customers solve problems before they become bigger issues.
Learn From Other Customers
Share your use case with us and learn what others are doing with similar needs.
Tailored Solutions
Already engaged with a partner? Let’s work together. Experiencing problems or just not sure where to start? We can design a solution to help.
Questions?
Thank You