Snowflake brief overview
TRANSCRIPT
2
Current realities
Complex Data Infrastructure
Complex systems, data pipelines, data silos
EDW Datamarts
Hadoop / noSQL
Data Diversity Challenges
External data, multi-structured data, machine-generated data
Barriers to Analysis
Analysis limited by incomplete data, delays in access, performance limitations
3
Our vision: Reinvent the data warehouse
Data warehousing for everyone
Data warehouse performance & enterprise capabilities
Cloud elasticity & agility
Big data flexibility & scalability
4
Our product: The Snowflake Elastic Data Warehouse
All-new SQL data warehouse
No legacy code or constraints
Delivered as a service
No infrastructure, knobs or tuning to manage
Designed for the cloud
Running in Amazon Web Services
5
A team of data experts
Expert team
• Experts in databases and data processing from leading companies
• >100 years collective experience building databases
• >120 patents
Leading investors
Bob Muglia, CEO
Former President of Microsoft’s Server and Tools Business
Benoit Dageville, Founder & CTO
Lead architect of Oracle parallel execution and a key manageability architect
Marcin Zukowski, Founder & VP of Engineering
Inventor of vectorized query execution in databases
Thierry Cruanes, Founder Architect
Leading expert in query optimization and parallel execution at Oracle
6
Our value proposition
Bring together diverse data and workloads in one system
Simplify and accelerate path from data to analytics
Remove the cost and complexity of conventional solutions
7
A new architecture: Multi-cluster, shared data
• Standard interfaces
• Cloud services layer coordinates across the service
• Independent compute clusters access data
• Data centralized in enterprise-class cloud storage
8
Scale using multi-dimensional elasticity
• Elastic scaling for storage: low-cost cloud storage, fully replicated and resilient
• Elastic scaling for compute: virtual warehouses scale up & down on the fly to support workload needs
• Elastic scaling for concurrency: scale concurrency using independent virtual warehouses
Data Science
Reporting
Marketing
Loading / ETL
Test
Development
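The multi-cluster, shared-data idea on this slide can be pictured with a toy model. This is only a sketch: `SharedStorage`, `VirtualWarehouse`, `resize`, and the node counts are hypothetical names and values chosen for illustration, not Snowflake's API or implementation. It shows one centralized copy of the data, with separate compute clusters that can be resized or added without copying that data.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """One centralized copy of the data that every warehouse reads (hypothetical model)."""
    tables: dict = field(default_factory=dict)


@dataclass
class VirtualWarehouse:
    """An independent compute cluster; resizing it does not move or copy data."""
    name: str
    storage: SharedStorage
    nodes: int = 1

    def resize(self, nodes: int) -> None:
        # Elastic compute: change the cluster size on the fly.
        if nodes < 1:
            raise ValueError("a warehouse needs at least one node")
        self.nodes = nodes

    def row_count(self, table: str) -> int:
        # Every warehouse sees the same shared table.
        return len(self.storage.tables[table])


storage = SharedStorage()
storage.tables["events"] = [{"id": i} for i in range(1000)]

# Elastic concurrency: each workload gets its own warehouse over the same data.
reporting = VirtualWarehouse("reporting", storage, nodes=2)
etl = VirtualWarehouse("loading_etl", storage, nodes=4)

# Scale the loading workload up without touching reporting or the stored data.
etl.resize(8)
```

In this model, adding a warehouse for Data Science or Test is one more `VirtualWarehouse` object pointing at the same `storage`, which is the sense in which concurrency scales independently of storage.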
9
Bringing together structured & semi-structured data
> SELECT … FROM …
Semi-structured data (e.g. JSON, Avro, XML)
Structured data (e.g. CSV, TSV, …)
Direct ingestion
Load in raw form (e.g. JSON, Avro, XML)
Optimized storage
Optimized data type, no fixed schema or transformation required
Optimized SQL querying
Full benefit of database optimizations (pruning, filtering, …)
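A small sketch may help show how raw semi-structured data can still benefit from pruning. It is an illustration only, not Snowflake's storage format: the sample events, the chunk size, and the helper names `load_chunks` and `query_users` are all assumptions. JSON records with differing shapes are loaded as-is, min/max metadata for one field is recorded per chunk, and a query skips chunks whose metadata rules them out.

```python
import json

# Raw events with no fixed schema: fields vary from record to record.
RAW_EVENTS = [
    '{"user": "a", "ts": 1, "device": {"os": "ios"}}',
    '{"user": "b", "ts": 2}',
    '{"user": "c", "ts": 10, "device": {"os": "android"}}',
    '{"user": "d", "ts": 11, "tags": ["x"]}',
]


def load_chunks(raw_lines, chunk_size=2):
    """Parse raw JSON without transformation and record min/max of `ts` per chunk."""
    chunks = []
    for start in range(0, len(raw_lines), chunk_size):
        rows = [json.loads(line) for line in raw_lines[start:start + chunk_size]]
        ts_values = [row["ts"] for row in rows]
        chunks.append({"rows": rows, "min_ts": min(ts_values), "max_ts": max(ts_values)})
    return chunks


def query_users(chunks, ts_from):
    """Return (users, chunks_scanned) for rows with ts >= ts_from."""
    users, scanned = [], 0
    for chunk in chunks:
        if chunk["max_ts"] < ts_from:
            continue  # pruning: the metadata proves no row in this chunk can match
        scanned += 1
        # filtering: check each remaining row against the predicate
        users.extend(row["user"] for row in chunk["rows"] if row["ts"] >= ts_from)
    return users, scanned


chunks = load_chunks(RAW_EVENTS)
users, scanned = query_users(chunks, ts_from=10)
```

With this sample data the query touches only one of the two chunks, even though no schema was declared before loading.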
10
Data warehouse as a service
Hardware infrastructure
Software infrastructure
Data modeling
Data analysis
Customer
Index management
Data partitioning
Metadata updates
Data protection
Availability & DR
Security implementation
Query optimization
11
No infrastructure, knobs, or tuning
Infrastructure management
Virtual hardware and software managed by Snowflake
Metadata management
Automatic statistics collection, scaling, and redundancy
Manual query optimization
Dynamic optimization, parallelization, and concurrency management
Data storage management
Adaptive data distribution, automatic compression, automatic optimization
12
Customers
“Snowflake is faster, more flexible, and more scalable than the alternatives on the market. The fact that we don’t need to do any configuration or tuning is great because we can focus on analyzing data instead of on managing and tuning a data warehouse.”
Craig Lancaster, CTO
13
Customer results
Gaming company
Replace noSQL data store with Snowflake for storing & transforming event data
Snowflake: 1.5 minutes
noSQL data store: 8 hours
Market research company
Replace on-premises data warehouse with Snowflake for analytics workload
Snowflake: 26 minutes
Data warehouse appliance: 7 hours
Telco
Improved performance while adding new workloads at a fraction of the cost
Snowflake: added 2 new workloads for $50K
Data warehouse appliance: $5M+ to expand
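The figures quoted on this slide can be turned into ratios with simple arithmetic. The sketch below assumes the 1.5 minutes / 8 hours pair belongs to the gaming company and the 26 minutes / 7 hours pair to the market research company, which the slide layout suggests but does not state outright. The `ratio` helper is hypothetical, and the telco figure treats "$5M+" as exactly $5M, so it is a lower bound.

```python
def ratio(before, after):
    """How many times larger `before` is than `after` (same units for both)."""
    return before / after


# Times converted to minutes before comparing.
gaming_speedup = ratio(8 * 60, 1.5)      # noSQL data store vs. Snowflake
research_speedup = ratio(7 * 60, 26)     # data warehouse appliance vs. Snowflake

# Costs in dollars; "$5M+" is taken at its minimum of $5M.
telco_cost_ratio = ratio(5_000_000, 50_000)

print(f"Gaming company: {gaming_speedup:.0f}x faster")
print(f"Market research company: {research_speedup:.1f}x faster")
print(f"Telco: at least {telco_cost_ratio:.0f}x lower expansion cost")
```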
14
Customer example
Before
• Fragile data pipeline
• Delays in getting updated data
• High cost and complexity
• Limited data granularity
After
• >50x faster data updates
• Reduced costs by >50%
• Nearly eliminated pipeline failures
• Able to retain full data granularity