Matt McIlwain Opening Keynote
TRANSCRIPT
FROM “BIG DATA” TO DATAWARE
SIM Technology Leadership Summit, May 20, 2015
MADRONA OVERVIEW
• Madrona is a leading venture capital firm focused on sourcing and growing early-stage technology companies in the Pacific Northwest
• About $1 billion under management across five funds
– Investors include the University of Washington, University of Virginia, Irvine Foundation, University of North Carolina, and strategic individuals
• Investments made in over 100 companies over the past 20 years, with over 50 active portfolio companies and over 40 positive exits
• Madrona team
– 7 Managing Directors
– Strategic Directors and Venture Partners include: Sujal Patel, Steve Singh, John McAdam, Prof. Oren Etzioni, and Prof. Dan Weld
THE PNW TECH ECOSYSTEM IS STRONG AND GROWING
Anchor Tenants
Large Tech Satellite Offices
Mid-Cap Tech with Seattle HQ
World-Class Research
OUR FUTURE
COMMUNICATION
– 1995: Snail mail, fax, early email
– Today (2015): SMS, Facebook, Skype, Snapchat & Twitter
– 2035: Virtual reality rooms
DEVICES
– 1995: Desktop PCs
– Today (2015): Smart mobile devices
– 2035: Embedded on you & everything else (IoT)
SOFTWARE/DATAWARE
– 1995: Packaged/licensed
– Today (2015): SaaS subscription/apps
– 2035: Intelligent apps
INTERNET/CONNECTIVITY
– 1995: Dial-up modem, 56k
– Today (2015): “Ubiquitous” broadband, 100 Mbps to mobile
– 2035: “Always on” and IoT
COMPUTE/STORAGE
– 1995: Pentium processor, 100 MIPS, single-core; ~$1 million/TB
– Today (2015): Intel Xeon E7 processor, 4,000 MIPS, multi-core; $59/TB
– 2035: $5/petabyte
INFRASTRUCTURE
– 1995: Internet & dedicated servers
– Today (2015): Cloud
– 2035: Real-time hybrid marketplace
COMMERCE
– 1995: 1 book, 10 days, $5 delivery
– Today (2015): Anything in 2 days, free; 50,000 items in 2 hours, free delivery
– 2035: Drones or autonomous car delivery & 3D printing
WHAT IS “DATAWARE”?
A framework for describing the combination of data, software, math formulas and “predictive” analytics that help data-savvy teams turn information and insights into profitable actions.
WHY NOW?
• Cloud Enablement: “Cloud” abstracts hardware into software and enables unprecedented elasticity, scale and speed
• Big Data: The volume, velocity and variety of data types and stores have expanded rapidly, while the value of retaining/leveraging data often exceeds the cost
• Legacy “Datastores”: Highly structured and constrained systems (databases, data warehouses, BI tools) that are too rigid to unlock data’s full value yet too ubiquitous and important to NOT leverage
• Emerging Solutions: A combination of point solutions, systematic approaches and “vertical” services is emerging to leverage these trends in an agile manner. These solutions require a structured framework to prioritize market opportunities
[Slide: Big Data landscape]
MADRONA DATAWARE FRAMEWORK
INTELLIGENT APPS & SERVICES
DATA INTELLIGENCE
ENABLING INFRASTRUCTURE
AGILE DATA STACK
Marc Benioff, founder and CEO of Salesforce.com, when asked what he thinks is the major tech trend of the next five years, responded that we are in an “AI Spring.” (Fortune Term Sheet, 1/6/15)
WHAT MAKES THE DATA “BIG”?
Value: More valuable to store than to throw away
Variety: Different sources & structures create opportunities… & challenges
Volume: Easy, plentiful & cheap data to collect & store
Velocity: Speed of turning data into actionable insights – batch vs. real-time!
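The batch vs. real-time distinction under Velocity can be illustrated with a minimal Python sketch. It assumes a hypothetical stream of order values; `batch_average` and `RunningAverage` are illustrative names, not any product's API. The batch version produces an answer only after all the data has been collected. The streaming version keeps a current answer after every event, using the standard incremental-mean update.

```python
from typing import List


def batch_average(values: List[float]) -> float:
    """Batch: wait for the full dataset, then compute the answer once."""
    return sum(values) / len(values)


class RunningAverage:
    """Real-time: refresh the answer as each new event arrives."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        """Fold one event into the mean and return the up-to-date value."""
        self.count += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical order values (in dollars) arriving one at a time.
orders = [120.0, 80.0, 100.0, 140.0]

stream = RunningAverage()
partials = [stream.update(v) for v in orders]  # an answer after every event
final_batch = batch_average(orders)            # one answer at the end
```

Both approaches arrive at the same final average. The difference is that the streaming version had a usable answer after each order, not only at the end.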
DATA INPUTS
• Legacy Databases: Highly structured, transactional focused, generally rigid
– Databases with SQL queries (OLTP)
– Historic “Extract, Transform, Load” tools (ETL)
– Data warehouses and data cubes
– Business Intelligence (BI) and “Online Analytical Processing (OLAP)”
• “Big Data” Sources: Varied structure, high volume/velocity, agile
– “Not Only SQL” (NoSQL) data repositories
– Allow for “Extract, Load, Transform” (ELT) flexibility
– Continuous, online (streamed) data flows
– Relationship focus vs. Relational focus
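The ETL vs. ELT contrast above comes down to when the schema is imposed. The minimal Python sketch below assumes hypothetical JSON event records; `etl`, `elt` and `revenue_by_referrer` are illustrative names, not any vendor's API. The ETL path fits records to a fixed schema before storing them, so an unanticipated field is lost. The ELT path stores records as-is, so a later question can still use that field.

```python
import json

# Hypothetical raw events; the third carries a field ("referrer") that the
# warehouse schema below did not anticipate.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "city": "Seattle"}',
    '{"user": "b2", "amount": "5.00", "city": "Tacoma"}',
    '{"user": "c3", "amount": "12.50", "city": "Seattle", "referrer": "email"}',
]

# ETL: the target schema is fixed before any data is loaded.
WAREHOUSE_COLUMNS = ("user", "amount", "city")


def etl(raw_lines):
    """Extract, Transform, Load: reshape each record to the schema, then store it."""
    table = []
    for line in raw_lines:
        record = json.loads(line)                              # extract
        row = {col: record[col] for col in WAREHOUSE_COLUMNS}  # transform
        row["amount"] = float(row["amount"])
        table.append(row)                                      # load
    return table  # fields outside the schema never reach storage


def elt(raw_lines):
    """Extract, Load, Transform: store records as they arrive; shape them later."""
    return [json.loads(line) for line in raw_lines]


def revenue_by_referrer(raw_store):
    """A transform written after loading, using a field ETL would have dropped."""
    totals = {}
    for record in raw_store:
        key = record.get("referrer", "unknown")
        totals[key] = totals.get(key, 0.0) + float(record["amount"])
    return totals


warehouse = etl(RAW_EVENTS)
lake = elt(RAW_EVENTS)
by_referrer = revenue_by_referrer(lake)
```

The trade-off is the one on the slide. ETL gives clean, consistent rows up front. ELT defers shaping until a question is asked, which keeps options open at the cost of doing the transformation work at query time.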
WHERE DOES DATA & METADATA COME FROM?
People: Consumers, Office Workers, Field Workers, Citizens, Partners, Customers
Places: Home, Work, Stores, Destinations, Routes
Things: Devices, Vehicles, Machines, Medical, Homes, Content
Profiles: Individuals, Demographics, Devices, Locations, Objects, “Campaigns”, Biology, “Networks”
WHY DOES IT MATTER?
Structure
– From: Mostly structured (relational)
– To: Flexibly structured (relationship)
Flexibility
– From: Rigid & slow (R + cubes + BI)
– To: Agile & rapid (Python + graphs/ML + UI)
Availability
– From: Offline & batch
– To: Online & continuous
Key Drivers
– From: Code & “rules” (“hard coded”, structured learning)
– To: Data, statistics, discovery (“machine learned”, “inferred”, Bayesian)
Conceptually
– From: Certainty & consistency
– To: Iteration & “surprise”
TECHNOLOGY SECTOR IMPACT OF “DATAWARE”
YEARS: 0–2 | 2–5 | 5+
Relational Databases (Oracle, MSFT): + | ?? | -
Traditional Infrastructure (HP, IBM, Dell): + | - | --
Traditional Apps (Oracle, SAP): + | +/- | -
Cloud Infrastructure: ++ | ++ | +
SaaS: ++ | ++ | +/-
BIG COMPANY “LEADING INDICATORS”
• Microsoft: Azure ML, Revolution Analytics, and much more
• HP reorganizes software business around “Big Data”
• Salesforce.com buys RelateIQ for $390M for “data cloud”
• Oracle builds a “data cloud” team including BlueKai and Datalogix
• SAP promotes HANA, buys Concur
• IBM advertises Watson, Bluemix
• AWS: Amazon ML, Lambda, Kinesis
KEY QUESTIONS
• How do big, especially software-driven, companies unlock their “data silos”?
• How will traditional databases/warehouses, newer “big data” stores and integrated big data “lakes” complement or compete?
• What models will emerge to capture value in “data intelligence”?
• To what extent can intelligent apps and services disrupt legacy apps/services?
MADRONA DATAWARE FRAMEWORK
INTELLIGENT APPS & SERVICES
DATA INTELLIGENCE
ENABLING INFRASTRUCTURE
AGILE DATA STACK
KEYS TO EMBRACING DATAWARE
1. Enabling infrastructure is complex (Hadoop/Cloudera, NoSQL/MongoDB, Spark, legacy systems) and hard/expensive, but it is getting simpler and cheaper
2. Data Intelligence holds big promise, but the scarcity of “data scientists” requires professional services (Dato, Context Relevant, Atigeo, Palantir) and systematic, standardized approaches from emerging companies
3. Early “App Intelligence” that is real-time and agile already exists (ad serving, content recommendations, personalization, vertical markets). There is tremendous opportunity here to reinvent categories
4. Opportunities also exist in the data pipeline (Trifacta) and data management, but tend to be deeper technical systems
APPLICATION INTELLIGENCE
1. What will an “application” look like in 5+ years?
2. What will make that application “intelligent”?
Apps + Algos + Data = App Intelligence
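To make "Apps + Algos + Data = App Intelligence" concrete, here is a toy content-recommendation sketch in Python, in the spirit of the recommendation use cases mentioned above. The baskets and the function names `build_cooccurrence` and `recommend` are hypothetical. The algorithm is simple item-to-item co-occurrence counting, chosen for brevity rather than taken from any company in this deck.

```python
from collections import Counter
from itertools import permutations

# Data: hypothetical purchase baskets, one set of item IDs per customer.
BASKETS = [
    {"tent", "stove", "lantern"},
    {"tent", "stove"},
    {"tent", "sleeping_bag"},
    {"stove", "lantern"},
]


def build_cooccurrence(baskets):
    """Algo: count how often each ordered pair of items appears together."""
    counts = {}
    for basket in baskets:
        # sorted() keeps the iteration order deterministic across runs
        for a, b in permutations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, item, k=2):
    """App: return up to k items most often bought alongside `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


cooccurrence = build_cooccurrence(BASKETS)
suggestions = recommend(cooccurrence, "tent", k=1)
```

Production recommenders typically replace raw counts with normalized similarity scores or learned models. The sketch only shows how the three ingredients combine.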
MADRONA DATAWARE INVESTMENTS
INTELLIGENT APPS & SERVICES
DATA INTELLIGENCE
ENABLING INFRASTRUCTURE
AGILE DATA STACK
YIELDEX
DATO
BOOMERANG
JOBALINE
HIGHSPOT
BIZIBLE
PLACED
MAXPOINT
APPTIO
SEEQ
QUMULO
CONTEXT RELEVANT
ALGORITHMIA
IGNEOUS
ICEBRG
EXTRAHOP
Legend: Fund III, Fund IV, Fund V
Appendix
Dataware Case Study: Apptio
Category: “Full Stack”
Focus: Data-driven enterprise SaaS for the CIO and IT team to run the business of IT (Technology Business Management, or TBM)
Revenue: $100M+
Lineage: Startups, HP, IBM/Rational
Keys: • Combine legacy General Ledger & modern usage data to “cost” services and share with users
• Define an industry data & metadata standard (ATUM)
• Deliver a real-time enterprise SaaS solution
Investors: Madrona Venture Group, Greylock Partners, Shasta Ventures, Andreessen Horowitz, T. Rowe Price
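The first Apptio key, combining General Ledger data with usage data to "cost" services, can be illustrated with the simplest possible allocation rule. The Python sketch below assumes straight proportional (usage-share) allocation and hypothetical inputs. It is not Apptio's costing model or the ATUM standard; `allocate_cost` is an illustrative name.

```python
def allocate_cost(gl_cost, usage_by_unit):
    """Split one General Ledger cost pool across consumers by usage share."""
    total_usage = sum(usage_by_unit.values())
    if total_usage == 0:
        raise ValueError("usage must be positive to allocate cost")
    return {
        unit: gl_cost * used / total_usage
        for unit, used in usage_by_unit.items()
    }


# Hypothetical inputs: annual storage spend from the ledger and TB consumed
# by each business unit from usage metering.
storage_gl_cost = 100_000.0
storage_tb_used = {"Sales": 20, "Marketing": 30, "Engineering": 50}

showback = allocate_cost(storage_gl_cost, storage_tb_used)
```

The ledger supplies how much was spent, and the usage data supplies who consumed it. Joining the two is what lets IT show each business unit a cost for the services it uses.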
Dataware Case Study: Cloudera
Category: Enabling Infrastructure
Focus: Became the industry standard for extracting, storing and managing a variety of data types to enable data intelligence and data-driven services to succeed
Revenue: $100M+
Lineage: Hadoop, Open Source, Google, UW
Keys: • Early player in building a diverse, indexed data store
• Helped define the “file system”, called HDFS, for managing large-scale data stores
• Attempting to be the underlying platform for dataware
Investors: Accel Partners, Greylock Partners, Intel, T. Rowe Price
Dataware Case Study: Dato
Category: Data Intelligence
Focus: Leverage machine learning and various data types to go from inspiration to insight and to build scalable predictive and recommendation systems
Revenue: < $10M
Lineage: UW, Carnegie Mellon
Keys: • Use SFrames to combine graph, table, text & image data types
• Build an “end to end” data intelligence system from prototype to production
• Deliver predictive and recommender systems as services or standalone applications for business customers
Investors: Madrona Venture Group, NEA, Vulcan
Dataware Case Study: Placed.com
Category: App Intelligence
Focus: Combine location database & active panel data to analyze and optimize advertising and marketing programs
Revenue: < $10M
Lineage: Farecast, Quantcast, aQuantive
Keys: • Leverage data science to build highly accurate place database
• Create statistically significant panels to measure physical world impact of digital advertising
• Embed the service into the mobile ad ecosystem to deliver actionable insights
Investors: Madrona Venture Group, Two Sigma
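Measuring the "physical world impact of digital advertising" with panels usually means asking whether people exposed to an ad visited a place more often than a comparable control group, and whether the gap is statistically significant. The Python sketch below uses the standard pooled two-proportion z-test with hypothetical panel counts. It illustrates the general idea and is not Placed's actual methodology; `two_proportion_z_test` is an illustrative name.

```python
import math


def two_proportion_z_test(visits_a, n_a, visits_b, n_b):
    """Compare visit rates of an exposed panel (a) and a control panel (b).

    Returns (lift, z, two_sided_p) using the pooled two-proportion z-test.
    """
    p_a = visits_a / n_a
    p_b = visits_b / n_b
    pooled = (visits_a + visits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF, computed via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a - p_b, z, p_value


# Hypothetical panel results: store visits among panelists who saw an ad
# (exposed) versus comparable panelists who did not (control).
lift, z, p_value = two_proportion_z_test(600, 10_000, 500, 10_000)
```

Panel size matters here. The same one-point lift measured on a few hundred panelists would not clear a conventional significance threshold, which is why the slide stresses statistically significant panels.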
Dataware Case Study: Trifacta
Category: Continuous Data Pipeline
Focus: Automate the process of cleaning, normalizing and preparing data for “Data Intelligence” use cases
Revenue: Unknown
Lineage: Stanford (Jeff Heer), Cal (Joe Hellerstein)
Keys: • Focus on the core “Data Wrangling” problem
• Use machine learning to recognize patterns & suggest automated fixes
• Simple visualization/UI
Investors: Greylock Partners, Accel Partners, Ignition Partners