Cloud as a Data Platform
What is (Big) Data? Amazon Data Services
Andrei Savu
Founder of Axemblr.com
Co-organizer of Bucharest JUG
Lead of Apache Provisionr
Passion for Automation & Data Analysis
Connect with me on LinkedIn
@ Axemblr
Data Processing Infrastructure
Deployment Automation on IaaS platforms
Product: Hadoop On-Demand Appliance
Apache Provisionr (Open Source)
Consulting & Professional Services
Topics
Introduction to (Big)Data
● Characteristics
● In Practice
● Value
Amazon Data Platform
● Tools
● How they fit
What is (Big)Data?
Beyond the Hype (Source)
... size & speed are relative
Characteristics #1
Too big, Too fast, Unstructured
1. Volume
"Simple models work better with more data"
The Unreasonable Effectiveness of Data
Alon Halevy, Peter Norvig, and Fernando Pereira, Google
Challenging from a technical perspective
Needs scalable storage
Distributed query engines (massively parallel)
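A minimal sketch of the idea behind a massively parallel query engine: each node computes a partial result over its own partition, and a coordinator combines the partials. The function names and the data are hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(partition):
    """Runs on one 'node': return (sum, count) for the local partition only."""
    return sum(partition), len(partition)


def distributed_mean(partitions):
    """Scatter the query to every partition, then combine the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical data set split across three "nodes".
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
mean = distributed_mean(partitions)
```

Only the small (sum, count) pairs travel to the coordinator, never the raw rows, which is what lets this pattern scale with data volume.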
2. Velocity
Nothing new for financial traders
Tight feedback loop as competitive advantage
Complex event processing (CEPs)
Online stream summarization (estimation)
Online aggregation (key-value stores)
Long term storage for batch processing
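Online aggregation can be sketched as folding each event into a small per-key summary as it arrives, without keeping the raw stream. The example below uses Welford's algorithm for running mean and variance; that choice, the in-memory dict standing in for a key-value store, and the event data are all assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running count, mean and variance for one key (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 for fewer than two observations.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def aggregate(events):
    """Fold a stream of (key, value) events into per-key summaries."""
    store = defaultdict(RunningStats)  # hypothetical stand-in for a key-value store
    for key, value in events:
        store[key].update(value)
    return store


# Hypothetical event stream: (campaign, bid price).
events = [("campaign-a", 2.0), ("campaign-b", 10.0),
          ("campaign-a", 4.0), ("campaign-a", 6.0)]
stats = aggregate(events)
```

Each update touches one key and uses constant memory, so the summaries stay current while the raw events go to long term storage for batch processing.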
3. Variety
The reality of data is messy and the format evolves over time
Entity Resolution, Language Detection etc.
Mantra: Detect Schema, Annotate, Enrich
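One way to read the mantra, as a rough sketch: infer a schema from whatever records show up, annotate each record with what it lacks, and enrich it with derived fields. The record layout, the `_missing` and `email_domain` fields, and all function names are hypothetical.

```python
from collections import defaultdict


def detect_schema(records):
    """Map each field name to the set of Python type names observed for it."""
    schema = defaultdict(set)
    for record in records:
        for field, value in record.items():
            schema[field].add(type(value).__name__)
    return dict(schema)


def annotate(record, schema):
    """Attach the list of schema fields that this record does not carry."""
    missing = sorted(set(schema) - set(record))
    return {**record, "_missing": missing}


def enrich(record):
    """Add a derived field: the lower-cased domain of the email, if present."""
    email = record.get("email")
    if isinstance(email, str) and "@" in email:
        return {**record, "email_domain": email.rsplit("@", 1)[1].lower()}
    return record


# Hypothetical messy input: inconsistent types and missing fields.
records = [
    {"id": 1, "email": "a@Example.com"},
    {"id": "2", "name": "Bob"},
]
schema = detect_schema(records)
cleaned = [enrich(annotate(r, schema)) for r in records]
```

A field that maps to more than one type (here `id`) is a signal that the format has drifted and needs a cleaning rule.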
Characteristics #2
In Practice
(Big) data is messy
80% of the effort goes into identifying sources, integration and cleaning
Messy and disconnected: different systems, different networks, different departments
Consider data-markets
(Big) data has gravity
Tends to attract processing services
The cost of moving may be large
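A back-of-the-envelope sketch of why moving data is costly, counting transfer time only. The data size, link speed and `efficiency` parameter are assumed numbers, not figures from the talk.

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link_gbps network link."""
    bits = size_tb * 1e12 * 8                        # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # usable bits per second
    return seconds / 86_400                          # seconds per day


# Hypothetical numbers: 100 TB over a fully used 1 Gbps link.
days = transfer_days(100, 1)
```

With these assumptions the copy takes a little over nine days, which is why processing tends to move to where the data already lives.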
Cloud or in-house?
Cloud:
● for development & exploration
● low usage or variable capacity needs
In-house:
● due to strict regulations
● for performance and cost efficiency
People & Data Science
You need a team that combines: math, programming and scientific instinct
Building data-science teams
http://radar.oreilly.com/2011/09/building-data-science-teams.html
(Big)Data Value
... answer them w/ Data
Enables New Products
Recommendation engines (think Amazon, Netflix, Facebook, LinkedIn)
Advanced advertising (more later)
Advanced search & spelling suggestions
(and many more)
Rule of thumb
"Advice to businesses starting out with big data: first, decide what problem you want to solve." *
Christer Johnson, IBM’s leader for advanced analytics in North America
* create data-driven business processes (more)
(Big)Data on AWS
http://aws.amazon.com/big-data/
Based on my work at Magnolia Labs Inc. http://magnolialabs.com/
San Francisco, CA based company with R&D in Romania
Various products: RTB (real-time bidding), Secure Browsing etc.
They are hiring! [email protected]
Overview
Amazon EMR (Elastic MapReduce)
Amazon Data Pipeline
Amazon Redshift
Amazon DynamoDB
How do they fit?
Thanks! Questions?
Andrei Savu - asavu @ axemblr.com