Cloud as a Data Platform

Upload: andrei-savu

Post on 06-May-2015

DESCRIPTION

My slides on how to use cloud as a data platform at BigDataWeek 2013 Romania http://www.eurocloud.ro/en/events/all-there-is-to-know-about-big-data/#.UXZFaUDvlVI

TRANSCRIPT

Page 1: Cloud as a Data Platform

Cloud as a Data Platform
What is (Big) Data? Amazon Data Services

Page 2: Cloud as a Data Platform

Andrei Savu
Founder of Axemblr.com
Co-organizer of Bucharest JUG
Lead of Apache Provisionr

Passion for Automation & Data Analysis

Connect with me on LinkedIn

Page 3: Cloud as a Data Platform

@ Axemblr
Data Processing Infrastructure
Deployment Automation on IaaS platforms

Product: Hadoop On-Demand Appliance
Apache Provisionr (Open Source)
Consulting & Professional Services

Page 4: Cloud as a Data Platform

Topics

Introduction to (Big) Data
● Characteristics
● In Practice
● Value

Amazon Data Platform
● Tools
● How they fit

Page 5: Cloud as a Data Platform

What is (Big) Data?
Beyond the Hype (Source)

Page 6: Cloud as a Data Platform

... size & speed are relative

Page 7: Cloud as a Data Platform

Characteristics #1
Too big, Too fast, Unstructured

Page 8: Cloud as a Data Platform

1. Volume
"Simple models work better with more data"

The Unreasonable Effectiveness of Data
Alon Halevy, Peter Norvig, and Fernando Pereira, Google

Challenging from a technical perspective
Needs scalable storage
Distributed query engines (massively parallel)
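
A minimal sketch of the scatter-gather shape that massively parallel query engines rely on: each worker computes a partial aggregate over its own partition, and the partials are merged at the end. The function names and the sample partitions are hypothetical, and threads stand in for separate nodes only to show the structure, not to gain real speed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(lines):
    """Partial aggregate for one partition: word -> count."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(partitions, workers=4):
    """Scatter partitions to workers, then merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_partition, partitions):
            total.update(partial)
    return total


# Hypothetical partitions; in a real system each would live on a different node.
PARTITIONS = [
    ["big data is messy", "data has gravity"],
    ["simple models work better with more data"],
]

if __name__ == "__main__":
    print(parallel_word_count(PARTITIONS).most_common(3))
```

The merge step only works this simply because counting is associative; aggregates such as medians need a different strategy.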

Page 9: Cloud as a Data Platform

2. Velocity
Nothing new for financial traders

Tight feedback loop as competitive advantage
Complex event processing (CEP)
Online stream summarization (estimation)
Online aggregation (key-value stores)
Long term storage for batch processing
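
One way to picture online aggregation is a per-key summary that is updated as each event arrives, without storing the events. The sketch below uses Welford's algorithm for an exact running mean and variance; it is a simpler stand-in for the probabilistic estimators usually meant by stream summarization, and the in-memory dict plays the role of a key-value store. The event data and names are hypothetical.

```python
from collections import defaultdict


class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(events):
    """Online aggregation: keep one RunningStats per key, updated per event."""
    stats = defaultdict(RunningStats)
    for key, value in events:
        stats[key].add(value)
    return stats


# Hypothetical (key, value) events, e.g. bid prices per campaign.
EVENTS = [("a", 2.0), ("b", 10.0), ("a", 4.0), ("a", 6.0)]

if __name__ == "__main__":
    for key, s in sorted(summarize(EVENTS).items()):
        print(key, s.count, s.mean, s.variance)
```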

Page 10: Cloud as a Data Platform

3. Variety
The reality of data is messy and the format evolves over time

Entity Resolution, Language Detection etc.
Mantra: Detect Schema, Annotate, Enrich
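
The "Detect Schema, Annotate, Enrich" mantra can be sketched as three small steps over messy records. This is only an illustration under assumptions: the record layout, the `_missing` annotation, and the email-domain enrichment are all hypothetical and not taken from the talk.

```python
from collections import defaultdict


def detect_schema(records):
    """Infer field -> set of observed type names from a list of dicts."""
    schema = defaultdict(set)
    for record in records:
        for field, value in record.items():
            schema[field].add(type(value).__name__)
    return dict(schema)


def annotate(record, schema):
    """Attach metadata: fields missing compared with the detected schema."""
    missing = sorted(set(schema) - set(record))
    return {**record, "_missing": missing}


def enrich(record):
    """Add a derived field; here a crude, hypothetical email-domain extraction."""
    email = record.get("email")
    if isinstance(email, str) and "@" in email:
        return {**record, "email_domain": email.rsplit("@", 1)[1].lower()}
    return record


# Hypothetical messy records whose format drifts over time.
RECORDS = [
    {"id": 1, "email": "ana@Example.com"},
    {"id": "2", "name": "Bogdan"},
]

if __name__ == "__main__":
    schema = detect_schema(RECORDS)
    for rec in RECORDS:
        print(enrich(annotate(rec, schema)))
```

A field that shows up with more than one type in the detected schema (here `id`) is a typical signal that cleaning is needed before analysis.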

Page 11: Cloud as a Data Platform

Characteristics #2
In Practice

Page 12: Cloud as a Data Platform

(Big) data is messy
80% of the effort goes into identifying sources, integration and cleaning

Messy and disconnected: different systems, different networks, different departments

Consider data-markets

Page 13: Cloud as a Data Platform

(Big) data has gravity
Tends to attract processing services

The cost of moving may be large
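
A rough way to see why data has gravity is to estimate how long a move would take: time = size / usable bandwidth. The sketch below is a back-of-the-envelope calculation only; the dataset size, link speed, and `efficiency` factor are hypothetical inputs, and it ignores transfer fees and protocol overhead.

```python
def transfer_days(size_terabytes, link_gbps, efficiency=1.0):
    """Days needed to move size_terabytes over a link of link_gbps.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    efficiency is the usable fraction of the link (an assumption).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_terabytes * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    # Hypothetical example: 100 TB over a dedicated 1 Gbps link.
    print(round(transfer_days(100, 1), 2))
```

With these example inputs the move takes a little over nine days on a fully used link, which is often reason enough to bring the processing to the data instead.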

Page 14: Cloud as a Data Platform

Cloud or in-house?

Cloud:
● for development & exploration
● low usage or variable capacity needs

In-house:
● due to strict regulations
● for performance and cost efficiency

Page 15: Cloud as a Data Platform

People & Data Science
You need a team that combines: math, programming and scientific instinct

Building data-science teams
http://radar.oreilly.com/2011/09/building-data-science-teams.html

Page 16: Cloud as a Data Platform

(Big)Data Value

Page 17: Cloud as a Data Platform

... answer them w/ Data

Page 18: Cloud as a Data Platform

Enables New Products
Recommendation engines (think Amazon, Netflix, Facebook, LinkedIn)

Advanced advertising (more later)

Advanced search & spelling suggestions

(and many more)

Page 19: Cloud as a Data Platform

Rule of thumb

"Advice to businesses starting out with big data: first, decide what problem you want to solve." *

Christer Johnson, IBM’s leader for advanced analytics in North America

* create data-driven business processes (more)

Page 20: Cloud as a Data Platform

(Big) Data on AWS
http://aws.amazon.com/big-data/

Page 21: Cloud as a Data Platform

Based on my work at Magnolia Labs Inc. http://magnolialabs.com/

San Francisco, CA-based company with R&D in Romania

Various products: RTB (real-time bidding), Secure Browsing etc.

They are hiring!

Page 22: Cloud as a Data Platform

Overview

Page 23: Cloud as a Data Platform

Amazon S3

Page 24: Cloud as a Data Platform

Amazon Glacier

Page 25: Cloud as a Data Platform

Amazon EMR (Elastic MapReduce)

Page 26: Cloud as a Data Platform

Amazon Data Pipeline

Page 27: Cloud as a Data Platform

Amazon Redshift

Page 28: Cloud as a Data Platform

Amazon DynamoDB

Page 29: Cloud as a Data Platform

How do they fit?

Page 30: Cloud as a Data Platform

Thanks! Questions?
Andrei Savu - asavu @ axemblr.com