AWS for the Data Professional

Post on 14-May-2015

DESCRIPTION

Core AWS services for the data professional - EC2, RDS, S3, Kinesis and more

TRANSCRIPT

Amazon Web Services for the SQL Server Professional

Lynn Langit, Architect

Level: Intermediate

What and Why AWS?

AWS: Amazon’s cloud

Set of services
• Compute
• Data
• More

Market leader
• In market longest
• Usually cheapest
• Most often used in production

Amazon Web Services

EC2 – VMs for train, test & production

Pricing
• On-demand
• Spot
• Reserved
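To make the pricing options concrete, here is a minimal sketch of the break-even arithmetic between on-demand and reserved pricing. It assumes a simple model (reserved = one upfront fee plus a lower hourly rate), and the rates used in the note below are placeholders, not AWS prices. Spot is left out because its price fluctuates with demand.

```python
def reserved_break_even_hours(on_demand_hourly, reserved_hourly, reserved_upfront):
    """Hours of use after which reserved pricing becomes cheaper than on-demand.

    Assumed model: reserved = upfront fee + lower hourly rate.
    """
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return reserved_upfront / saving_per_hour


def cheapest_option(hours, on_demand_hourly, reserved_hourly, reserved_upfront):
    """Return 'reserved' or 'on-demand' for a planned number of instance hours."""
    on_demand_cost = hours * on_demand_hourly
    reserved_cost = reserved_upfront + hours * reserved_hourly
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"
```

With placeholder rates of 0.50 per hour on-demand, 0.25 per hour reserved and a 500 upfront fee, the break-even point is 2,000 hours: a short-lived test VM favors on-demand, while a production VM running all year favors reserved.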

Demo – EC2
• Virtual Machines

S3 and Glacier

About EC2 storage

S3
• 10 GB max
• 3 copies
• Usually for data storage

EBS – expand / snapshot, etc…
• Can store AMIs (persistent)
• Can ‘stop’ EC2 instances and ‘re-start’ – saves $$$
• Costs more
• Can expand
• One copy only (faster)

SSD – optional
• For high performance
• Provisioned IOPS

Demo – S3
• File Storage

Demo – Glacier
• Archival Storage

RDS – Managed Relational Data

Demo – RDS
• SQL Server as a service

RDS vs. EC2 for SQL Server

• Provisioned IO – performance guarantees
• Scheduled backups
• Point in time restores
• Scheduled maintenance windows
• Full use of all SQL tools, SSMS, Profiler, DTA, etc…
• Supports Availability Groups (requires 2012 Enterprise)

Why RDS costs more

Redshift – $999 / TB / year
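As a quick worked example of the headline figure, the sketch below turns $999 / TB / year into a cost for a given data size and duration. Treating the slide's number as a flat rate is an assumption made for illustration; real Redshift pricing depends on node type and commitment.

```python
REDSHIFT_USD_PER_TB_YEAR = 999.0  # headline figure quoted on the slide


def redshift_storage_cost(terabytes, months=12):
    """Estimated cost in USD, assuming the slide's figure is a flat rate."""
    return REDSHIFT_USD_PER_TB_YEAR * terabytes * months / 12
```

Under that assumption, 1 TB for one month comes to $83.25, and 10 TB for six months comes to $4,995.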

Demo – Redshift
• Data Warehousing as a Service

DynamoDB for fast NoSQL with SSDs

Demo – DynamoDB
• NoSQL on SSD

Elastic MapReduce for easy Hadoop

Demo – MapReduce
• Hadoop on AWS

Kinesis for real-time Big Data Streams

Demo – Kinesis
• Real-time streaming for Big Data
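For readers who want to try streaming outside the console, here is a minimal sketch of writing one event to a Kinesis stream. It is not the code from the demo. The helper name `send_event` and its parameters are hypothetical; the client is assumed to be a boto3 Kinesis client (for example `boto3.client("kinesis")`), whose `put_record` call takes `StreamName`, `Data` and `PartitionKey`.

```python
import json


def send_event(kinesis_client, stream_name, event, partition_key):
    """Serialize `event` as JSON and put it on the stream as a single record.

    `kinesis_client` is assumed to be a boto3 Kinesis client; it is passed in
    so the helper can be exercised with a stand-in object in tests.
    """
    payload = json.dumps(event).encode("utf-8")  # Kinesis record data is bytes
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=payload,
        PartitionKey=partition_key,  # records with the same key go to the same shard
    )
```

Passing the client in, rather than creating it inside the function, keeps credentials and region selection out of the helper.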

Data Pipelines – automated data transfer

Demo – Data Pipeline
• Build data flows on AWS

Integration w/ Visual Studio – AWS SDK

See Also:
• AWS Tools for Windows Developers
• Includes AWS PowerShell

AWS SDK includes AWS PowerShell

Demo – AWS SDK
• Add-in for Visual Studio and .NET

Cloud Database Services by Vendor

Service | AWS | Google | Microsoft
RDBMS VMs | EC2 AMIs w/SQL Server, etc… | GCE w/MySQL | Azure VM images w/SQL Server
Managed RDBMS | RDS - SQL Server, MySQL | Cloud SQL - MySQL | SQL Azure
NoSQL buckets/databases | S3, EBS, Glacier, DynamoDB | Cloud Storage, HR Datastore on GAE | Azure Blobs & Tables
Pipelines | Data Pipelines | Data Pipelines (beta) | SSIS?
Streaming / Machine Learning | Kinesis or Custom EC2 | BigQuery & Prediction API | StreamInsight, Azure Machine Learning
Document | MongoDB on EC2 | MongoDB on GCE | MongoDB on Windows Azure
Hadoop | MapReduce | Big Query (Dremel) | HDInsight
Other | Redshift – Data Warehouse, Workspaces & Zocalo | Managed VMs, GAE | Azure Marketplace – premium data

Costs - Free Tier for Database Services

How much does it cost?

Tip: When testing use Billing Alerts to make sure you’ve turned off test services!
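One way to script the billing alert tip is a CloudWatch alarm on the account's estimated charges. The sketch below is an assumption-laden example, not the setup shown in the talk: the helper names, the alarm name and the SNS topic ARN are hypothetical, billing alerts must already be enabled for the account, and billing metrics are published in the us-east-1 region.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn, alarm_name="test-spend-alert"):
    """Build the arguments for CloudWatch put_metric_alarm on estimated charges."""
    return {
        "AlarmName": alarm_name,                # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,                        # 6 hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],        # SNS topic that sends the notification
    }


def create_billing_alarm(threshold_usd, sns_topic_arn):
    """Create the alarm; needs boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the sketch loads without it

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))
```

A low threshold, such as a few dollars, is usually enough to catch a test service that was left running.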

Creative Financing

Regular Pricing
• Use what you need and no more, i.e. instance size, storage size…
• Watch for price drops – RDS price decrease this week

Smart EC2 Instance Usage
• Pause EC2 instances to reduce compute charges
• Delete EC2 instances to reduce storage charges

Vanity Pricing
• Set pricing alerts
• Use spot pricing
• Re-selling compute / storage
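The "pause EC2 instances" tip can be automated. The sketch below stops every running instance that carries a given tag. The function name and the tag convention are hypothetical; the client is assumed to be a boto3 EC2 client (for example `boto3.client("ec2")`). Result pagination is ignored for brevity, and only EBS-backed instances can be stopped.

```python
def stop_running_instances(ec2_client, tag_key, tag_value):
    """Stop every running instance carrying the given tag; return their IDs.

    `ec2_client` is assumed to be a boto3 EC2 client. Pagination of
    describe_instances is ignored in this sketch.
    """
    response = ec2_client.describe_instances(
        Filters=[
            {"Name": "instance-state-name", "Values": ["running"]},
            {"Name": "tag:" + tag_key, "Values": [tag_value]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:  # stop_instances rejects an empty ID list
        ec2_client.stop_instances(InstanceIds=instance_ids)
    return instance_ids
```

A stopped instance no longer accrues compute charges, but its EBS volumes are still billed, which is why the slide lists deleting instances separately as the way to cut storage charges.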

Usage Summary

Compute
• EC2: Dev & Test, Train, Prod

Storage
• S3: Raw Storage
• Glacier: Archiving

Data Services
• RDS: Partially Managed RDBMS, HA SQL Server
• Redshift: Data Warehousing
• DynamoDB: fast NoSQL – on SSDs
• EMR: On Demand MapReduce
• Kinesis: Streaming
• Data Pipelines: Automation

Keep Learning

• Connect
– @LynnLangit
– www.youtube.com/user/SoCalDevGal

• Get started
– Sign up for AWS – use ‘Free Tier’
– Email me to get $100 AWS usage credit
