an introduction to cloud computing with amazon web services and mongodb

Post on 16-Apr-2017


An introduction to cloud computing with Amazon Web Services and MongoDB

Samuel Demharter

DTC, 10 March 2016

Cloud Computing

“Everybody's in it and nobody's in it. It's like a cloud that everybody has given a little puff of mist to, and then the cloud does all the heavy thinking for everybody. I don't mean there's really a cloud. I just mean it's something like that.”

The Sirens of Titan, Kurt Vonnegut, 1959

Definition

• Gartner Group: “A style of computing in which massively scalable and elastic IT-enabled capabilities are delivered as a service using Internet technologies.”

Cloud Computing Service Models

Software As A Service (SAAS)

Platform As A Service (PAAS)

Infrastructure As A Service (IAAS)

Amazon Web Services

• Development started in 2002

• In 2006, Amazon launched its Elastic Compute Cloud (EC2) and S3 storage service

• Amazon EC2/S3 was the first widely accessible cloud computing infrastructure service

Amazon Web Services (AWS)

• Computing: EC2, MapReduce
• Storage: S3, EBS
• Databases: SimpleDB, DynamoDB, others
• Others

AWS Computing

• Elastic Compute Cloud (EC2)
– Access to individual instances as you would with any other machine
– Customisable configuration
– Auto Scaling

• Amazon Elastic MapReduce
– Process vast amounts of data
– Utilise Hadoop framework

AWS Storage

• Simple Storage Service (S3)
– Scalable cloud storage
– HTTP access
– Object store, not a file system
– Cheap

• Elastic Block Store (EBS)
– Local storage
– For use with EC2 instances
– Take snapshot backups
– Fast
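To make "object store, not a file system" concrete, here is a toy in-memory sketch. It is not the S3 API: the class, its method names, and the keys are all hypothetical, chosen only to show that objects sit in a flat key namespace and that "folders" are just key prefixes.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, not the S3 API)."""

    def __init__(self):
        self._objects = {}  # flat mapping: key string -> bytes

    def put(self, key, data):
        # Whole-object write: whatever was stored under the key is replaced.
        self._objects[key] = bytes(data)

    def get(self, key):
        # Whole-object read by exact key.
        return self._objects[key]

    def list_keys(self, prefix=""):
        # There are no directories; listing simply filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("runs/2016-03-10/reads.fastq", b"ACGT")
store.put("runs/2016-03-10/summary.txt", b"ok")
store.put("runs/2016-03-11/reads.fastq", b"TTGA")
print(store.list_keys("runs/2016-03-10/"))
```

A block store such as EBS, by contrast, is presented to an instance as a disk that gets a regular file system on top.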

AWS Databases

• Amazon SimpleDB (NoSQL)
– Ease of administration

• Amazon DynamoDB (NoSQL)
– Scalability & durability

• Amazon Relational Database Service (SQL)
– Efficient indexing & querying

• Amazon ElastiCache
– Fast data access

huMONGOus – scalable – natural

What is a database?

A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.

Why use a database?

• Reusability: You need a single, public interface for your data storage that all parts of your application can use.

• Availability: You need to be sure that your application will always be able to read and write data.

• Durability: You need to be sure that your data will stick around.

• Scalability: You need your data storage to be able to grow with your application.

Typical SQL and NoSQL databases

• SQL (Structured Query Language): Oracle, MySQL, Microsoft SQL
• NoSQL (Not Only SQL): key-value, column, document, graph-based
• NoSQL examples: MongoDB, CouchDB, Riak

SQL vs MongoDB

http://sql-vs-nosql.blogspot.co.uk

MongoDB

• Distributed
• Document-oriented
• Schema-less storage solution
• Uses JSON-style documents
• Supports Python, PHP, Java, Ruby, C++, etc.
• Replica sets for failovers and speeding up reads
• Sharding for high performance

SQL vs MongoDB (NoSQL)

SQL | MongoDB (NoSQL)
Requires structured data / well-designed schema | Semi-structured, unstructured & polymorphic data
Table based | Document based
Database atomicity | Document atomicity / eventual consistency
Rules enforced by database | Rules enforced by user
Scale-up | Scale-out (suitable for distributed computing)
 | Flexible & fast

Table - Who is the account holder for account ID 3?

Document - Who is the account holder for account ID 3?
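The figures for these two slides are not reproduced in the transcript, so the following is only a sketch of the same question asked both ways. The table names, field names, and account data are assumptions made up for illustration. The table version uses Python's built-in sqlite3 module; the document version uses plain dictionaries shaped like JSON-style documents rather than a live MongoDB server.

```python
import sqlite3

# Table model: accounts and holders live in separate tables linked by holder_id,
# so answering the question takes a join. (Hypothetical schema and data.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE holders (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accounts (id INTEGER PRIMARY KEY,
                           holder_id INTEGER REFERENCES holders(id));
    INSERT INTO holders VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO accounts VALUES (1, 1), (2, 1), (3, 2);
""")
row = conn.execute(
    "SELECT h.name FROM accounts a JOIN holders h ON h.id = a.holder_id "
    "WHERE a.id = ?",
    (3,),
).fetchone()
table_answer = row[0]
conn.close()

# Document model: each account document embeds its holder, and documents
# in the same collection do not need identical fields.
accounts = [
    {"_id": 1, "holder": {"name": "Alice"}},
    {"_id": 2, "holder": {"name": "Alice"}},
    {"_id": 3, "holder": {"name": "Bob"}, "tags": ["savings"]},
]
doc_answer = next(d["holder"]["name"] for d in accounts if d["_id"] == 3)

print(table_answer, doc_answer)
```

Embedding the holder avoids the join but repeats the holder's name in every account document, which is one way the "rules enforced by user" trade-off shows up.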

Redundancy and Data Availability - Replication

Scaling out - Sharding

• A means for partitioning data across servers for high performance
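Partitioning by a shard key can be sketched in a few lines. This is a simplified illustration of hash-based partitioning, not MongoDB's actual implementation; the shard names, the `account_id` key, and both helper functions are hypothetical.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical server names


def shard_for(key):
    """Map a shard-key value to one shard, deterministically."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def partition(documents, key_field):
    """Group documents by the shard their key hashes to."""
    placement = {name: [] for name in SHARDS}
    for doc in documents:
        placement[shard_for(doc[key_field])].append(doc)
    return placement


docs = [{"account_id": i} for i in range(1, 10)]
placement = partition(docs, "account_id")
for name, held in placement.items():
    print(name, [d["account_id"] for d in held])
```

Because the mapping depends only on the key, a lookup for one account needs to contact a single shard. A plain modulo scheme like this one would move most keys if the number of shards changed, which is one reason real systems use more elaborate placement.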

Real-time Analytics

Usage Example 1: DNA Sequencing

• Real-time DNA sequencing

• Raw data (PC)

• Basecalling (AWS)

• Basecalled data (PC)

Usage Example 1: DNA Sequencing

• Use AWS EC2 computing and S3 storage

• Spot market – auction of unused EC2 instances

• Pay-per-use is an important economic factor for Nanopore

• Use a combination of MongoDB and SQL

Usage Example 2: Genome Analysis Genetic Variant Calling

Peter White et al., Ohio State University in collaboration with Genome Next

https://youtu.be/upAtK_SOtsY

Definitions

• Instance: A copy of an Amazon Machine Image running as a virtual server in the AWS cloud

• Instance type: A specification that defines the memory, CPU, storage capacity, and hourly cost for an instance.

• Amazon Machine Image: AMIs are like a template of a computer's root drive.

• Pixar accidentally wipes out nearly every file of "Toy Story 2" about 10 months into production. Fortunately, supervising technical director Galyn Susman had just become a new mom and had an entire copy of the movie on her home computer so that she could work from home. Woody and Buzz live to see another day, and movie.
