An introduction to cloud computing with Amazon Web Services and MongoDB
Samuel Demharter, DTC, 10 March 2016
Cloud Computing
“Everybody's in it and nobody's in it. It's like a cloud that everybody has given a little puff of mist to, and then the cloud does all the heavy thinking for everybody. I don't mean there's really a cloud. I just mean it's something like that.”
The Sirens of Titan, Kurt Vonnegut, 1959
Definition
• Gartner Group: “A style of computing in which massively scalable and elastic IT-enabled capabilities are delivered as a service using Internet technologies.”
Cloud Computing Service Models
Software as a Service (SaaS)
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
Amazon Web Services
• Development started in 2002
• In 2006, Amazon launched its Elastic Compute Cloud (EC2) and Simple Storage Service (S3)
• Amazon EC2/S3 was the first widely accessible cloud computing infrastructure service
Amazon Web Services (AWS)
• Computing: EC2, MapReduce
• Storage: S3, EBS
• Databases: SimpleDB, DynamoDB
• Others
AWS Computing
• Elastic Compute Cloud (EC2)
– Access individual instances as you would any other machine
– Customisable configuration
– Auto Scaling
• Amazon Elastic MapReduce
– Process vast amounts of data
– Uses the Hadoop framework
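Elastic MapReduce runs Hadoop, which splits a job into a map step, a shuffle that groups intermediate results by key, and a reduce step. The single-process Python sketch below mimics those three phases with a word count over made-up sequence strings. It does not use Hadoop or any AWS API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework does between map and reduce
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    reads = ["ACGT TTGA", "acgt GGCC acgt"]
    print(word_count(reads))  # prints: {'acgt': 3, 'ttga': 1, 'ggcc': 1}
```

On a real cluster the map and reduce calls run in parallel on many machines and the shuffle moves data over the network; that distribution is what the framework adds.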
AWS Storage
• Simple Storage Service (S3)
– Scalable cloud storage
– Accessed over HTTP
– An object store, not a file system
– Low cost
• Elastic Block Store (EBS)
– Block storage volumes that attach to EC2 instances and are used like local disks
– Snapshot backups
– Fast
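“Object store, not a file system” means objects are stored and replaced whole under a flat key, and “folders” are only a naming convention on those keys. The in-memory Python model below is a hypothetical illustration of that idea; it is not the S3 API, and the class, bucket and key names are made up.

```python
class ToyObjectStore:
    """Hypothetical in-memory model of object storage; this is not the S3 API."""

    def __init__(self) -> None:
        # bucket name -> {key: object bytes}; keys are flat strings, not directory paths
        self._buckets: dict[str, dict[str, bytes]] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        # Objects are written whole; a second put replaces the object instead of editing it
        self._buckets.setdefault(bucket, {})[key] = bytes(data)

    def get(self, bucket: str, key: str) -> bytes:
        # Objects are retrieved by bucket and key
        return self._buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention: listing just filters keys by prefix
        return sorted(k for k in self._buckets.get(bucket, {}) if k.startswith(prefix))


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("my-lab-bucket", "runs/run01/reads.fastq", b"@read1\nACGT\n+\nIIII\n")
    store.put("my-lab-bucket", "runs/run02/reads.fastq", b"@read2\nTTGA\n+\nIIII\n")
    store.put("my-lab-bucket", "notes.txt", b"pilot data")
    print(store.list_keys("my-lab-bucket", prefix="runs/"))
```

An EBS volume, by contrast, is a block device: the instance formats it with a file system and can update individual blocks in place.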
AWS Databases
• Amazon SimpleDB (NoSQL)
– Ease of administration
• Amazon DynamoDB (NoSQL)
– Scalability & durability
• Amazon Relational Database Service (SQL)
– Efficient indexing & querying
• Amazon ElastiCache
– Fast data access through in-memory caching
MongoDB (from “huMONGOus”)
– Scalable
– Natural
What is a database?
A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.
Why use a database?
• Reusability: You need a single, public interface for your data storage that all parts of your application can use.
• Availability: You need to be sure that your application will always be able to read and write data.
• Durability: You need to be sure that your data will stick around.
• Scalability: You need your data storage to be able to grow with your application.
Typical SQL and NoSQL databases
• SQL (Structured Query Language): Oracle, MySQL, Microsoft SQL Server
• NoSQL (Not Only SQL): key-value, column, document and graph-based stores, e.g. MongoDB, CouchDB, Riak
SQL vs MongoDB
http://sql-vs-nosql.blogspot.co.uk
MongoDB
• Distributed
• Document-oriented
• Schema-less storage solution
• Uses JSON-style documents
• Supports Python, PHP, Java, Ruby, C++, etc.
• Replica sets for failover and faster reads
• Sharding for high performance
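To make “schema-less, JSON-style documents” concrete, the Python sketch below keeps a hypothetical collection as plain dictionaries whose fields differ from one document to the next, including a nested sub-document and an array. The `find` helper is a toy equality filter loosely modelled on a MongoDB query filter; with the pymongo driver a comparable query would be `collection.find({"organism": "E. coli"})` against a running server, which this sketch deliberately avoids. All sample data is made up.

```python
import json

# Hypothetical documents in one collection: each may carry different fields
samples = [
    {"_id": 1, "name": "sample-A", "organism": "E. coli", "read_count": 120000},
    {"_id": 2, "name": "sample-B", "organism": "S. cerevisiae",
     "tags": ["pilot", "long-read"], "run": {"flowcell": "FC01", "hours": 48}},
    {"_id": 3, "name": "sample-C"},  # nothing forces this document to have more fields
]


def find(collection: list[dict], query: dict) -> list[dict]:
    """Toy equality filter on top-level fields, loosely modelled on a MongoDB query filter."""
    return [doc for doc in collection
            if all(field in doc and doc[field] == value for field, value in query.items())]


if __name__ == "__main__":
    print(json.dumps(samples[1], indent=2))  # nested, JSON-style document
    print([d["name"] for d in find(samples, {"organism": "E. coli"})])
```

The flip side of “rules enforced by user” is visible here: nothing stops a document from misspelling a field name, so the application has to check it.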
SQL vs MongoDB (NoSQL)
SQL | MongoDB (NoSQL)
Requires structured data / well-designed schema | Semi-structured, unstructured & polymorphic data
Table based | Document based
Database atomicity | Document atomicity / eventual consistency
Rules enforced by database | Rules enforced by user
Scale-up | Scale-out (suitable for distributed computing)
 | Flexible & fast
Table - Who is the account holder for account ID 3?
Document - Who is the account holder for account ID 3?
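The two lookups above can be sketched in Python with made-up data: the table version uses the standard-library `sqlite3` module and needs a join, while the document version stands in for a MongoDB collection with a plain dictionary keyed by `_id`, where the holder is embedded in the account document. Table names, field names and people are hypothetical.

```python
import sqlite3

# Relational layout (hypothetical data): the holder lives in its own table and is joined in
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE holders (holder_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        holder_id INTEGER NOT NULL REFERENCES holders(holder_id)
    );
    INSERT INTO holders VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO accounts VALUES (3, 2), (4, 1);
""")
table_answer = conn.execute(
    """SELECT h.name
       FROM accounts AS a JOIN holders AS h ON a.holder_id = h.holder_id
       WHERE a.account_id = ?""",
    (3,),
).fetchone()[0]

# Document layout (same data): the holder is embedded in the account document,
# so answering the question is a single lookup with no join
accounts = {
    3: {"_id": 3, "holder": {"holder_id": 2, "name": "Bob"}},
    4: {"_id": 4, "holder": {"holder_id": 1, "name": "Alice"}},
}
document_answer = accounts[3]["holder"]["name"]

print(table_answer, document_answer)  # prints: Bob Bob
```

The trade-off: embedding avoids the join, but if one holder owns many accounts their details are copied into every account document, and keeping those copies consistent becomes the application's job.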
Redundancy and Data Availability - Replication
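A MongoDB replica set keeps one primary that accepts writes and secondaries that replay its operations log (oplog); if the primary becomes unavailable, the remaining members elect a new primary. The Python class below is a hypothetical toy model of that idea, not MongoDB's protocol: instead of a majority vote it simply promotes the most up-to-date live member, and writes the new primary never received are discarded, loosely mirroring MongoDB's rollback of unreplicated writes.

```python
from typing import Optional


class ToyReplicaSet:
    """Hypothetical toy model of replication; not MongoDB's actual protocol."""

    def __init__(self, num_members: int = 3) -> None:
        self.data = [{} for _ in range(num_members)]  # each member's copy of the data
        self.applied = [0] * num_members  # oplog entries each member has applied
        self.up = [True] * num_members
        self.primary = 0
        self.oplog: list = []  # ordered log of (key, value) writes

    def write(self, key: str, value: object) -> None:
        # Only the primary accepts writes; each write is recorded in the oplog
        self.data[self.primary][key] = value
        self.oplog.append((key, value))
        self.applied[self.primary] = len(self.oplog)

    def replicate(self) -> None:
        # Every live member replays the oplog entries it has not applied yet
        for m, alive in enumerate(self.up):
            if alive:
                for key, value in self.oplog[self.applied[m]:]:
                    self.data[m][key] = value
                self.applied[m] = len(self.oplog)

    def fail_primary(self) -> None:
        # Failover: promote the most up-to-date live member to primary
        self.up[self.primary] = False
        live = [m for m, alive in enumerate(self.up) if alive]
        self.primary = max(live, key=lambda m: self.applied[m])
        # Writes the new primary never received are discarded (rolled back)
        del self.oplog[self.applied[self.primary]:]

    def read(self, key: str, member: Optional[int] = None) -> object:
        # Reads go to the primary unless a specific member (e.g. a secondary) is named
        target = self.primary if member is None else member
        return self.data[target].get(key)


if __name__ == "__main__":
    rs = ToyReplicaSet()
    rs.write("account:3", "Bob")
    rs.replicate()                  # secondaries now hold account:3
    rs.write("account:4", "Alice")  # not yet replicated
    rs.fail_primary()
    print(rs.primary, rs.read("account:3"), rs.read("account:4"))
```

Reading from a secondary (`member=...`) can return stale data until it has replayed the oplog, which is the “eventual consistency” side of the comparison table.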
Scaling out - Sharding
• A means of partitioning data across servers for high performance
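The core of sharding is a deterministic rule that maps each document's shard key to one server. The Python sketch below uses a simple hash-modulo rule to spread hypothetical account documents over three shards. It is an illustration, not MongoDB's algorithm: MongoDB splits the shard-key range (or its hashed values) into chunks that a balancer can migrate between shards, whereas plain modulo hashing would move most keys whenever the shard count changes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    # Deterministic hash of the shard key (Python's built-in hash() is salted per process)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(docs: list[dict], shard_key: str, num_shards: int) -> list[list[dict]]:
    # Each shard holds only its subset; a router can send a query on shard_key to one shard
    shards: list[list[dict]] = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(str(doc[shard_key]), num_shards)].append(doc)
    return shards


if __name__ == "__main__":
    docs = [{"account_id": i, "balance": 100 * i} for i in range(10)]
    for n, shard in enumerate(partition(docs, "account_id", 3)):
        print(n, [d["account_id"] for d in shard])
```

A query that filters on `account_id` only needs the one shard returned by `shard_for`; a query on any other field would have to ask every shard.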
Real-time Analytics
Usage Example 1: DNA Sequencing
• Real-time DNA sequencing
• Workflow: raw data (PC) → basecalling (AWS) → basecalled data (PC)
• Use AWS EC2 computing and S3 storage
• Spot market – auction of unused EC2 instances
• Pay-per-use pricing is an important economic factor for Nanopore
• Use a combination of MongoDB and SQL
Usage Example 2: Genome Analysis – Genetic Variant Calling
Peter White et al., Ohio State University, in collaboration with Genome Next: https://youtu.be/upAtK_SOtsY
Resources
• AWS Tutorials - https://qwiklabs.com
• MapReduce - http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
• AWS for Research - https://aws.amazon.com/grants/
• MongoDB - http://university.mongodb.com/
Definitions
• Instance: A copy of an Amazon Machine Image running as a virtual server in the AWS cloud.
• Instance type: A specification that defines the memory, CPU, storage capacity, and hourly cost for an instance.
• Amazon Machine Image (AMI): AMIs are like a template of a computer's root drive.
• Pixar accidentally wiped out nearly every file of "Toy Story 2" about 10 months into production. Fortunately, supervising technical director Galyn Susman had just become a new mom and had an entire copy of the movie on her home computer so that she could work from home. Woody and Buzz lived to see another day, and another movie.