Cloud Computing: AWS, a practical example
May 2012, Hugo Pérez, UPC



Index

● Introduction
● Infrastructure
● Development and Results
● Conclusions

Introduction

To learn more about AWS services, the MapReduce process, the public data available from Twitter, and the ways to interact with that data, I developed a small example using:

AWS infrastructure:
- Elastic Compute Cloud (EC2)
- Elastic Block Store (EBS)
- Elastic IP
- Simple Storage Service (S3)

AWS tools:
- Management Console
- CloudWatch
- Elastic MapReduce (EMR)

Twitter Search API


Creating AWS Account

- Go to http://aws.amazon.com
- Sign in as a new user.
- Enter your name, email and password.
- Enter your contact details.
- Enter your payment data.
- Confirm a PIN by phone call.
- Wait while the confirmation completes.
- Wait a few minutes until the account is active (less than 10 minutes in this case).


Creating EC2

- Go to AWS Management Console -> EC2 Dashboard.
- Create a new instance.
- Choose the AMI (Amazon Machine Image) to install: Ubuntu Server 12.04.
- Define the number and type of instances; in this case, 1 Micro instance with these characteristics: HD: 8 GB (EBS), RAM: 600 MB, CPU: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz.
- Define the instance details, such as shutdown behavior and user data.
- Define tags: user-friendly names to manage the resources.
- Create a key pair to connect securely to the instance.
- Configure the firewall.
- Review the configuration.
- Check the instance details from the Management Console.
- You can also monitor the instance, create alarms, and configure detailed monitoring.
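The wizard choices above can also be written down as code. What follows is only a rough sketch using boto3, the AWS SDK for Python, which the slides do not use. The instance type name t1.micro, the "stop" shutdown behavior, and the AMI, key pair and security group identifiers passed in are assumptions made for illustration; the slides only say "Micro" and do not give any of those identifiers.

```python
def build_run_instances_params(ami_id, key_name, security_group_id, name_tag):
    """Collect the console wizard's choices as keyword arguments for run_instances."""
    return {
        "ImageId": ami_id,                      # the chosen AMI (hypothetical ID supplied by caller)
        "InstanceType": "t1.micro",             # assumption: the "Micro" type from the slide
        "MinCount": 1,                          # exactly one instance
        "MaxCount": 1,
        "KeyName": key_name,                    # key pair used for ssh access
        "SecurityGroupIds": [security_group_id],  # the "firewall" step
        "InstanceInitiatedShutdownBehavior": "stop",  # assumption: stop rather than terminate
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "Name", "Value": name_tag}],
            }
        ],
    }


def launch(params, region="us-east-1"):
    """Send the request to EC2 and return the new instance ID (needs boto3 and credentials)."""
    import boto3  # imported lazily so the parameter builder works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # Placeholder identifiers; replace them with real ones before calling launch().
    params = build_run_instances_params("ami-00000000", "awskey", "sg-00000000", "emr-client")
    print(params)
```

Keeping the parameter builder separate from the network call makes it possible to review the request before anything is created or billed.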


Creating Elastic IP

- You can now access the instance over ssh using its public DNS name: ec2-23-23-187-119.compute-1.amazonaws.com
- To simplify access, you can create an Elastic IP address.
- Once the Elastic IP is created, associate it with the instance.


Creating S3

- Define the bucket name and region. The region should be the same as the EC2 instance's region to minimize latency. AWS gives 5 GB free.
- Set permissions to grant Authenticated Users access to list the S3 bucket.


Creating Billing Alarm

- First, enable this feature.
- Define the parameters: recipients and threshold.


CloudWatch

- Besides the alarm, you can check the estimated charges through CloudWatch.
- Through CloudWatch you can query different kinds of metrics.


Installing EMR CLI

Connect to the server:
$ ssh -i awskey.pem [email protected]

Install the Amazon Elastic MapReduce Ruby Client:
$ mkdir elastic-mapreduce-cli
$ cd elastic-mapreduce-cli
$ wget http://elasticmapreduce.s3.amazonaws.com/elastic-mapreduce-ruby.zip
$ unzip elastic-mapreduce-ruby.zip

Configure the credentials:
$ vi credentials.json

{
  "access_id": "[Your AWS Access Key ID]",
  "private_key": "[Your AWS Secret Access Key]",
  "keypair": "[Your key pair name]",
  "key-pair-file": "[The path and name of your PEM file]",
  "log_uri": "[A path to a bucket you own on Amazon S3, such as, s3n://mylog-uri/]",
  "region": "[The Region of your job flow, either us-east-1, us-west-2, us-west-1, eu-west-1, ap-northeast-1, ap-southeast-1, or sa-east-1]"
}

You can get the AWS Access Key ID and the AWS Secret Access Key by signing in to your account at http://aws.amazon.com and opening the Access Credentials section.

It is recommended to create a new key pair for the exercise. I created it from the Management Console and copied it to the EC2 instance.

I saved all the parameters in the file:
ubuntu@ip-10-195-195-175:~/elastic-mapreduce-cli$ more credentials.json
{
  "access_id": "HPVAJFNULSZULY5NWHPV",
  "private_key": "65xBzYVzV7THPVYWW2LcYN0roVwK1I+nxJ+BNHPV",
  "keypair": "mapReduce",
  "key-pair-file": "/home/ubuntu/mapReduce.pem",
  "log_uri": "s3n://mylog-uri-hpv/",
  "region": "us-east-1"
}
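A leftover placeholder or a mistyped region in credentials.json is easy to miss. The sketch below checks a credentials dictionary before writing it to disk. The key names and the region list come from the template above; treating all six keys as required, and the helper names check_credentials and write_credentials, are assumptions made for illustration and not part of the EMR CLI.

```python
import json

# Keys taken from the credentials.json template in the slides.
EXPECTED_KEYS = ("access_id", "private_key", "keypair",
                 "key-pair-file", "log_uri", "region")

# Regions listed in the template's "region" placeholder.
LISTED_REGIONS = {"us-east-1", "us-west-2", "us-west-1", "eu-west-1",
                  "ap-northeast-1", "ap-southeast-1", "sa-east-1"}


def check_credentials(config):
    """Return a list of problems found in a credentials dictionary."""
    problems = []
    for key in EXPECTED_KEYS:
        value = str(config.get(key, ""))
        # Template placeholders start with "[", e.g. "[Your key pair name]".
        if not value or value.startswith("["):
            problems.append(key + " is missing or still a placeholder")
    region = str(config.get("region", ""))
    if region and not region.startswith("[") and region not in LISTED_REGIONS:
        problems.append("region " + region + " is not in the listed regions")
    return problems


def write_credentials(config, path="credentials.json"):
    """Write the file only when the dictionary passes the checks."""
    problems = check_credentials(config)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
```

Calling write_credentials(config) from the elastic-mapreduce-cli directory would produce the same file layout as the one shown above.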


Basics EMR CLI

Basic commands of the EMR CLI:
$ ./elastic-mapreduce --help
$ ./elastic-mapreduce --create
$ ./elastic-mapreduce --list
$ ./elastic-mapreduce --describe --jobFlow [JobFlowID]
$ ./elastic-mapreduce -j JobFlowID --stream
$ ./elastic-mapreduce --terminate JobFlowID


Mapper

The mapper script, the classic word counter:

#!/usr/bin/python
import sys
import re

def main(argv):
    # A word starts with a letter and continues with letters or digits.
    pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*")
    for line in sys.stdin:
        for word in pattern.findall(line):
            # The LongValueSum: prefix asks the aggregate reducer to sum the values per key.
            print("LongValueSum:" + word.lower() + "\t" + "1")

if __name__ == "__main__":
    main(sys.argv)
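To see what the mapper and the aggregate reducer produce together without starting a job flow, the pipeline can be imitated locally. This is only a sketch: the aggregate function below is a simplified stand-in for Hadoop's built-in aggregate reducer that handles the LongValueSum prefix and nothing else, and the function names and sample lines are invented for illustration.

```python
import re
from collections import Counter

WORD = re.compile("[a-zA-Z][a-zA-Z0-9]*")
PREFIX = "LongValueSum:"


def map_line(line):
    """Emit one 'LongValueSum:<word>\\t1' record per word, like the mapper script."""
    return [PREFIX + word.lower() + "\t1" for word in WORD.findall(line)]


def aggregate(records):
    """Simplified stand-in for the aggregate reducer: sum the values per key."""
    totals = Counter()
    for record in records:
        key, value = record.split("\t")
        if key.startswith(PREFIX):
            totals[key[len(PREFIX):]] += int(value)
    return dict(totals)


def word_count(lines):
    """Run the map step over every line, then aggregate all emitted records."""
    records = []
    for line in lines:
        records.extend(map_line(line))
    return aggregate(records)


if __name__ == "__main__":
    sample = ["Cloud computing on AWS", "cloud computing with EMR"]
    for word, count in sorted(word_count(sample).items()):
        print(word + "\t" + str(count))
```

Running the same input through the real mapper on the command line should give the same intermediate records, which makes this a cheap check before paying for a cluster.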


Using Twitter API

To generate the input data, run a simple query against Twitter:

http://search.twitter.com/search.json?q=cloud%20computing&rpp=5&include_entities=true&result_type=mixed

Parameters:
- q: the search pattern, cloud computing
- rpp: results per page, 5
- include_entities: if true, the result includes URLs, media and hashtags
- result_type:
  - mixed: include both popular and real-time results in the response
  - recent: return only the most recent results in the response
  - popular: return only the most popular results in the response
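The query string above can be assembled from its parameters instead of being typed by hand. The sketch below does only that and does not send a request, because the v1 search endpoint used in the slides is, to my knowledge, no longer served. The function name build_search_url and its defaults are hypothetical.

```python
from urllib.parse import quote, urlencode

# Endpoint as it appears in the slide.
SEARCH_ENDPOINT = "http://search.twitter.com/search.json"


def build_search_url(pattern, results_per_page=5, include_entities=True,
                     result_type="mixed"):
    """Build the search URL from the four parameters described in the slide."""
    if result_type not in ("mixed", "recent", "popular"):
        raise ValueError("unknown result_type: " + result_type)
    params = [
        ("q", pattern),
        ("rpp", str(results_per_page)),
        ("include_entities", "true" if include_entities else "false"),
        ("result_type", result_type),
    ]
    # quote_via=quote encodes spaces as %20, matching the URL in the slide.
    return SEARCH_ENDPOINT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_search_url("cloud computing"))
```

Changing result_type to "recent" or "popular" switches the kind of results requested without touching the rest of the URL.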


Using Twitter API

Transfer the result to S3:
$ s3curl.pl --id=personal --put=cloudcomputing -- http://s3.amazonaws.com/mylog-uri-hpv/entradas/cloudcomputing


Exec EMR

$ ./elastic-mapreduce --create --stream --mapper s3://elasticmapreduce/samples/wordcount/wordSplitter.py --input s3://mylog-uri-hpv/entradas/cloudcomputing --output s3://mylog-uri-hpv/salidas/cloudcomputing --reducer aggregate

$ ./elastic-mapreduce --list --active
j-3EBJ6MT4FBM80 STARTING Development Job Flow
   PENDING Example Streaming Step

$ ./elastic-mapreduce --list --active
j-3EBJ6MT4FBM80 RUNNING ec2-23-20-6-34.compute-1.amazonaws.com Development Job Flow
   RUNNING Example Streaming Step
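When the same job is launched several times with different inputs, it can help to build the command line from its parts. The sketch below assembles the argument list using only the flags that appear in the command above; the function name build_streaming_command and its defaults are hypothetical, and nothing is executed.

```python
import shlex


def build_streaming_command(mapper, input_path, output_path,
                            reducer="aggregate", cli="./elastic-mapreduce"):
    """Return the argument list for creating a streaming job flow."""
    return [
        cli, "--create", "--stream",
        "--mapper", mapper,
        "--input", input_path,
        "--output", output_path,
        "--reducer", reducer,
    ]


if __name__ == "__main__":
    command = build_streaming_command(
        "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
        "s3://mylog-uri-hpv/entradas/cloudcomputing",
        "s3://mylog-uri-hpv/salidas/cloudcomputing",
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in command))
```

The list can be passed as is to subprocess.run(command, check=True) from the elastic-mapreduce-cli directory, which avoids shell quoting problems.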


Exec EMR

- Monitoring from the Management Console
- Provisioning on demand
- Monitoring graphs

Results

EMR results on S3


Conclusions

- The software development model is completely new: the purchase process is eliminated, installation is becoming easier, and the system administrator role (sysadmin, DBA, etc.) is disappearing, so the developer can focus on business logic.
- AWS provides not only the infrastructure but also the development platform.
- The Twitter API is well documented and easy to use.
- This model is available to companies of any size.
- The free usage tier covers all the infrastructure components used in this exercise (EC2, EBS, Elastic IP, S3) except for one small EC2 instance that is used on demand during the MapReduce process.
- The total charge for developing this exercise was USD 0.45.


Conclusions

Charges: [figure omitted in transcript]




Thanks
