cloud computing with amazon web services, part 2: storage in the cloud with amazon simple storage...
Post on 12-May-2015
Cloud computing with Amazon Web Services, Part 2: Amazon Simple Storage Service (S3)

Reliable, flexible, and inexpensive storage and retrieval of your data

Skill Level: Introductory

Prabhakar Chaganti (firstname.lastname@example.org), CTO, Ylastic, LLC.

19 Aug 2008

Copyright IBM Corporation 1994, 2008. All rights reserved.

In this series, learn about cloud computing using Amazon Web Services. Explore how the services provide a compelling alternative for architecting and building scalable, reliable applications. This article delves into the highly scalable and responsive services provided by Amazon Simple Storage Service (S3). Learn about tools for interacting with S3, and use code samples to experiment with a simple shell.

Amazon Simple Storage Service

Part 1 of this series introduced the building blocks of Amazon Web Services and explained how you can use this virtual infrastructure to build Web-scale systems.

In this article, learn more about Amazon Simple Storage Service (S3). S3 is a highly scalable and fast Internet data-storage system that makes it simple to store and retrieve any amount of data, at any time, from anywhere in the world. You pay for the storage and bandwidth based on your actual usage of the service. There is no setup cost, minimum cost, or recurring overhead cost.

Amazon provides the administration and maintenance of the storage infrastructure, leaving you free to focus on the core functions of your systems and applications. S3 is an industrial-strength platform that is readily available for your data storage needs. It's great for:
- Storing the data for your applications.
- Personal or enterprise backups.
- Quickly and cheaply distributing media and other bandwidth-guzzling content to your customers.

Valuable features of S3 include:

- Reliability: It is designed to tolerate failures and repair the system very quickly with minimal or no downtime. Amazon provides a service level agreement (SLA) to maintain 99.99 percent availability.
- Simplicity: S3 is built on simple concepts and provides great flexibility for developing your applications. You can build more complex storage schemes, if needed, by layering additional functions on top of S3 components.
- Scalability: The design provides a high level of scalability and allows an easy ramp-up in service when a spike in demand hits your Web-scale applications.
- Inexpensive: S3 rates are very competitive with other enterprise and personal data-storage solutions on the market.

The three basic concepts underpinning the S3 framework are buckets, objects, and keys.

Buckets

Buckets are the fundamental building blocks. Each object that is stored in Amazon S3 is contained within a bucket. Think of a bucket as analogous to a folder, or a directory, on the file system. One of the key distinctions between a file folder and a bucket is that each bucket and its contents are addressable using a URL. For example, if you have a bucket named "prabhakar," then it can be addressed using the URL http://prabhakar.s3.amazonaws.com.

Each S3 account can contain a maximum of 100 buckets. Buckets cannot be nested within each other, so you can't create a bucket within a bucket. You can affect the geographical location of your buckets by specifying a location constraint when you create them. This will automatically ensure that any objects that you store within that bucket will be stored in that geographical location. At this time, you can locate your buckets in either the United States or the European Union.
If you do not specify a location when creating the bucket, the bucket and its contents will be stored in the location closest to the billing address for your account.

Bucket names need to conform to the following S3 requirements:

- The name must start with a number or a letter.
- The name must be between 3 and 255 characters.
- A valid name can contain only lowercase letters, numbers, periods, underscores, and dashes.
- Though names can have numbers and periods, they cannot be in the IP address format. You cannot name a bucket 192.168.1.254.
- The bucket namespace is shared among all buckets from all of the accounts in S3. Your bucket name must be unique across all of S3.

Buckets that will contain objects to be served with addressable URLs must conform to the following additional S3 requirements:

- The name of the bucket must not contain any underscores.
- The name must be between 3 and 63 characters.
- The name cannot end with a dash. For example, myfavorite-.bucket.com is invalid.
- There cannot be dashes next to periods in the name. my-.bucket.com is invalid.

You can use a domain naming convention for your buckets, such as media.yourdomain.com, and thus map your existing Web domains or subdomains to Amazon S3. The actual mapping is done when you add DNS CNAME entries that point back to S3. The big advantage of this scheme is that you can use your own domain name in the URLs for downloading files. The CNAME mapping is responsible for translating between your own domain name and the S3 address for your bucket. For example, http://media.yourdomain.com.s3.amazonaws.com becomes the more friendly URL http://media.yourdomain.com.

Objects

Objects contain the data that is stored within the buckets in S3. Think of an object as the file that you want to store. Each object that is stored is composed of two entities: data and metadata.
The data is the actual thing that is being stored, such as a PDF file, Word document, a video file, and so on. The stored data also has associated metadata for describing the object. Some examples of metadata are the content type of the object being stored, the date the object was last modified, and any other metadata specific to you or your application. The metadata for an object is specified by the developer as key-value pairs when the object is sent to S3 for storage.

Unlike the limitation on the number of buckets, there are no restrictions on the number of objects. You can store an unlimited number of objects in your buckets, and each object can contain up to 5 GB of data.

The data in your publicly accessible S3 objects can be retrieved by HTTP, HTTPS, or BitTorrent. Distribution of large media files from your S3 account becomes very simple when using BitTorrent; Amazon will not only create the torrent for your object, it will also seed it!

Keys

Each object stored within an S3 bucket is identified using a unique key. This is similar in concept to the name of a file in a folder on your file system. The file name within a folder on your hard drive must be unique. Each object inside a bucket has exactly one key. The name of the bucket and the key are together used to provide the unique identification for each object that is stored in S3.

Every object within S3 is addressable using a URL that combines the S3 service URL, bucket name, and unique key. If you store an object with the key my_favorite_video.mov inside the bucket named prabhakar, that object can be addressed using the URL http://prabhakar.s3.amazonaws.com/my_favorite_video.mov.

Though the concepts are simple, as shown in Figure 1, buckets, objects, and keys together provide a lot of flexibility for building your data storage solutions.
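
As a rough illustration of the bucket naming rules and the bucket-plus-key URL scheme described so far, here is a minimal Python sketch. It is not part of any Amazon library: the function names is_valid_bucket_name and object_url are hypothetical, the rules encoded are only the ones listed in this article as of 2008, and percent-encoding the key with urllib.parse.quote is an assumption made for safety. The check cannot tell you whether a name is already taken in the shared S3 namespace.

```python
import re
from urllib.parse import quote

# First character must be a lowercase letter or digit; the rest may also
# include periods, underscores, and dashes (rules as listed in this article).
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9._-]*")
# Four dot-separated groups of digits, i.e. the IP address format.
_IP_FORMAT = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str, url_addressable: bool = False) -> bool:
    """Hypothetical helper: check a name against the article's rules.

    Set url_addressable=True to also apply the stricter rules for buckets
    whose objects are served through addressable URLs.
    """
    max_len = 63 if url_addressable else 255
    if not 3 <= len(name) <= max_len:
        return False
    if not _ALLOWED.fullmatch(name):
        return False
    if _IP_FORMAT.fullmatch(name):
        return False
    if url_addressable:
        # No underscores, no trailing dash, no dash next to a period.
        if "_" in name or name.endswith("-"):
            return False
        if "-." in name or ".-" in name:
            return False
    return True


def object_url(bucket: str, key: str) -> str:
    """Hypothetical helper: combine service URL, bucket name, and key."""
    # Assumption: percent-encode the key so unusual characters stay URL-safe.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key)}"


if __name__ == "__main__":
    print(is_valid_bucket_name("media.yourdomain.com", url_addressable=True))
    print(is_valid_bucket_name("192.168.1.254"))
    print(object_url("prabhakar", "my_favorite_video.mov"))
```

Keeping the stricter URL rules behind a flag mirrors the way the article presents them: as additional requirements that apply only when the bucket's objects will be served with addressable URLs.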
You can leverage these building blocks to simply store data on S3, or use their flexibility to layer and build more complex storage and applications on top of S3 to provide additional functions.

Figure 1. Conceptual view of S3

Access logging

Each S3 bucket can have access log records that contain details on each request for a contained object. The log records are turned off by default; you have to explicitly enable the logging for each Amazon S3 bucket that you want to track. An access log record contains a lot of detail about the request, including the request type, the resource requested, and the time and date that the request was processed.

The logs are provided in the S3 server access log format but can be easily converted into Apache combined log format. They can then be easily parsed by any of the open source or commercial log analysis tools, such as Webalizer, to give you a human-readable report and pretty graphs upon request. The reports can be very useful for gaining insight into the customers who are accessing your files. See Resources for tools you can use for easier visualization of the S3 log records.

Security

Each bucket and object created in S3 is private to the user account creating them. You have to explicitly grant permissions to other users and customers for them to be able to see the list of objects in your S3 buckets or to download the data contained within them. Amazon S3 provides the following security features to protect your buckets and the objects in them.

- Authentication: Ensures that the request is being made by the user that owns the bucket or object. Each S3 request must include the Amazon Web Services access key that uniquely identifies the user.
- Authorization: Ensures that the user trying to access the resource has the permissions or rights to the resource. Each S3 object has an access control list (ACL) associated with it that explicitly identifies the grants and permissions for that resource. You can grant access to all Amazon Web Services users or to a specific user identified by e-mail address, or you can grant anonymous access to any user.
- Integrity: Each S3 request must be digitally signed by the requesting user with an Amazon Web Servi