Architecting for the cloud scalability-availability


Post on 10-May-2015






Day 2 of the course Architecting for the Cloud. The lectures cover scalability and availability.


<ul><li>1. Architecting for the Cloud. Len and Matt Bass. Scalability</li></ul><p>2. Link to yesterday's slides: for-the-cloud-intro-virtualization-iaa-s
3. Outline: Introduction to scalability; CPU scaling; I/O scaling.
4. Characteristic of cloud from NIST: On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.
5. Scale in the Cloud: Many people think that you get scalability just by virtue of being in the cloud. This isn't true. What the cloud gives you is the ability to quickly and easily add resources. It doesn't guarantee that this results in additional capacity. Just like with security, you need to design scalability in.
6. What is Scalability? (Problem definition) Scalability is the ability of a system to support a growing amount of work. The work may come from additional users, from additional requests from current users, or from operational activities. (Solution definition) Scalability is the ability to increase or decrease the resources available to your application by either changing the number of servers or disks or changing the size of the servers or disks.
7. Why scale? Are more users always a good thing? This is a cost/benefit question. More users have benefits: presumably more people receive service and the organization receives more revenue. More users have a cost: hardware, software, and personnel. Do costs scale linearly with users? For Netflix, the answer is yes. For LinkedIn, the answer is no.
8. The different aspects of scalability. Adding users: large numbers of new users may require new computation facilities. Adding data: large amounts of new data require more computation and careful attention to the distribution of this data.
Adding computation: computation is embedded in virtual machines. Elasticity means adding new virtual machines. Scaling should not impact existing activities. You may need to scale by adding computation capacity (CPU) or by adding I/O capacity.
9. Scaling Up vs Scaling Out: Scaling up means adding more capacity to existing hardware: more memory, more disk, a faster CPU or more cores. Scaling out means adding additional hardware: more systems.
10. Costs in scaling out: Each virtual machine has a cost per hour. Licensing costs: many software packages charge licenses per CPU or per (virtual) computer, so every new instance that utilizes one of these packages incurs licensing costs. Personnel costs: in small to medium size organizations, one sysadmin can administer ~30 machines; in large, highly automated organizations, one sysadmin can administer thousands of machines. A movement called DevOps has as one goal the reduction of personnel costs in operations (more on this later).
11. How much lead time for growth of number of users? Some things are predictable: seasonal variation (Christmas, tax season); daily variation (working hours or non-working hours in various time zones); holidays; promotions or special offers; sporting events. Other things are not predictable: being Slashdotted; news items; rapid growth in popularity of a company; disaster.
12. Managing growth in number of users: A lead time allows planning: restructure the database, add or restructure software. When no lead time is available, elasticity of the cloud is the main mechanism.
13. Outline: Introduction to scalability; CPU scaling (load balancers, rule based scaling, scaling patterns); I/O scaling.
14. Why have a load balancer? Suppose there are too many users for a single instance of a service. The cloud allows us to create another instance of that service (elasticity). We would like to have half the users use one instance and half use the other. Two options: (1) Couple instances and users (half and half).
This is accomplished by having users access an instance of a service directly by IP address. (2) Use an intermediary (load balancer) to distribute half of the requests to one instance and the other half to the other. Option 2 is preferable for a variety of reasons which we will see.
15. Load Balancing: Physically, a load balancer is a box that looks like it belongs in a computer network.
16. Load Balancer: Logically, a load balancer takes requests from clients and distributes them to copies of an application executing on multiple different servers. [Diagram: clients, load balancer, servers]
17. Message sequence: client makes a request. [Diagram: clients, load balancer, servers]
18. Message sequence: request arrives at load balancer. [Diagram: clients, load balancer, servers]
19. Message sequence: request is sent to one server. [Diagram: clients, load balancer, servers]
20. Message sequence: reply goes directly back to client. [Diagram: clients, load balancer, servers]
21. Suppose Load Balancer Becomes Overloaded: Load balance the load balancers.
22. Hierarchy of Load Balancers: The server always sends the message back to the client. Load balancers use a variety of algorithms to choose an instance for a message. Round robin: rotate requests evenly. Weighted round robin: rotate requests according to some weighting. Hashing the IP address of the source to determine the instance: this means that a request from a particular client is always sent to the same instance as long as it is still in service. Note that these algorithms do not require knowledge of an instance's load. That situation we will cover in a little bit.
23. Outline: Introduction to scalability; CPU scaling (load balancers, rule based scaling, scaling patterns); I/O scaling.
24. Rule Based Scaling
25. Server: A server is a virtual machine without any software. A virtual machine can be allocated with varying amounts of memory, CPU, and disk. Each variant has a different cost, typically per hour.
26. Machine Image: A machine image is a copy of the contents of the memory of a computer.
A machine image may be created from any contents of a computer. Some options: bare metal; with OS; with LAMP stack (Linux, Apache HTTP Server, MySQL, PHP or Python). If licensed software is contained in the machine image, then a license fee is paid when it is loaded.
27. Executable Virtual Machine: An executable virtual machine is created by loading a machine image into a server. An executable virtual machine can then be booted, paused, or shut down. [Diagram: machine image, server]
28. Adding/Removing Resources: The example shows two servers with one to be removed. There could be N servers with one to be added or removed. Creating a new instance takes some time. Removing an instance also takes time: it must satisfy existing requests and be detached from existing connections.
29. Autoscaling group: An autoscaling group is a collection of instances that have been defined to be scaled together. Typically these represent instances of the same application.
30. Creating an autoscaling group: An autoscaling group needs to know the machine instance id, the VM type, and the scaling policy.
31. Scaling Policy: Specify the minimum, maximum, and desired number of instances. Can specify scaling based on time of day, e.g. scale up during 9:00-5:00 and down at other times. Can scale based on average CPU usage, e.g. an average CPU utilization of 60% means add an instance. Values come from the monitor.
32. Outline: Introduction to scalability; CPU scaling (load balancers, rule based scaling, scaling patterns); I/O scaling.
33. Scaling Patterns: Autoscaling implements the Push Pattern for messages. Another pattern is the Pull Pattern.
34. Push Pattern
35. Push Pattern Description: The client sends a request (e.g. HTTP message) to the app in the cloud. The request arrives at a load balancer. The load balancer forwards the request to one of the VMs in the resource pool. The load balancer uses a scheduling strategy to decide which VM gets the request, e.g. dispatch to the VM with the lowest CPU utilization.
36. How does the load balancer know?
The load balancer knows the CPU utilization of the VMs, and it knows how many requests it (the load balancer) has received and possibly how long it took to service the requests. It does not know application specifics such as how many requests a VM can process. When the resource pool is overloaded, new resources are allocated. The monitor decides (based on controller rules) when new resources are needed. It must have direct insight into the VM instances in order to do this. Hence, the monitor utilizes a monitoring service provided by the cloud for each instance.
37. Pull architecture pattern (aka Producer-Consumer)
38. Pull architecture description: Each request from the client is application specific and typed. The queue manager keeps separate queues for each application running on the VMs. A VM requests the next message of a particular type (pull) and processes it. The monitor can now see how long a request waits in a queue, or the average queue length, and this is an indication of the load on the VMs that have applications that service requests of that type.
39. Differences: Push is more responsive to requests. They are immediately forwarded to a service, so there is a possibility that the service is overloaded. Pull is less responsive since it relies on servers to de-queue messages. In the pull architecture, a service polls for new messages even if there is nothing in its queue, and this introduces overhead. It is easier to monitor and control workload in the pull architecture since messages are application specific and typed.
40. Outline: Introduction to scalability; CPU scaling; I/O scaling (multiple sites, software techniques).
41. I/O Scaling: Scaling out assumes the scaling requirement is solved with more CPUs. It may be that I/O is also a problem. You may run your application in multiple sites: half the clients go to one site, half to another.
42. Questions when you have multiple sites: How do clients know which site to use?
How are databases used by the applications coordinated across sites? (We defer this question.)
43. Domain Name Server (DNS): The client sends a URL to DNS. DNS takes as input a URL and returns an IP address. The client uses the IP address to send a message to the load balancer for a site. [Diagram: Domain Name Server (DNS), Site 1, Site 2]
44. DNS with multiple sites: The DNS server returns the IP addresses of both sites. The DNS server will vary which address is listed first. The client will, typically, choose the first entry. [Diagram: Domain Name Server (DNS), Site 1, Site 2, with example addresses 456.77.88.99 and 123.45.67.89]
45. Outline: Introduction to scalability; CPU scaling; I/O scaling (multiple sites, software techniques).
46. Recall Pull Pattern
47. To Scale for I/O: Make the queue manager more sophisticated. The publisher takes values from the key-value store and distributes them. [Diagram: clients, key-value store, publisher]
48. Summary: Scalability is the ability to respond to increasing or decreasing workload. Add CPU capacity through utilizing features of the cloud provider. Add I/O capacity through distributing requests to multiple sites and having fast message passing software.
49. QUESTIONS?
50. Architecting for the Cloud: Introduction to Availability
51. Outline: What is availability; Faults; Availability patterns.
52. Outline: What is availability; Faults; Availability patterns.
53. Cost of Downtime: According to a recent survey, the average cost of unplanned downtime is $7,900/minute*. 91% of reporting companies have experienced an unplanned outage in the last 24 months. The average outage lasts 118 minutes. The average frequency of outages over a 24 month period was: 10.16 limited outages, 5.88 local outages, 2.04 total outages. * Emerson Network Power, Ponemon Institute Study 2013
54.
Cost of Downtime II: As the previous numbers indicate, downtime can be expensive. Experienced in August 2013: the New York Times had a 2 hour outage (stock price declined, Twitter exploded, and the Wall Street Journal dropped their fees to try and capture readership); Google had between 1 and 5 minutes of downtime (~$500,000 direct loss and a 40% reduction in overall web traffic); Amazon had an outage of under an hour (more than $5 million). In addition to direct losses, indirect losses are experienced: loss of confidence, reputation, and goodwill; productivity losses; compliance penalties.
55. Availability: a Business Concern. The availability of the business service impacts the earnings and associated value of an organization. If the organization relies on an IT system to deliver business service, then the availability of the IT system impacts the value of the organization. In this section we are going to look at the availability of the system. We want to keep in mind, however, that the objective is the availability of the business service.
56. What Is Availability? Availability in general refers to the degree to which a system is in an operable state. This is typically articulated as the percentage of time the system is available (or we'd like to have the system available), e.g. 99.99%. There are many related terms, e.g. availability, fault-tolerance, reliability.
57. How is Availability Measured? Availability is typically measured as: Availability = MTBF / (MTBF + MTTR), where MTBF = Mean Time Between Failures and MTTR = Mean Time To Repair.
58. 9s. Availability and downtime per year:
90% (1-nine): 36.5 days/year
99% (2-nines): 3.65 days/year
99.9% (3-nines): 8.76 hours/year
99.99% (4-nines): 52 minutes/year
99.999% (5-nines): 5 minutes/year
99.9999% (6-nines): 31 seconds/year!
59. Calculating System Availability I: Each component = 99% (3.65 days a year). The overall system, however, has an availability that is the product of each component's availability: 99% × 99% = 98.01%, or roughly 98% (7.26 days a year). [Diagram: two components, each 99%]
60.
Calculating System Availability: Each component = 99% (3.65 days a year). The overall system in this case, however, is based on the likelihood that both components would fail at the same time: 1 - ((100% - 99%) × (100% - 99%)) = 99.99% (about 52 minutes a year!!). [Diagram: redundant elements, each 99%]
61. Availability Measures: A couple of things to keep in mind. These measures refer to the mean, not the minimum, time between failures. As the MTBF increases, the impact of MTTR decreases. As the MTTR approaches 0, the overall availability approaches 1. Historically these measures were developed for hardware components.
62. Availability Requirements: MTBF can be measured for operational systems. How do you predict the MTBF for a system that is yet to be built, however? Does it make sense to use the previously defined availability measure as a requirement? If not, how should requirements be articulated?
63. Actionable Requirements: Remember that as a business the concern is that the services are available as needed. In order to determine the likely availability of a system (or design) you must understand the likelihood that various kinds of faults could occur and understand the impact of these faults on overall system availability. You must therefore translate the desired business objective into a set of fault scenarios.
64. End to End Availability: Engineers often think about availability of some portion of the system, e.g. availability of the database or web server. Organizations,...</p>
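The availability arithmetic on slides 57 to 60 can be checked with a few lines of Python. This is only a minimal sketch: the function names (availability, serial, redundant, downtime_hours_per_year) are illustrative and do not come from the slides, and it assumes a 365-day year and components that fail independently.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, leap years ignored


def availability(mtbf: float, mttr: float) -> float:
    """Slide 57: MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def serial(*components: float) -> float:
    """Slide 59: every component must be up, so availabilities multiply."""
    return prod(components)


def redundant(*components: float) -> float:
    """Slide 60: the system is down only if all components are down at once.

    Assumes independent failures, which real systems may not satisfy.
    """
    return 1 - prod(1 - a for a in components)


def downtime_hours_per_year(a: float) -> float:
    """Expected hours of downtime per year for availability a (0..1)."""
    return (1 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    for label, a in [
        ("single 99% component", 0.99),
        ("two 99% components, both required", serial(0.99, 0.99)),
        ("two 99% components, redundant", redundant(0.99, 0.99)),
    ]:
        print(f"{label}: {a:.4%} available, "
              f"{downtime_hours_per_year(a):.2f} hours down per year")
```

Dividing the hours by 24 gives the days-per-year figures used in the 9s table on slide 58.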