Degree Project in Computer Science and Engineering, Second Cycle, 30 credits
Stockholm, Sweden 2018

Container overhead in microservice systems

VILHELM FRIÐRIKSSON

KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science


kth.diva-portal.org/smash/get/diva2:1252694/FULLTEXT01.pdf

Container overhead in microservice systems

Vilhelm Friðriksson

2018-10-01

Master’s Thesis

Examiner: Gerald Q. Maguire Jr.

Academic adviser: Anders Västberg

KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Communication Systems
SE-100 44 Stockholm, Sweden


Abstract

Containers have been gaining popularity in recent years due to their ability to provide higher flexibility, higher reliability, and dynamic scalability to enterprise software systems. In order to fully utilize containers, software developers aim to build their software using a microservice architecture, meaning that instead of working on a single large codebase for the whole project, the software is split into smaller units. These microservices can be deployed in their own containers instead of the traditional virtual machine setup, where a server has to be configured with all necessary dependencies. Moving away from a monolithic software architecture to containerized microservices is bound to bring performance penalties due to increased network calls between services and container overhead. The integration must therefore be carefully planned in order to fully utilize the container setup while minimizing the overhead. The purpose of this thesis project was to measure how much overhead can be expected due to containers in an enterprise environment. By using a combination of virtual machines and Docker containers, a microservice system was deployed with four different deployment strategies, and the system's performance was measured by analyzing request response times under various loads. The services were made to run on a single server and on multiple servers, with and without Docker. The performance measurements showed that the system performed worse in every case when Docker was used. Furthermore, the results showed that Docker can have a significant negative impact on performance when there is a heavy load on the system.

Keywords: Microservices, Containers, Docker, Virtual machines, Cloud computing


Sammanfattning

Containers har blivit populärare under de senaste åren tack vare deras förmåga att ge högre flexibilitet, högre tillförlitlighet och dynamisk skalbarhet för företagsprogramvarusystem. För att fullt ut kunna använda containers har programutvecklarna för avsikt att bygga sin programvara med hjälp av mikroservicearkitekturen, vilket innebär att programvaran delas upp i mindre enheter istället för att arbeta på en enda stor kodbas för hela projektet. Dessa mikroservices kan distribueras i sina egna containers istället för den traditionella virtuella maskininstallationen, där en server måste konfigureras med alla nödvändiga beroenden. Att flytta sig från monolitisk mjukvaruarkitektur till containeriserade microservices kommer att få prestandaförsämringar på grund av ökade nätverksanrop mellan tjänster och container-overhead. Integrationen måste därför noggrant planeras för att fullt ut utnyttja containeruppsättningen och minimera overhead. Syftet med detta avhandlingsprojekt var att mäta hur mycket overhead kan förväntas på grund av containers i en företagsmiljö. Genom att använda en kombination av virtuella maskiner och Docker-containers implementerades ett microservices-system med fyra olika implementeringsstrategier, och systemets prestanda mättes genom att analysera anropens svarstid under olika belastningar. Tjänsterna gjordes för att köras på en enda server och på flera servrar, med och utan Docker. Prestandamätningarna visade att systemet var sämre i alla fall när Docker användes. Dessutom visade resultaten att Docker kan ha signifikant negativ inverkan på prestanda när det är tung belastning på systemet.

Keywords: Mikroservices, Containers, Docker, Virtuella maskiner, Molntjänster


Acknowledgements

I would like to express my sincere gratitude to my examiner, Professor Gerald Q. Maguire Jr., who always provided thorough feedback when asked and helped set the course straight when I found myself in trouble. I consider myself very lucky to have had the chance to work under his guidance.

Special thanks to the people at Betsson Group for giving me the chance to work on this project, all the help they provided, and for making me feel welcome at Betsson.

Finally, heartfelt thanks to my parents, Friðrik Vilhelmsson and Ingibjörg María Ingvadóttir, for their endless support and encouragement.

Stockholm, September 2018
Vilhelm Friðriksson


Contents

1 Introduction
  1.1 Problem Description
  1.2 Purpose
  1.3 Goals
  1.4 Research methodology
  1.5 Delimitations
  1.6 Outline

2 Background
  2.1 Betsson Group
  2.2 Software architecture
    2.2.1 Monolithic architecture
    2.2.2 Microservices
  2.3 Virtual Machines
  2.4 Containers
  2.5 Docker
  2.6 Using Docker
  2.7 Orchestration
  2.8 Cloud computing
  2.9 Technology stack
    2.9.1 .Net Core
    2.9.2 Couchbase
    2.9.3 RabbitMQ
    2.9.4 Taurus
  2.10 Zipkin
  2.11 Nmon
  2.12 Previous Work
    2.12.1 Virtualization
    2.12.2 Microservices
  2.13 Summary

3 Methodology
  3.1 The Test System
  3.2 System setup
  3.3 Test scenarios
  3.4 Measurements

4 Results
  4.1 Taurus results
    4.1.1 Single server setup
    4.1.2 Multiple servers setup
  4.2 Zipkin
  4.3 Summary

5 Discussions
  5.1 Time measurements
  5.2 Network calls
  5.3 Nmon
  5.4 Resource contention
  5.5 Previous research

6 Conclusions and future work
  6.1 Conclusions
  6.2 Limitations and future work
    6.2.1 Different hosting environments
    6.2.2 Container scheduler
    6.2.3 Grouping containers together
    6.2.4 Network configuration
  6.3 Reflections

References


List of Figures

2.1 Microservices can be deployed independently based on workload
2.2 Hypervisors of type 1 and 2
2.3 Docker containers on a host machine
2.4 Virtual machine and container setups
3.1 Overview of the test system
3.2 System architecture with services running on multiple servers
3.3 System architecture with services running on a single server
4.1 Log values of the median response times when using a single server
4.2 CDFs of request times when using a single server with Docker
4.3 CDFs of request times when using a single server without Docker
4.4 Log values of the median response times when using multiple servers
4.5 CDFs of request times when using multiple servers with Docker
4.6 CDFs of request times when using multiple servers without Docker


List of Tables

2.1 Orchestrator tasks
3.1 m4.xlarge instance specification
4.1 Taurus load testing results for a single server with Docker
4.2 Taurus load testing results for a single server without Docker
4.3 Taurus load testing results for multiple servers with Docker
4.4 Taurus load testing results for multiple servers without Docker
4.5 Docker related overhead for a single server
4.6 Docker related overhead for multiple servers
4.7 Zipkin trace results for the User service when using a single server
4.8 Zipkin trace results for the User service when using multiple servers
4.9 Zipkin trace results for the Blacklist service when using a single server
4.10 Zipkin trace results for the Blacklist service when using multiple servers
4.11 Zipkin trace results for the Geolocation service when using a single server
4.12 Zipkin trace results for the Geolocation service when using multiple servers
5.1 Measured waiting time for the blacklist service when using a single server
5.2 Measured request time for the blacklist service when using a single server
5.3 Measured waiting time for the blacklist service when using multiple servers
5.4 Measured request time for the blacklist service when using multiple servers
5.5 Measured waiting time for the geolocation service when using a single server
5.6 Measured waiting time for the geolocation service when using multiple servers
5.7 Measured network request time for the blacklist service when using a single server
5.8 Measured network request time for the blacklist service when using multiple servers
5.9 Measured network request time for the geolocation service when using a single server
5.10 Measured network request time for the geolocation service when using multiple servers
5.11 Measured Docker overhead of network calls to the blacklist service
5.12 Measured Docker overhead of network calls to the geolocation service


List of Listings

2.1 Dockerfile example
2.2 Docker build command
2.3 Docker run command
2.4 Example Taurus configuration file
2.5 Running Taurus test
2.6 Installing nmon
2.7 Starting nmon
3.1 Taurus load test file


List of acronyms and abbreviations

AMQP    Advanced Message Queuing Protocol
API     Application Programming Interface
APM     Application Performance Management
AWS     Amazon Web Services
CDF     Cumulative Distribution Function
CSV     Comma Separated Values
HTTP    Hypertext Transfer Protocol
IaaS    Infrastructure as a Service
JSON    JavaScript Object Notation
MQTT    Message Queuing Telemetry Transport
NoSQL   Not only SQL
PaaS    Platform as a Service
RPC     Remote Procedure Call
RPS     Requests Per Second
SaaS    Software as a Service
SDK     Software Development Kit
STOMP   Streaming Text Oriented Messaging Protocol
UI      User Interface


Chapter 1

Introduction

The Internet has come a long way since its early days and today has become an integral part of modern society. Web services are expected to run every second of the day and be available to the whole world. Running services that people rely on, whether it is for work or play, requires plans for reliability and scalability. In addition to being robust and reliable, users also expect services to respond instantly, no matter what the current load is. As the quality requirements for web services have gotten more demanding, so has the complexity of running them. Realizing the importance of meeting these increasing demands, engineers have continuously worked to improve how their services are run.

The latest trend in the industry to realize fast, reliable, and scalable systems is to break software down into so-called microservices, where each service has a clear and well defined functionality. While it may seem to the user that he or she is interacting with one service, underneath there are multiple services interacting with each other to generate a response. Microservices have gained popularity because they make it possible to speed up both operations and development. It is easier to scale, upgrade, and handle failures when a service is composed of many isolated entities instead of being monolithic. However, the complications of running such a service increase as the required capacity of the service grows, because the system realizing the service consists of hundreds or thousands of independent services that need to be configured and monitored.

Containers are another technology that has been gaining popularity in recent years. Containers offer a way to isolate software from other processes running in the same environment. One host can run multiple containers which are all isolated from each other. Additionally, these containers can include everything that is needed to run a service (i.e., the code, system tools, packages, and settings), thus escaping the dependency hell [1] that operators have had to deal with when the requirements of multiple applications running on the same host could cause conflicts. Today it has become a common practice to combine microservices and container technology so that the services are developed, tested, and deployed using containers.
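To make this concrete, a minimal Dockerfile for packaging a small web service together with its dependencies might look like the sketch below. This is a hypothetical illustration, not the Dockerfile used in this project: the base image, file names, and port are illustrative assumptions.

```dockerfile
# Hypothetical example: bundle a .NET Core web service and everything
# it needs to run into a single container image.
FROM microsoft/dotnet:2.1-aspnetcore-runtime

# Copy the pre-built application into the image.
WORKDIR /app
COPY ./publish .

# The service listens on port 80 inside the container.
EXPOSE 80

# Start the service when the container is launched.
ENTRYPOINT ["dotnet", "MyService.dll"]
```

Any host with Docker installed can then build and run the service (e.g., `docker build -t myservice .` followed by `docker run -d -p 8080:80 myservice`) without the host itself needing the runtime or any other dependency installed.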

Lastly, virtual machines have played an important part in realizing web services for a long time. They enable operators to quickly set up new (virtual) machines with the hardware and network specifications they require, without having to go through the hassle of dealing with bare-metal servers. Virtual machines are the foundation of many modern web services, as in many cases the containerized microservices are running on top of virtual machines. This is the way the main cloud providers, such as Amazon Web Services and Google Cloud Platform, run their container services. On the surface the operator works with containers, but underneath the cloud platforms spawn virtual machines to host those containers. Performance analysis of modern web services can therefore be complex: not only are the services themselves potential bottlenecks, but the nested virtualization technology being used further complicates the analysis.

Betsson Group has started to move their services into containerized microservices. Their aim is to transition the production back end platform from a traditional Windows/IIS-based server architecture into a more scalable Docker architecture, in hopes of reaping benefits such as higher flexibility, higher reliability, and dynamic scalability.

When running a large system of containerized services, automation becomes increasingly important. Container orchestration systems are used to handle most operational tasks, such as deployments, updates, assessments of resource utilization, and health checks. Therefore, evaluations of container orchestrators are an important part of Betsson's future work.

Equally important is to figure out how much overhead can be expected from moving away from monolithic software to containerized microservices. According to W. Felter et al., Docker has been shown to have minimal performance overhead compared to the host it is running on, and virtual machines have constantly improved performance-wise [2].

Moreover, the complexity of large scale systems requires more complicated benchmarks in order to estimate the virtualization overhead for such systems. However, limited research has been done on measuring the overhead in microservice-based systems. Microservices tend to make multiple network calls in order to perform the same work that a single thread handles in a monolithic application. When combining virtual machines, containers, and microservices, the performance impact can therefore be substantial if the system is not developed in an appropriate manner.

The thesis focuses on estimating the performance impact Betsson can expect when transitioning away from their well tested architecture to a containerized microservice architecture running on top of virtual machines. Additionally, the thesis looks at where the overhead comes from and whether it can be reduced.

1.1 Problem Description

Containerized microservices have become a very popular system architecture in recent years. However, this architecture comes with a cost. Docker containers have been shown to have minimal overhead compared to their hosts, but some workloads may cause issues, particularly when there are high disk and network workloads. Microservices can be particularly network heavy, as a single user request may involve communication between several services. In contrast, the response from a monolithic application will generally be generated by a single thread, so there is an obvious potential performance impact. Little research has been done on the impact of containers on microservices and the impact of running containers on virtual machines (as is common practice in production systems). This degree project aims to close that gap by performing performance evaluations on a small microservice system running on virtual machines, both with and without container technology. Furthermore, it aims to answer the following questions:

• Is it possible to run containerized microservices on top of virtual machines without incurring an unacceptable decrease in performance?

• Where does the overhead mainly come from when using containers?

1.2 Purpose

The thesis discusses the performance overhead incurred from running microservices in containers hosted on virtual machines. The purpose of transitioning into a microservice architecture is to have the ability to run a more flexible, reliable, and scalable system. Previous work has shown that virtualization comes with a performance cost. The purpose of this work is to quantify how much performance loss can be expected when running a system of microservices in containers on top of virtual machines, compared to running the same system of microservices directly on virtual machines.

1.3 Goals

The goal of this work is to evaluate the performance penalties experienced when running microservices in containers compared to a non-containerized solution. A further goal is to explain the nature of the overhead incurred, so that these evaluations can help create a system design appropriate for containerized microservices.


1.4 Research methodology

Quantitative and experimental research methods are used to perform the work for this project. The nature of experimental research is to work with variables and establish relationships between them. Manipulating variables in the environment may change the results of the experiment, which can then be further analyzed. This method is ideal when analyzing system performance. In this project a laboratory system is tested with and without Docker, and quantitative analysis is performed on the gathered performance metrics in order to compare the performance of the different experimental configurations (i.e., with and without containers).

As this project is to be carried out for a company, it is important that everything that is said in confidentiality remains confidential.

1.5 Delimitations

The thesis project only examines Docker as the container technology. There are various other technologies available, such as rawer implementations (e.g., LXC [3]) or popular open source projects (e.g., rkt [4]). However, Docker has been steadily gaining popularity in the industry in recent years and is currently used by Betsson in production; therefore, it is considered the most logical choice for use together with microservices.

The experimental system does not depend on a container orchestrator or a service discovery mechanism to run. In a real-world dynamic containerized environment, the use of orchestration is very important. However, since the experimental system is not required to be dynamic or scalable, it was decided that setting up orchestration would only add unnecessary overhead. Furthermore, the overhead of orchestration would simply be additive, hence independent of whether containers were used or not; therefore it was unnecessary for the experiments.

Security is not considered in this work, in order to simplify the building of the system. The application programming interface (API) calls use Hypertext Transfer Protocol (HTTP) [5] without encryption, secrets are stored in plaintext configuration files, and no security precautions are made in how the system runs on the isolated experimental cluster. This of course would not be acceptable for a production system. However, again, the additional overhead of security would be additive to either alternative (i.e., with or without containers), hence irrelevant to the experiments reported in this thesis.

The system is hosted in the cloud with a pay-as-you-go payment model. Cost evaluations, such as finding the most cost effective way to run the system, were not a part of this project.

1.6 Outline

The following chapters describe the work performed for this degree project and the results of the project. Chapter 2 provides the background needed to understand the rest of the thesis. Specifically, it describes the virtualization techniques and the software architecture used for the project, along with providing insights about the technology stack used. Finally, it presents previous work related to the subject. Chapter 3 describes how the main work of the project was carried out. It explains how the laboratory system was built for this project, its architecture, how it was set up, and how the performance evaluations were conducted. Chapter 4 showcases the results from the performance evaluations and Chapter 5 discusses these results. Lastly, Chapter 6 states the project's conclusions with a summary of the results, the limitations of the project, and suggestions for possible future work.


Chapter 2

Background

This chapter explains the concepts needed to understand the thesis and offers insights into related work. The first section explains what Betsson Group is and what products it offers. Later sections explain the more technical parts of the work, such as the difference between a monolithic software architecture and microservices, along with what virtual machines and containers are and how they are used to run enterprise systems.

Enterprise systems are usually built with large technology stacks, and Section 2.9 explains the various technologies used to build and run the system this thesis focuses on. In Section 2.7 container orchestrators are explained. The last section focuses on previous work regarding performance analysis of containers and microservice systems.

2.1 Betsson Group

Betsson Group is a Swedish gaming company and one of the largest listed gaming companies in the world, operating approximately 20 brands around the globe. Betsson offers a large variety of products. The largest revenue stream comes from their mobile casino, which aims to offer remote players an experience as close as possible to that of an offline casino. With over 1,500 available games it is one of the world's largest mobile casinos. Betsson's sportsbook offers a large variety of bets on a range of sports, as well as on political events, markets, and more. Betsson also offers live streaming of various sports, together with real-time in-game betting. Aside from the casino and betting services, Betsson also has a range of other games, such as poker variations, bingo, and scratch cards. Due to its popularity, Betsson's system must handle a large number of requests every day and must be capable of handling large spikes in traffic during high profile events, such as final matches of popular tournaments.


2.2 Software architecture

The monolithic software architecture has been popular for a long time and may be considered the industry norm. In recent years, developers have been turning towards microservices due to the many advantages they can bring; but as in all software related matters, there are no silver bullets, and each use case must be considered carefully before choosing which architecture to use. This project focuses on microservices as the main software architecture. This section explains what microservices are and how they differ from the more traditional monolithic architecture.

2.2.1 Monolithic architecture

A monolithic (software) architecture means that all the software functionality is bundled into one component, hence creating a monolith. The result is a single deployable application consisting of one codebase. The increased popularity of microservices is due to the problems that may arise as the monolith grows larger.

Take for example a simple web application. At first it enables users to visit a website in order to view its content. Later on it is decided to add a REST application programming interface (API), so that users can also interact with the web application (webapp) through an API. As the monolith's nature dictates, all of this new functionality is added to the application's codebase. If done correctly, this should not greatly increase the complexity of the codebase. The monolithic way of dealing with complexity is to break the program down into modules. However, the application still remains a single executable artifact. As the application grows larger and more independent services are added, the complexity increases and the codebase becomes harder to maintain and evolve. At some point in time there may be several independent teams working on the same codebase, each team responsible for what is effectively a different service. Unfortunately, the teams are forced to use the same technology as the original application, although there may be technologies better suited for the service they are developing. Moreover, an update to one service means an update and redeployment of the whole application. For large projects the whole process of making changes, testing, and deployment can prove difficult and time consuming. In many cases it also means downtime.

Scaling the application may also prove a nuisance. A certain functionality of the application, the REST API for example, may become very popular, causing an increased number of requests. In order to handle an increased load, the application can be deployed on multiple servers. However, although the increased load is just for a single functionality, the whole application must be deployed.

This example focused on a small application in order to explain the difficulties that may arise with the monolith. The purpose is not to assert that the monolithic approach is inferior. Instead, the argument is that the architecture must be chosen to suit the project, and in many cases the monolith is the right choice. Additionally, it is common to start with the monolithic approach and then evolve the system architecture into microservices if needed.

2.2.2 Microservices

Microservices have become a widespread software architecture in recent years due to their many advantages over the older monolithic model. There is no official standard defining what constitutes a microservice; rather, it is a term used to describe a certain methodology that software engineers have started to embrace. A variety of information on the methodology can be found in articles and lectures available via the internet. Additionally, a few books have been written as well, such as Building Microservices by Sam Newman [6].

The idea behind microservices is to build a single application out of multiple smaller services that interact with each other through a well defined API. On the surface it looks like a single application with multiple functionalities, just like a large monolith. Behind the interface it is a combination of smaller services. A single microservice can be described as an application with a single responsibility that can be deployed, scaled and tested independently [7]. The services are ideally business oriented. For example, splitting an application into the traditional three layers: user interface, application logic, and database layer - does not constitute a microservice architecture. Instead, a service handles one business capability, user data for instance, and does so on every necessary layer (i.e., UI, logic and persistent data storage). In the earlier monolithic example, the web application and the REST API would be divided into two independent services. If done correctly, this may remedy pitfalls mentioned earlier about the monolithic architecture.

Dividing the software into loosely coupled modules should also make it easier to work with. In large projects, responsibilities for different services can be distributed among teams. This makes updates to the codebase easier, because now a given team updates and deploys a single service instead of working with the whole application. Furthermore, this improves fault isolation and scalability. When a service goes down, the other services continue to work, except for those functionalities relying on the failed service. As shown in Figure 2.1, services can be deployed independently, hence a heavily used service can be replicated on several servers without needing to replicate the other services. The servers on the left run monolithic applications and therefore an instance of each service is deployed on every server. The servers on the right run independently deployed microservices. Each service is also technologically independent, thus allowing each team to pick a different technology stack, whichever is best suited for its specific job. Of course, each technological decision must be well thought out, since running too many stacks may impose other problems on the company.

Figure 2.1: Microservices can be deployed independently based on workload

The cost of a microservice architecture must also be considered. Working with distributed systems is complicated. A single process is no longer responsible for handling a request, as a service may need to call upon other services remotely and these in turn may have to call on other remote services. The complexity and latency are additive, hence each inter-service interaction must be thoroughly analyzed. Additionally, working with persistent data becomes more complicated. A monolith can update several things in one function call, but with microservices each service may be responsible for a single update, causing consistency issues unless handled correctly. Running multiple services requires an operational culture that can handle these multiple services. Running hundreds of services requires both a good overview and careful management. Many tools are available to help, as will be discussed later in the thesis (specifically in Section 2.11). Moreover, it is much harder to debug a distributed system when things go wrong.

All things considered, which architecture to choose depends on the project and the problems it is supposed to address - as well as the expected loads that the service must support.

2.3 Virtual Machines

The computers, or servers, used to run distributed systems can either be bare-metal or virtualized. These servers are usually located in data centers, i.e., facilities specially designed to house large numbers of computers. As the naming indicates, bare-metal simply means that the server is a physical instance of a single computer. The hardware of such a server has its own motherboard and this computer is not shared with others. In contrast, a virtual machine may be one of ten instances running on top of the same physical computer hardware. Performance-wise, bare-metal servers have been shown to offer better performance than virtual machines. However, virtual machines are more desirable, especially in a large-scale system that requires flexible scaling.

Working with bare-metal means that each server needs to be set up from scratch. That includes buying the individual computers, setting them up in a data center, and then configuring and operating them. This is a slow and expensive process. In order to be able to handle high workloads and achieve appropriate fault tolerance, multiple replicas of each server need to be available, further increasing costs and complexity.

In contrast, virtual machines are not bound by these limitations and thus they are ideal for running larger system applications. In simple terms, virtual machines offer virtualization at the hardware level: they isolate a portion of the available hardware resources while making these resources available to the application, giving the impression of working with a bare-metal server. Virtual machines at the application level also exist but are not relevant in the context of this thesis.

The entity that creates and runs the virtual machines is called a hypervisor. Hypervisors are conventionally categorized into two types (as shown in Figure 2.2). A type 1 hypervisor runs on the bare metal, while type 2 hypervisors run on top of an operating system (just like normal applications). For both type 1 and type 2 hypervisors there are multiple solutions available for hardware virtualization. For large-scale systems, type 1 hypervisors are usually used and the actual system runs on top of special enterprise-level hardware.

Figure 2.2: Hypervisors of type 1 and 2

When using virtualization, operators do not have to work with individual physical computers. Instead, a single physical computer can be used and its resources split into isolated pieces to run as virtual machines. Additionally, each virtual machine can run an operating system of its own choice. Each virtual machine can be configured just as a physical computer would. In a large enterprise setting, a whole server stack could be virtualized on one computer (instead of hundreds of physical machines with one physical machine for each logical server). In order to achieve reliability and fault tolerance, the server's stack should be replicated on at least one other computer. When there is a need to scale out (increase the server's capacity), it is simply a matter of allocating resources to new virtual machines and starting these virtual machines and their entire software stack. That can be done in a matter of minutes rather than having to order new hardware, wait for its delivery, and set it up from scratch. Moreover, when less capacity is needed, some of the virtual machines can be shut down, hence the resources that were previously occupied can be assigned to other customers of the cloud provider.

2.4 Containers

Linux containers are not new, but only in recent years have they become a mainstream solution in enterprise systems. The main reason for their increased usage is projects such as Docker [1], an open source platform for containerization. Linux containers differ from virtual machines because they run on a host system that has an operating system installed. They can be considered virtualization at the operating system level and are considered lightweight because they avoid the need for a hypervisor. When creating a container, a portion of the host's resources is made available to the container. This degree project focuses on Docker as the container solution.

2.5 Docker

Docker makes use of kernel technology to create lightweight Linux containers that run on the host system. Its relatively simple interface makes it easy to create, start, and destroy containers. Docker adds various features that differentiate it from plain LXC technology. These features include portable deployment, automatic build, versioning, and an application-focused API. These features have made Docker popular in enterprise settings.

Docker is built upon LXC and uses the same Linux kernel technology, namely namespaces and control groups, to isolate containers and monitor their resources. Namespaces are responsible for process isolation, thus processes running inside a container cannot affect other processes on the host system and vice versa. Namespaces introduce new network capabilities by providing each container with a network device and each interface with its own IP address. Control groups monitor the resources used by the container and limit resources. These resources include CPU, memory, network, disk I/O, etc.

Docker uses a client-server architecture, the server being the Docker daemon, which does most of the work, such as building and running containers. The client is used to interact with the daemon. This can be done through a terminal on a system with the Docker command line tool installed. When using Docker's run command to start a container, the client sends the request to the Docker daemon, which in turn handles the request. The client and the daemon do not have to be running on the same system, hence the client can be connected to a Docker daemon that is running remotely. Additionally, a client is not limited to communicating with only one daemon. The interaction between the two can be via a REST API, UNIX sockets, or a network interface.

Figure 2.3: Docker containers on a host machine

Docker has three main network configurations. It is important to understand them in order to choose the one most suitable for the desired execution environment, since network overhead is one of Docker's main sources of negative impacts on performance. The configurations are:

Bridge networking is the default configuration. It creates a private network that all containers on the host connect to, giving them the ability to communicate with each other. Containers can receive external connections by using port mapping.

Host networking disables all network isolation and uses the host's network interface, thus exposing containers to the public network. Running a container on port 80 in this configuration means its applications will be available on port 80 on the host's IP address. This configuration is faster than using the bridge, but it comes with increased security risk.

An overlay network configuration is used to create a distributed network between several hosts by creating an overlay on top of the hosts' networks. Docker handles the routing of packets to and from containers that are running on these networks' hosts. Docker provides IP address management, service discovery, multi-host connectivity, encryption, and load balancing.

For this project it was decided to use the bridge network configuration. The reasons for this choice were that this is the default configuration and that it has the least security risk in a production system.
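The difference between the bridge and host modes can also be expressed declaratively. The following docker-compose fragment is not part of the thesis setup; the service and image names are hypothetical and merely illustrate how the two modes are selected:

```yaml
# Hypothetical docker-compose fragment (not used in the thesis setup).
# "api" runs on the default bridge network with a port mapping, while
# "monitor" shares the host's network stack directly.
services:
  api:
    image: user-service:latest    # placeholder image name
    ports:
      - "5001:80"                 # host port 5001 -> container port 80 (bridge)
  monitor:
    image: monitor-service:latest # placeholder image name
    network_mode: host            # no network isolation; faster, less safe
```

The port mapping under the bridge mode is the moral equivalent of the -p flag on the command line, while network_mode: host corresponds to running with --network host.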

Docker uses bind mounts or volumes to store persistent data. Bind mounts are the more limited option, as they simply mount a file or a directory on the host machine into the container. Bind mounts perform well but depend on the directory structure of the host machine. When using volumes, a directory is created inside Docker's storage directory on the host machine. The content of the volume is managed by Docker. Volumes do not increase the size of the containers using them, and their existence is independent of the container's life cycle. They are also easier to back up and migrate and safer to share among containers.
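Both storage options can be declared side by side. A hedged docker-compose sketch (not taken from the thesis; image and path names are placeholders) showing a named volume next to a bind mount:

```yaml
# Hypothetical docker-compose fragment. "userdata" is a named volume
# managed by Docker and survives container replacement; the second entry
# is a bind mount, which depends on the host's directory structure.
services:
  db:
    image: couchbase:community        # placeholder tag
    volumes:
      - userdata:/opt/couchbase/var   # named volume
      - ./config:/app/config:ro       # bind mount from the host, read-only
volumes:
  userdata:
```

Replacing the db container leaves the userdata volume (and the data in it) intact, which is the property the paragraph above describes.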

In order to use Docker, Docker must be installed just as any other application. Docker is available for Microsoft's Windows, Apple's MacOS, and various Linux distributions. The container acts as an executable image which Docker runs. This image is a package that should include the application to run as well as all of the dependencies needed to run it. A single image can be used to run many different independent containers. The difference between a virtual machine and Docker should be clear in a production environment (see Figure 2.4). When using a virtual machine, an OS must be installed, the desired software set up, and all the dependencies installed. A single virtual machine can run multiple applications. Virtual machines have been an industry standard for a long time. However, having to think about dependencies can prove difficult when trying to maintain a service. Moreover, sometimes different versions of software libraries are needed for different services, and updating one library for one service may break another (unrelated) service. When using Docker, the service and its dependencies are bundled together in an image. In order to avoid a potential dependency race, as may occur with a virtual machine setup, it is possible to run a single service in each container. A single server can run many different applications, each in its own isolated container. When code changes need to be applied to an application or other libraries installed, a new image can be created. Additionally, this new image can replace the old container without impacting any other applications running on the underlying physical system.


Figure 2.4: Virtual machine and container setups

Docker can also play a role in easing the whole process of developing software, because if you can run a Docker image on one machine, you can run it on any other machine that has Docker installed. Usually there are multiple environments needed in order to get software from the development environment to run in the production environment. In such a deployment process, developers need to set up the dependencies themselves in order to try out their software. Moreover, a test environment needs to be set up in order to test the software and finally, as stated before, the production environment needs to be set up. When using Docker, developers create their images and run their software on their own machines. Subsequently, these images can be run in the test environment before being deployed into production. In theory, Docker should be able to greatly reduce the operations overhead of software development and deployment.

In order to deploy the containers in production they can be stored in a registry. A registry can have many repositories and each repository can contain multiple images. After creating Docker images, these images are pushed to a registry of the developer's choice and subsequently these images can be accessed by the production computers that will run the application(s). Deploying a container in production can be done using a single Docker command (as will be described in the next section). Docker offers its own registry solution called Docker Hub [https://hub.docker.com/]. This registry can host both public and private repositories, but enterprises may want to set up their own registry for security purposes.
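As a sketch of that workflow, the commands below tag a locally built image for a registry, push it, and pull it on a production host. The registry host and image name are hypothetical, and to keep the example self-contained the commands are printed rather than sent to a Docker daemon:

```shell
# Sketch of the tag/push/pull cycle against a private registry.
# REGISTRY and IMAGE are hypothetical placeholders; the script prints
# the commands one would run instead of executing them.
REGISTRY="registry.example.com:5000"
IMAGE="user-service:1.0"

echo "docker tag ${IMAGE} ${REGISTRY}/${IMAGE}"   # rename for the registry
echo "docker push ${REGISTRY}/${IMAGE}"           # upload from the build host
echo "docker pull ${REGISTRY}/${IMAGE}"           # download on a production host
```

On the production host, the pulled image can then be started with the same docker run command shown in the next section.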

The increased popularity of containers does not result in virtual machines becoming obsolete, because the two virtualization techniques serve different purposes. In a production environment, virtual machines and containers tend to be run together. A virtual machine is set up and allocated the resources to run the desired applications, but rather than setting up the applications and running them natively, Docker can be installed and the services run in containers. This process can reduce the run time overhead and greatly decreases the operations and maintenance effort required for the production servers.


2.6 Using Docker

To create a Docker container, a Dockerfile must first be created. Listing 2.1 shows the Dockerfile for one of the services that was built in this project.

FROM microsoft/aspnetcore:2.0 AS base
WORKDIR /app
EXPOSE 80

FROM microsoft/aspnetcore-build:2.0 AS build
WORKDIR /src
COPY UserService.sln ./
COPY UserService/UserService.csproj UserService/
RUN dotnet restore -nowarn:msb3202,nu1503
COPY . .
WORKDIR /src/UserService
RUN dotnet build -c Release -o /app

FROM build AS publish
RUN dotnet publish -c Release -o /app

FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "UserService.dll"]

Listing 2.1: Dockerfile example

The container is based on the official Microsoft ASP.Net Core version 2.0 Docker image. The source code for the service is copied from its local folder, built within the container, and then the service is started. The EXPOSE instruction tells the container to listen for TCP connections on port 80, which, combined with port mapping, gives external network access to the service.

The container is built and tagged with a name chosen by the developer by executing a terminal command, as shown in Listing 2.2.

docker build --tag NAME_OF_CONTAINER PATH_TO_DOCKERFILE

Listing 2.2: Docker build command

Listing 2.3 shows a terminal command to run the Docker container locally.

docker run -d -p 80:80 NAME_OF_CONTAINER

Listing 2.3: Docker run command

The -d flag is used to run the container as a background (daemon) process and the -p flag is used for port mapping. In Listing 2.3, the host's port 80 is mapped to the container's exposed port 80. As a result, the service is available via localhost on port 80. Note that this service is limited to access via other processes also running on this same host, as the bridge networking configuration was used.

2.7 Orchestration

As the size of an enterprise system grows, so does the importance of automation to manage all of the virtual machines or containers. Virtual machines and container technology provide the means to run a reliable and scalable distributed system. However, in order to fully utilize these technologies, orchestration technology is needed. The purpose of orchestration technology is to automate the system administration work needed to run a distributed system. This orchestration technology should reduce the burden of manually administering the system and, ideally, do it better than any human could.

Automation has always been an important tool in a system administrator's arsenal. The simplest form, writing scripts and setting up recurring jobs, can save a lot of manual labor in the long run. With time, automation tools have become more capable and complex. Paired with virtualization, servers can now be spawned and set up automatically in a matter of minutes without any human interaction.

The emergence of containers takes things even further when building dynamic and robust systems. The focus shifts from running machines to running applications. This change in focus makes it easier to make changes to the system, such as setting up new hardware and upgrading operating systems, without impacting already running applications. Developers and system administrators do not have to worry about machine and OS details, since everything necessary for the service runs in containers [8]. However, this increased complexity calls for a highly capable orchestration tool. A system consisting of a few virtual machines may be running hundreds or thousands of containers. Currently there are many different solutions available, each with their own ideology. Some are built to run a system completely on their own, while others are expected to be bundled together with other tools. Most orchestration tools offer declarative configuration, meaning that operators set the desired state of the system. It is then up to the orchestration tool to match the current system state to the desired state. The desired state is a combination of many requirements. For example, what applications need to run and where they need to run, how many replicas of each machine or service should be instantiated, and how many resources an application is allowed to use. Table 2.1 contains short descriptions of fundamental tasks that must be taken care of in order to run a distributed containerized system and that orchestrators can help with.


Table 2.1: Orchestrator tasks

Container management: The most basic functionality. When the orchestrator senses that the desired state does not match the current state, it will automatically spawn or take down containers.

Naming and service discovery: When running a dynamic system with multiple services, configuring them becomes more difficult. The orchestrator should make it possible for applications to dynamically find and interact with other services. This can be implemented with a lightweight DNS or a distributed key-value store.

Monitoring: In order to keep the desired state, the orchestrator must continuously monitor the system. Application health checks can be used to make sure services are running properly. When a service goes down for some extended period, the container can be restarted, or when a host goes down, its containers can be moved to another host. This self-healing capability is a fundamental part of a reliable distributed system.

Application-centric load balancing: Can be used to make full use of the system's capabilities. When running replicated services, the load is balanced between them.
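Declarative configuration of this kind can be made concrete with a short sketch. Kubernetes is one widely used orchestrator; the manifest below is hypothetical (the names, image, and resource limits are illustrative, not from the thesis) and simply expresses a desired state of three running replicas that the orchestrator continuously reconciles toward:

```yaml
# Hypothetical Kubernetes Deployment manifest: a declarative desired state.
# The orchestrator keeps three replicas of the placeholder user-service
# image running, restarting or rescheduling containers as needed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3                 # desired number of running instances
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: registry.example.com/user-service:1.0  # placeholder image
        resources:
          limits:
            cpu: "500m"       # cap on CPU usage
            memory: 256Mi     # cap on memory usage
```

If a container crashes or a host is lost, the orchestrator notices the divergence from this desired state and spawns replacements, which is exactly the self-healing behavior described in Table 2.1.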

2.8 Cloud computing

Virtualization has been one of the main driving forces for the establishment of cloud computing [9]. The concept of cloud computing can be described as a model that enables users to gain access to a shared pool of computing resources. The resources can be of various natures, such as servers, storage, or applications.

There are three major service models identified within cloud computing: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). SaaS is most likely the most common service model. In this model, consumers have access to an application that is running on a cloud infrastructure. The SaaS model enables software providers to provide customers access to specific software through various methods, such as web browsers and mobile applications. These consumers only access the applications and do not know about the underlying infrastructure where these applications are running. PaaS enables customers to deploy their own applications on a cloud provider's infrastructure. The application owners avoid the hassle of setting up the infrastructure (such as servers and networks) and focus only on deploying their application. IaaS enables customers to provision computing resources, such as virtual machines, storage, and network, for their own usage. They do not have access to the underlying cloud infrastructure, but they are able to set up their own environments, which are run on computers, networks, storage, etc. operated by the cloud provider.

There are four major deployment models. A private cloud is used by a single organization. It can be managed by the organization, a third party, or a combination of both, and it can be hosted on premises or at a third party's location. A community cloud is similar to the private cloud, but instead of a single organization being the customer, there is a community of customers that use the cloud. Public clouds exist on the cloud provider's premises and can be used by the general public. Finally, hybrid clouds are a combination of the other deployment models.

The 2018 IaaS magic quadrant by Gartner [10] identifies Amazon Web Services, Microsoft Azure, and Google Cloud as the current main cloud providers. They offer various services to run systems, which gives companies the option to avoid setting up and running their software on on-premise hosts. With the increased popularity of containerized microservices, these three cloud providers have also begun to offer container services to provide their customers the option of deploying their containers straight into the cloud. This abstracts away the underlying infrastructure that runs the containers, therefore facilitating the whole process. As stated before, containers are advertised as being fairly lightweight. However, these containers running in the cloud are actually running on top of virtual machines. For this project, virtual machines were used instead of a container service, but if there is a noticeable overhead measured when load testing applications in this way, it should also occur for a cloud container service as long as it uses virtual machines instead of bare-metal servers. Using bare-metal servers would avoid the hypervisor overhead and therefore presumably reduce the overall overhead.

2.9 Technology stack

In order to better mimic a production system and increase the complexity of the laboratory system, some technologies from Betsson's tech stack were chosen. The system itself is created using .NET Core, an open source implementation of the widely used .NET Standard. Couchbase Community Edition is used for persistent storage and RabbitMQ for messaging. Docker was used to run RabbitMQ in its own container and Couchbase was installed on a dedicated database server. Each of these will be described below.


2.9.1 .Net Core

The .Net framework is essentially used for two things: (1) an execution engine to run applications on Windows and (2) a large library of reusable code for developers to use in their own applications. The .Net framework provides various services to the running applications and their developers, such as memory management, a common type system, development frameworks, language interoperability, and version compatibility. In 2016, an open source implementation of the .Net Standard was released, called .Net Core. This can run on various OSes, including Apple's MacOS and some Linux distributions. The microservices were built using ASP.NET Core [11], which is a part of the .NET Core stack. ASP.NET Core is primarily used to create internet applications and it allows developers to easily create REST APIs. Kestrel, the default web server for running ASP.NET Core projects, was used to run the microservices for both the Docker and native solutions.

2.9.2 Couchbase

Couchbase [12] is a commercial NoSQL (Not only SQL) database, marketed as an engagement database. It is built to be lightning fast, easily scalable, and able to hold multiple copies of data entities for high availability and data safety. Couchbase can store JSON documents and binary data. The data is stored in data containers called buckets, which can be replicated up to three times in the Couchbase cluster. Multiple buckets can be created in the cluster for various purposes. Each data entity has a unique Document ID which is used to decide what server(s) it shall be stored on. Couchbase offers a REST API for client applications to interact with, but also offers a software development kit (SDK) for various programming environments for easy integration. In this project the SDK for C# is used. A community edition of Couchbase is available and, as stated earlier, was used in this project. Even though the clustering ability of Couchbase is one of its main strengths, there will only be one instance running in the test environment.

2.9.3 RabbitMQ

RabbitMQ [13] is a lightweight open source message broker. It is written in Erlang and was originally built to support AMQP [14], but the current version supports various other protocols such as STOMP [15], MQTT [16], and HTTP. Essentially, RabbitMQ acts as an intermediary between a message sender (producer) and a message receiver (consumer). RabbitMQ can be used to implement various messaging patterns such as worker queues, publish/subscribe, routing, topics, and remote procedure calls (RPC). The SDK for C# was used for easy integration of RabbitMQ into the system. The SDK is based upon AMQP 0-9-1 with additional abstractions.


2.9.4 Taurus

In order to create a workload on the system and analyze its performance for different configurations, a load testing tool was needed. This could be done with custom-made code, but there are various open source load testing tools available. Taurus [17] was chosen for this project due to its easy configuration syntax and versatility. Taurus is not in itself a load testing tool but works as a wrapper for other load and functional testing tools such as Apache JMeter [18], Gatling [19], and Selenium [20]. The testing scenarios are configured with .yaml files in which the underlying testing tool and test parameters are configured. Taurus used JMeter as the load testing tool for all performance tests in this project. An example Taurus configuration file is shown in Listing 2.4, while the command to start Taurus is shown in Listing 2.5.

execution:
- concurrency: 10
  ramp-up: 1m
  hold-for: 10m
  scenario: login

scenarios:
  login:
    timeout: 500ms
    keepalive: false
    requests:
    - url: http://localhost:5001/api/user/login
      method: POST
      body:
        username: test_user
        password: test_password
      headers:
        Content-Type: application/json

Listing 2.4: Example Taurus configuration file

bzt taurus-config.yaml

Listing 2.5: Running Taurus test

Listing 2.4 shows an example Taurus configuration file. The main workload parameters are concurrency, throughput, ramp-up and hold-for. Concurrency stands for the target number of concurrent virtual users. Instead of using a target value for requests per second, Taurus spawns concurrent users that interact with the test system in order to mimic a more realistic scenario. Throughput sets a limit on the maximum number of requests created per second. It can be used to establish a stable load of requests throughout the load test. The ramp-up parameter controls how quickly Taurus will spawn the target number of concurrent users. This can be set to 0 to start immediately with the full number of users, but it can also be useful to slowly ramp up the number in order to find the breaking point at which the system can no longer handle the load. Hold-for controls how long the test will run with the full number of concurrent users.


The scenario shown in Listing 2.4 will send HTTP POST requests to http://localhost:5001/api/user/login. The requests contain JSON data with two fields, username and password. Given that Taurus is installed on the tester's computer and is accessible from the command line, Taurus tests are run with the bzt command as shown in Listing 2.5. Taurus opens a console display that shows the progress of the test in real time. Once the test finishes, a folder is created containing information about the test.

2.10 Zipkin

Understanding the behavior of distributed systems can be very difficult. Zipkin[21] can help with this in a microservice system by collecting request time data from the running services and showing the results in a clear manner. Zipkin is based on Google Dapper[22]. It was originally created by Twitter, but the project is now maintained by volunteers of the OpenZipkin organization. Services are configured to send their timing data to Zipkin. When a request enters the system it may require multiple services to finish a workload before the response is sent back. Zipkin shows how long each service took to finish its workload for any given request. The overall and individual processing times are then shown in Zipkin's user interface (UI). This is useful when debugging latency problems because any bottlenecks can be easily found. For this project the Zipkin4Net[23] package was used, as it makes it fairly easy to configure services to send data to a running Zipkin instance.
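Zipkin's REST API returns traces as lists of JSON spans. As an illustration, the sketch below sums per-service time for one trace. The span data is hand-made but follows the shape of Zipkin's v2 span format (durations in microseconds, service name under localEndpoint); the service names match this project's services, and the helper function is our own:

```python
import json
from collections import defaultdict

# Hand-made sample in the shape of Zipkin's v2 span JSON; real data
# would come from a GET to /api/v2/traces on a running Zipkin instance.
sample = json.loads("""
[
  {"traceId": "a1", "id": "1", "name": "post /api/user/login",
   "localEndpoint": {"serviceName": "user"}, "duration": 42000},
  {"traceId": "a1", "id": "2", "parentId": "1", "name": "get blacklist",
   "localEndpoint": {"serviceName": "blacklist"}, "duration": 18000},
  {"traceId": "a1", "id": "3", "parentId": "2", "name": "get geolocation",
   "localEndpoint": {"serviceName": "geolocation"}, "duration": 7000}
]
""")

def per_service_millis(spans):
    # Sum span durations (microseconds in Zipkin) per service, in ms.
    totals = defaultdict(float)
    for span in spans:
        totals[span["localEndpoint"]["serviceName"]] += span["duration"] / 1000.0
    return dict(totals)

print(per_service_millis(sample))
```

This is essentially the aggregation Zipkin's UI performs when it breaks a request's total latency down by service.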

2.11 Nmon

Nmon[24] was used to conduct performance monitoring on the Linux servers running the microservices. Nmon can be installed on a Debian/Ubuntu system using the normal apt-get install command, as shown in Listing 2.6.

sudo apt-get install nmon

Listing 2.6: Installing nmon

Nmon is a powerful tool capable of showing various performance metrics regarding CPU, memory, network and disk usage. It can show information in real time, but for this project it was used to save data in a comma separated value (CSV) file while the load tests were running. Listing 2.7 shows how nmon is configured to save the state of the server every 30 seconds for 30 iterations in a file named docker_loadtest.nmon. This saved data can be analyzed to learn about the state of the system while it handles the load from the experiments. The resulting performance metrics can then be compared across the different environments that were evaluated.

nmon -F docker_loadtest.nmon -s 30 -c 30

Listing 2.7: Starting nmon


2.12 Previous Work

This section presents previous work that is related to the project. This work was studied to gain a better understanding of the subject. The focus of the thesis is to understand the performance penalties incurred by running microservices using Docker. As this section shows, when compared to the monolithic approach, performance penalties are expected due to the nature of containers and microservices. However, due to the complexity of such systems it can prove difficult to pinpoint exactly where the system is experiencing performance penalties. Multiple aspects must be considered, ranging from the performance of the virtualization techniques used to the granularity of the microservice system. This section starts by presenting work done to measure the performance of virtualization techniques, with the focus being on Docker. It then moves on to microservices and microservices running in containers.

2.12.1 Virtualization

Most direct performance studies follow the same methodology: multiple benchmark programs are run on different setups in order to show how virtualization compares to bare-metal performance. E. Casalicchio and V. Perciballi [25] researched tools to measure the workload generated by containerized applications with regard to CPU and I/O performance and showed the importance of carefully interpreting the results from monitoring tools, as different tools showed different results that are each correct per se. For example, Docker's stream of resource usage statistics for containers was found to show the CPU resources requested by containers, while other tools showed the actual resource usage on the host system. Such differences showed that fully understanding the monitoring tools being used is of great importance.

Felter, et al.[2] ran several benchmarks on a single bare-metal machine and on KVM and Docker setups and compared their performance. The tests measured workload metrics for scenarios where one or more hardware resources were fully utilized. Their results showed that Docker and KVM introduce negligible overhead for CPU and memory intensive workloads. However, Docker and KVM did not perform as well as the bare-metal machine when it came to I/O intensive workloads. Docker's NAT was also shown to have a negative performance impact on workloads with high packet rates.

Li, et al.[26] performed performance evaluations on a standalone Docker container running on a physical machine and compared this against a standalone virtual machine running on VMWare Workstation 12 Pro. Similar to earlier findings, their experiments showed that modern virtualization techniques usually have minimal performance overhead. A key takeaway is that the overhead varies not only on a feature-by-feature basis, but also on a job-by-job basis.


Sharma, et al.[27] experimented with virtualization technologies in a data center environment and compared the performance of containers and virtual machines when more than one application was running on each physical server. The baseline results are similar to other findings: there is a small overhead for CPU and memory operations (as they do not go through the hypervisor) while the performance of I/O intensive applications was poor. In multi-tenant situations, where multiple applications compete for resources on their physical host, containers experience a higher performance penalty due to interference from other tenants. This can potentially be reduced with strategic placement of containers.

J. Shetty, et al.[28] showed that Docker and bare-metal performance are similar, while virtual machines always have poorer performance. A noteworthy observation is how severe the virtual machine overhead was for write operations (54% less throughput, while Docker's overhead was 13%). These authors also showed how the higher network latency of virtual machines negatively impacts the performance of HTTP servers. In particular, the OpenStack server had 32% lower throughput when running on Docker than the bare-metal server.

However, despite the evidence of performance penalties, running containers on top of virtual machines is common practice in enterprise environments, hence this is the main focus of this thesis. Little literature is available on this subject. Karatza and Mavridis [29] investigated how running Docker on top of KVM affected the containers' performance by running several resource-focused benchmarks. They showed that the extra virtualization layer brings the expected performance penalties for all main resources, i.e. CPU, memory, disk and network interface.

Amazon Web Services offers the possibility to both deploy and run services in the cloud. They offer both virtual machines and containers. However, their container-based services run on top of virtual machines. By comparing the same services running on AWS, both on a VM and in containers, Salah, et al.[30] showed that while containers may have less overhead on a bare-metal machine, running containers on top of VMs has a clear negative impact on performance. Their research analyzed throughput, response time, and CPU utilization and showed that while the container solution had better CPU utilization, it clearly suffered in terms of a lower rate of request handling.

Although there seems to be a consensus that containers have less of a performance impact than virtual machines, this is not always the case, as Rosenberg[31] showed in a whitepaper for VMWare. Due to the sophisticated resource scheduling of VMWare's vSphere 6.5, running the benchmark applications on VMs was shown to have the highest performance, outclassing even bare-metal servers. The whitepaper showed that running Docker on VMs proved to be better than on the bare-metal servers. The Docker and VM combination performed worse than VMs without Docker because of the overhead added to the storage and network stacks.


2.12.2 Microservices

Apart from the virtualization overhead, one expects additional overhead due to the microservice architecture. Instead of having a single process responsible for handling a request, there may be multiple services working together to create a response. This additional communication between the services should not be neglected. The literature on microservices is all very recent. P. D. Francesco, I. Malavolta, and P. Lago [32] give an overview of the state of the literature on microservices by identifying, classifying and evaluating the current state of the art in architecting microservices. They showed that research in the field has been steadily increasing since 2014, but up until then not much had been written. More than half of the studies that they found focused on the service layer of microservices, while containers and VMs combined make up less than half of the literature. As for quality attributes, the main focus was on performance efficiency, as it includes scalability, which is one of the main reasons for using a microservice architecture.

The field of application performance management (APM) [33] aims to provide processes and tools to measure the performance of systems and to help detect and solve performance related issues. In short, APM revolves around four activities: collecting performance measures from different parts of the system, storing and processing the data, presenting it, and finally interpreting it. With the increasing popularity of microservices, the community has started to focus on how to conduct APM for such systems. However, this area is still in its beginning stages [34]. Due to the complexity of microservice systems and the fast pace of changes, older state-of-the-art APM methods simply cannot be re-used.

There is not much literature available about standardized performance evaluations for microservices. M. Gribaudo, M. Iacono, and D. Manini [35] proposed one approach that allows developers to model microservice architectures and use an aggregated heterogeneous tunable workload mix to study their performance. However, this approach does not take container orchestrators into account, which should be a standard part of any larger microservice system.

Ueda, Nakaike and Ohara [36] investigated the performance impact of switching from a monolithic approach to microservices and showed that this change had a negative performance impact of up to 79%. Their experiments were done using microservices in Docker containers, so the overhead comes from the combination of the microservice architecture and virtualization. Docker's default network configuration, which uses a bridge to virtualize the network, was shown to have a negative impact of up to 24.8% on network throughput when compared to a nonvirtualized network stack. The default Docker configuration also uses iptables rules, which naturally grow in number as the number of containers grows. This results in an increased rate of CPU data cache misses, because the iptables entries have to be scanned for every network operation to see if the operation should be permitted.


Libraries that handle the communication between services were also found to consume significantly more CPU cycles in the microservice model.

Focusing further on the network performance of Docker, a study that used a simple architecture of two services communicating over HTTP revealed a roughly 20% performance drop for messages smaller than 100 kBytes and a 10% performance drop for larger messages[37]. A non-negligible performance drop for such a simple system indicates how important it is to understand the microservice architecture and its pitfalls.

Focusing on containerized microservices, M. Amaral, et al.[38] investigated the difference between a master-slave setup, where all containers run as peers of each other with one parent container acting as a manager, and a nested container setup, where all service containers are spawned within a privileged container. Their work showed that nested containers have a negligible impact on CPU performance, but have poor network performance. As other studies have shown, there was always some overhead when the container setups were compared to bare-metal.

2.13 Summary

Containers, and more specifically Docker, are promoted as being fairly lightweight. Much work has been done that supports those claims, and multiple resources are available which contain various performance studies on containers, VMs, and bare-metal. However, there appears to be a gap in research on containers' impact on larger systems that utilize a microservice architecture.

Comparisons between containers and VMs have shown that containers introduce slightly less overhead than the hypervisor of VMs. This should not come as a surprise, because a container is just a group of processes on the host machine, with isolated namespaces and no hardware virtualization involved. It is important to know what kind of overhead can actually be expected because it will become a constant in the system, i.e., the virtualization will always be running underneath. Over the years, virtualization techniques have been constantly improving and in many cases their overhead has become quite minimal.


Chapter 3

Methodology

This chapter explains how the experiments were performed for the degree project. It starts by introducing the system that was specifically created for this experiment. It then covers the different ways of running the system, how the experimental runs were performed, and the key metrics that were gathered.

3.1 The Test System

A modern microservice system can require multiple services to handle a single request: the initial service handling a request may depend on other services that in turn depend on other services, and so on. In order to mimic this behavior, three services were created:

• User service handles a login request from a user. Depends on a blacklist service to validate the incoming request.

• Blacklist service checks if an IP address, or the country that the IP address belongs to, is on a blacklist. The service depends on a geolocation service.

• Geolocation service returns a country name based on an IP address.

Figure 3.1 gives an overview of how all the components of the test system interact with each other. The three services all need to retrieve data from a Couchbase instance. They also all register with the Zipkin service so that requests can be traced. In order to mimic the production system, upon a successful login a message with information about the login is sent to a RabbitMQ queue. A consumer is actively waiting to retrieve the message. Because this workload is asynchronous and does not affect the response time of the initial login request, the actions performed by the consumer are trivial and its main purpose in this scenario is to remove messages from the queue. For further analysis, the services also send time measurements for the various tasks they have performed to RabbitMQ. The measurements are gathered by a consumer after each load test and the key performance metrics for the services' tasks are calculated, i.e., the median value and the 90th, 95th, and 99th percentile values. The timing measurements were specifically added to gain further insight into where potential overhead comes from and would not be a part of a production system. The overhead of the timing measurements is minimal, and as it impacts the system during all the test scenarios, it was deemed unnecessary to pay special attention to it. The process only requires C#'s Stopwatch methods and sending a single message to the RabbitMQ queue for each request handled by a service.

Figure 3.1: Overview of the test system

User service

The processing begins when the user service receives an HTTP POST request containing a username and password. It first validates the input data to see if it is in the right format before proceeding. It retrieves the IP address of the client and sends an HTTP GET request to the blacklist service to see if the client's IP address should be denied service. If the blacklist service's response indicates that the user is permitted to receive service, then the user service retrieves the corresponding user object from the database. The user service checks whether the user is enabled, then creates an SHA512 hash of the client's password attempt and compares it to the hashed password stored in the database for this user. If everything is correct so far, the service creates a JSON Web Token and returns it in the response to the client's request.
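The services themselves were written in C#; as an illustration, here is an equivalent Python sketch of the password-check step. The hashing scheme is exactly what the text describes (an unsalted SHA-512 digest); a hardened implementation would add a per-user salt (or a dedicated password-hashing function) and a constant-time comparison.

```python
import hashlib

def sha512_hex(password: str) -> str:
    # Plain SHA-512 digest of the UTF-8 encoded password, hex-encoded.
    return hashlib.sha512(password.encode("utf-8")).hexdigest()

def verify_login(attempt: str, stored_hash: str) -> bool:
    # Compare the hash of the attempt against the hash kept in the database.
    return sha512_hex(attempt) == stored_hash

stored = sha512_hex("test_password")  # what the user record would hold
print(verify_login("test_password", stored))   # True
print(verify_login("wrong_password", stored))  # False
```

The credentials shown are the same test values used in the Taurus configuration files elsewhere in this chapter.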

Blacklist service

The blacklist service receives an HTTP GET request for a specific IP address. It checks whether or not the IP address has been blocked by retrieving a list of blocked IP addresses from the database and checking if it contains the client's address. If this address is not blocked, it sends an HTTP GET request to the geolocation service in order to find out the origin country of the IP address. Once it has this country, it retrieves a list of blocked countries from the database and checks if this country is in the list. It then responds with a Boolean value indicating whether or not access to the service should be blocked based upon the source IP address of the user's request.
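The decision logic described above condenses to a few lines. The sketch below is illustrative Python (the project's services were written in C#); the helper `country_of` stands in for the HTTP call to the geolocation service, and the two sets stand in for the blocklists fetched from the database. All IP addresses and country names are hypothetical.

```python
def is_blocked(ip, blocked_ips, blocked_countries, country_of):
    # First check the IP blocklist; only if the IP itself is allowed do we
    # pay for the geolocation lookup and the country blocklist check.
    if ip in blocked_ips:
        return True
    return country_of(ip) in blocked_countries

# Hypothetical geolocation data standing in for the geolocation service.
lookup = {"203.0.113.7": "Atlantis"}.get

print(is_blocked("198.51.100.1", {"198.51.100.1"}, set(), lookup))  # True
print(is_blocked("203.0.113.7", set(), {"Atlantis"}, lookup))       # True
print(is_blocked("203.0.113.7", set(), set(), lookup))              # False
```

Note the short-circuit: a request from a blocked IP never triggers the downstream geolocation call, which matters for the inter-service traffic measured in this thesis.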

Geolocation service

When the geolocation service receives an HTTP GET request containing an IP address, it retrieves an IP-information object from the database. It then returns the country associated with that IP address.

The scheduling policy of processes can affect the performance of a system, as the scheduler decides which process will be run by the CPU next. During a high load on the system there is a chance that the performance of the system suffers from too many context switches, i.e., jobs keep getting interrupted before finishing. For this project the service processes, with and without Docker, used the default universal time-sharing scheduler policy with normal priority.

3.2 System setup

Amazon Web Services (AWS)[39] was used to deploy and run the test system. With AWS, EC2 instances (i.e., virtual servers) can be created quickly and easily. All server instances used were of the type m4.xlarge. The specifications for this instance type are shown in Table 3.1.

Table 3.1: m4.xlarge instance specification.

Instance type   m4.xlarge
vCPU¹           4
Memory          16 GiB
SSD Storage     30 GB (EBS-only)
Bandwidth       750 Mbps

The test scenarios required two different environment setups: one where all the services ran on the same server, and one where the services ran on multiple servers, i.e., each application server ran a single service. Two servers did not change between the scenarios: a server running Couchbase, and a server running RabbitMQ and Zipkin (within Docker) as well as the consumers for the RabbitMQ queue. Figure 3.3 and Figure 3.2 show the two system deployments.

¹A vCPU is an AWS virtual CPU and is roughly equivalent to a single hyperthread of the CPU that the instance model uses. M4 instances use 2.3 GHz Intel Xeon® E5-2686 v4 or 2.4 GHz Intel Xeon® E5-2676 v3 processors.


Figure 3.2: System architecture with services running on multiple servers.

Figure 3.3: System architecture with services running on a single server.

3.3 Test scenarios

In order to measure any potential container overhead, four different ways of running the services were identified as scenarios of interest:

1. All services are running in a Docker container but on separate virtual servers.

2. All services are running on separate virtual servers using only Kestrel.

3. All services are running in a Docker container on the same virtual server.


4. All services are running on the same virtual server using only Kestrel.

Without Docker, the environment mimics the 'old' and still widely used way of running web applications. In this case each server was set up from scratch and all the .NET Core dependencies required to run the application were installed. The Docker scenario only required Docker to be installed, after which the correct Docker images were fetched and started.

Running the services on the same server, with and without Docker, is an attempt to reduce the communication overhead. As it is known that some services depend heavily on others, reducing their communication latency should reduce the overhead. The Docker services all used the same network bridge and communicated through it. The services that ran directly on the virtual server interacted with each other via localhost.

The response times reported by Taurus for all the created requests were collected and used to judge the performance of the system. For every load test the median value was found, as well as the 90th, 95th and 99th percentile values. The Taurus load test was configured to send POST requests to the user service with the login credentials in a JSON payload. A throughput value was used in the Taurus configuration file in order to establish the load. The first load test was set to send 50 requests per second to the system. The throughput was incremented by 50 requests per second between load tests until the system reached its maximum capacity and was not able to handle the load, causing response times to increase drastically. Each load test was configured to run for 17 minutes. The requests made during the first and last minute of the load test were discarded from the performance calculations in order to only use request times from when the system was in a stable state and fully loaded. The full load test configuration file can be seen in Listing 3.1.


execution:
- concurrency: 10000
  hold-for: 17m
  throughput: 50
  scenario: login

scenarios:
  login:
    timeout: 1m
    keepalive: false
    requests:
    - url: http://localhost:5001/api/user/login
      method: POST
      body:
        username: test_user
        password: test_password
      headers:
        Content-Type: application/json

reporting:
- module: console
- module: final-stats
  summary: true          # overall samples count and percent of failures
  percentiles: true      # display average times and percentiles
  failed-labels: true    # provides list of sample labels with failures
  test-duration: true    # provides test duration
  dump-csv: login_stats.csv

Listing 3.1: Taurus load test file

3.4 Measurements

Several different measurements were gathered during each load test. The services were built to measure the time it took to handle all potentially time-consuming actions, such as retrieving data from the database, as well as the overall time it took to finish a request. This was done using C#'s Stopwatch class. The timing measurements were sent to RabbitMQ for processing after all of the requests were completed.

One minute before every load test began, nmon was started to create snapshots on the service servers every 30 seconds. By starting nmon before the load test began and letting it run for a short while after the load test finished, it was easy to analyze how the state of the system changed from being idle to handling the maximum load.

As stated earlier, Zipkin was used to collect information on how long each request took and how long each service took to finish its part of the request. Zipkin has a REST API that makes it possible to get information about requests in JSON format. Once a load test finished, all of the collected data was retrieved from Zipkin.


Finally, the Taurus results were saved in a CSV file from which the main results of the load test (such as the total number of requests and the average response time) could be viewed. In summary, the following steps were taken for each load test:

1. Nmon was started to collect data on the system's performance every 30 seconds for 20 minutes.

2. Taurus started the load test with a one-minute ramp-up time and ten minutes of maximum load. Once the test finished, the results were stored in a CSV file.

3. Zipkin data was collected using Zipkin’s REST API.

4. The time measurements from the services were retrieved from RabbitMQ.


Chapter 4

Results

This chapter presents the results from the load testing that was described in Chapter 3. The main metrics are presented first, followed by other benchmark results that help explain them.

4.1 Taurus results

Taurus was used to create the load on the system. Taurus saves information for every request it creates during a load test in a kpi.jtl file, which includes the elapsed time for each request. The time values from each load test were used to calculate the key performance indicators: the median and the 90th, 95th and 99th percentile values. The requests from the first and last minute of the load test were discarded in order to only use results from when the system was in a steady state. In the beginning the system is unloaded and therefore able to handle the initial requests faster, and at the end there are no more requests coming in, resulting in a progressively lighter load on the system.
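That post-processing step can be sketched as follows. This is illustrative Python, not the project's actual code; the percentile method here is a simple nearest-rank scheme, which the thesis does not specify:

```python
import statistics

def kpis(samples, warmup_s=60, cooldown_s=60):
    """Compute median/90th/95th/99th percentiles from (timestamp_s, elapsed_ms)
    samples, discarding the first and last minute of the test window."""
    start = min(t for t, _ in samples)
    end = max(t for t, _ in samples)
    steady = sorted(ms for t, ms in samples
                    if start + warmup_s <= t <= end - cooldown_s)

    def pct(p):  # nearest-rank percentile on the sorted steady-state values
        return steady[min(len(steady) - 1, int(p / 100 * len(steady)))]

    return {"median": statistics.median(steady),
            "p90": pct(90), "p95": pct(95), "p99": pct(99)}

# Synthetic 17-minute run where the elapsed time equals the timestamp,
# which makes the first/last-minute trimming easy to check by hand.
print(kpis([(t, t) for t in range(0, 1021)]))
```

In the synthetic example, only samples with timestamps from 60 s to 960 s survive the trimming, so the reported values come exclusively from the steady-state window.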

At a certain point the system was bound to reach its maximum capacity, at which point the request times increased severely. Once that point was reached, load tests with a higher load did not produce usable results because the load test itself became a limiting factor: the request times became too long, and the load test reached its maximum concurrency level while waiting for responses and was not able to create the desired RPS load over the span of the whole test.

4.1.1 Single server setup

Figure 4.1 shows how the service time spike happened when the load was increased from 100 to 125 RPS when Docker was used. The native solution managed to handle more load, and the increase in service times happened when the load was set to 150 RPS.


Figure 4.1: Log values of the median response times when using a single server

Table 4.1 and Table 4.2 show the load test results when the system ran on a single server. Similar to the previous results, the native solution outperformed the Docker solution for every RPS value that was tested.

Table 4.1: Taurus load testing results for a single server with Docker

RPS   Median (ms)   90th percentile (ms)   95th percentile (ms)   99th percentile (ms)
50    34            88                     115                    216
100   35            175                    408                    1043
125   100 663       101 566                101 718                102 027
150   104 176       104 828                105 007                105 712

Table 4.2: Taurus load testing results for a single server without Docker

RPS   Median (ms)   90th percentile (ms)   95th percentile (ms)   99th percentile (ms)
50    27            54                     71                     204
100   22            32                     40                     208
125   21            94                     150                    684
150   82 026        83 579                 83 862                 84 512

Figure 4.2 and Figure 4.3 show the empirical cumulative distribution functions (CDFs) of the service times from the single server setup when using RPS values from 50 to 200. The x-axis represents the response times and the y-axis shows the cumulative fraction of samples whose response time is at or below a given value (based on the service times gathered from the load tests). In the case of the 200 RPS load test, the load oscillated around the 200 RPS target due to the long service times, as explained earlier.


Figure 4.2: CDFs of request times when using a single server with Docker. Panels: (a) 50 RPS, (b) 100 RPS, (c) 125 RPS, (d) 150 RPS.

Figure 4.3: CDFs of request times when using a single server without Docker. Panels: (a) 50 RPS, (b) 100 RPS, (c) 125 RPS, (d) 150 RPS.


4.1.2 Multiple servers setup

The multiple server setup was able to handle more load than when using a single server. Figure 4.4 shows the increase in median values of the request times for the system when using multiple (3) servers. Increasing the RPS value from 200 to 250 caused a large increase in service times for the Docker solution. The native solution was able to handle more load and the spike happened when the RPS value increased from 250 to 300.

Figure 4.4: Log values of the median response times when using multiple servers

Table 4.3 and Table 4.4 show the results for the system when it ran on multiple servers, with and without Docker. The native solution outperformed the Docker solution for every RPS value tested. As stated earlier, the Docker system started to struggle when it was hit with 250 requests per second. The native solution managed to handle the load up until 300 requests per second.

Table 4.3: Taurus load testing results for multiple servers with Docker

RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
50     25             52                      72                      125
100    14             18                      21                      65
150    17             37                      76                      147
200    26             126                     169                     290
225    96             222                     277                     838
250    49 667         51 751                  52 042                  52 531


Table 4.4: Taurus load testing results for multiple servers without Docker

RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
50     23             47                      65                      136
100    11             16                      30                      72
150    13             25                      53                      105
200    16             31                      56                      177
250    28             170                     223                     654
275    168            449                     624                     1252
300    42 684         44 951                  45 924                  47 040

Figure 4.5 and Figure 4.6 show the empirical CDFs for service times for the last four load tests before reaching maximum capacity.

Figure 4.5: CDFs of request times when using multiple servers with Docker. Panels: (a) 150 RPS, (b) 200 RPS, (c) 225 RPS, (d) 250 RPS.


Figure 4.6: CDFs of request times when using multiple servers without Docker. Panels: (a) 200 RPS, (b) 250 RPS, (c) 275 RPS, (d) 300 RPS.

W_overhead = (W_docker − W_native) / W_native    (4.1)

In order to calculate the overhead, Equation 4.1 was applied to the median values from the load tests, with the result expressed as a percentage. Tables 4.5 and 4.6 show the Docker-related overhead for both system setups where a comparison was possible.
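As an illustration, Equation 4.1 can be applied directly to the median values from Tables 4.1 and 4.2. The short Python sketch below (with the medians hard-coded from those tables; the function name is illustrative, not from the thesis code) reproduces the single server overhead figures:

```python
def overhead_percent(w_docker, w_native):
    # Equation 4.1: relative Docker overhead, expressed as a percentage
    return (w_docker - w_native) / w_native * 100

# Median response times in ms from Tables 4.1 (Docker) and 4.2 (native)
single_server_medians = {
    50: (34, 27),
    100: (35, 22),
    125: (100_663, 21),
    150: (104_176, 82_026),
}

for rps, (docker_ms, native_ms) in single_server_medians.items():
    print(f"{rps} RPS: {overhead_percent(docker_ms, native_ms):.2f}% overhead")
```

Rounded to two decimals, these values match the entries in Table 4.5.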

Table 4.5: Docker related overhead for a single server

RPS    Overhead (%)
50     25.93
100    59.09
125    479 247.62
150    27.00


Table 4.6: Docker related overhead for multiple servers

RPS    Overhead (%)
50     8.70
100    27.27
150    30.77
200    62.50
250    177 282.14

The overall performance of the system was better in every case when compared to the single server setup. It is important to note that the hardware specification did not change between scenarios, i.e., a better server was not used for the single server load tests. Using three servers did, of course, make the cloud cost three times higher with regard to the service servers.

Unexpectedly, in all cases the system seemed to perform worse during the load tests with 50 RPS than with 100 RPS. A potential explanation is that the scheduler of the Linux system uses a 100 Hz clock, and during the light load there are things that could be run, but are not run because the scheduler has not scheduled the process to run. When the load is 100 RPS there is a new event with every clock tick and the scheduler runs. Events to be processed by different tasks therefore do not have to spend as much time waiting to be run by the scheduler.

4.2 Zipkin

Zipkin was used to gain further understanding of the performance of the system, with and without Docker. The sample size was set to 25% in order to not overload the Zipkin server during the heavier load tests. While the Taurus results give a performance overview from outside the system, Zipkin provides time measurements from within the system, i.e., how long each request took based on measurements performed by the services themselves. Similar to the Taurus results, the requests made during the first and last minute of the load test were discarded and the median, 90th percentile, 95th percentile, and 99th percentile values were calculated. Table 4.7 and Table 4.8 show the Zipkin results for the user service, Table 4.9 and Table 4.10 for the blacklist service, and Table 4.11 and Table 4.12 for the geolocation service.
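The summary statistics described above can be computed in a few lines. The sketch below is a hypothetical helper (not the thesis code) that trims the first and last minute of the test window and then derives the four reported values using Python's standard statistics module:

```python
import statistics

def summarize(samples, start_s, end_s, margin_s=60):
    """samples: list of (timestamp_s, duration_ms) trace records.
    Discards records from the first and last minute of the test window,
    then returns the median, 90th, 95th, and 99th percentile durations."""
    kept = [d for (t, d) in samples
            if start_s + margin_s <= t <= end_s - margin_s]
    # 99 cut points at 1%..99%; indices 89/94/98 are the 90th/95th/99th
    cuts = statistics.quantiles(kept, n=100, method="inclusive")
    return {
        "median": statistics.median(kept),
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

The "inclusive" method interpolates between sorted observations, matching the common spreadsheet-style percentile definition; whether Taurus and Zipkin use exactly this interpolation is not stated in the thesis.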


Table 4.7: Zipkin trace results for the User service when using a single server

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     28.47          73.34                   97.37                   168.27
Native      50     22.91          47.07                   60.94                   114.60
Docker      100    29.46          125.51                  293.40                  741.68
Native      100    19.37          30.15                   36.38                   132.35
Docker      125    26 153.15      76 224.29               76 891.30               77 816.20
Native      125    18.65          71.51                   116.10                  397.43
Docker      150    29 042.59      79 405.53               80 416.71               81 338.65
Native      150    22 176.04      62 838.71               63 406.37               64 211.30

Table 4.8: Zipkin trace results for the User service when using multiple servers

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     20.10          40.32                   49.44                   81.21
Native      50     18.48          36.37                   45.47                   81.21
Docker      100    11.87          16.44                   18.81                   49.70
Native      100    10.32          13.97                   23.09                   53.31
Docker      150    14.24          28.16                   51.58                   112.99
Native      150    11.27          19.72                   36.72                   77.57
Docker      200    21.10          83.99                   127.94                  221.83
Native      200    13.54          25.72                   41.00                   133.01
Docker      225    49.65          163.85                  205.97                  539.39
Docker      250    12 648.13      38 722.63               39 174.88               39 733.49
Native      250    23.01          115.76                  165.24                  373.58
Native      275    100.84         342.82                  476.89                  856.55
Native      300    15 775.26      33 301.45               34 293.66               35 933.01

Table 4.9: Zipkin trace results for the Blacklist service when using a single server

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     12.23          30.92                   38.93                   60.78
Native      50     11.05          25.13                   32.52                   54.18
Docker      100    12.30          26.76                   35.43                   75.75
Native      100    8.74           16.38                   19.94                   44.74
Docker      125    22.30          61.37                   112.01                  368.95
Native      125    8.46           27.01                   41.86                   97.80
Docker      150    25.83          106.90                  208.14                  617.92
Native      150    84.53          407.07                  601.45                  1095.99

Table 4.11: Zipkin trace results for the Geolocation service when using a single server

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     2.96           4.88                    6.12                    10.89
Native      50     2.72           4.85                    6.03                    11.23
Docker      100    2.83           3.88                    4.50                    8.28
Native      100    2.20           2.78                    3.08                    4.47
Docker      125    3.00           4.58                    5.91                    13.06
Native      125    2.13           2.98                    3.63                    6.33
Docker      150    3.12           4.90                    6.45                    13.78
Native      150    2.71           4.55                    5.68                    11.68


Table 4.10: Zipkin trace results for the Blacklist service when using multiple servers

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     8.95           20.60                   25.67                   41.62
Native      50     8.31           18.71                   23.54                   40.72
Docker      100    6.08           8.01                    9.41                    20.02
Native      100    5.49           6.93                    9.84                    25.05
Docker      150    6.61           12.18                   17.77                   51.41
Native      150    5.7            9.33                    13.89                   37.77
Docker      200    8.41           21.13                   36.82                   109.15
Native      200    6.11           11.96                   15.86                   49.30
Docker      225    10.70          35.44                   67.07                   149.55
Docker      250    20.33          132.43                  199.29                  397.43
Native      250    8.381          26.90                   47.74                   116.93
Native      275    12.56          72.36                   119.88                  262.76
Native      300    66.05          1073.37                 2403.54                 4556.71

Table 4.12: Zipkin trace results for the Geolocation service when using multiple servers

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     2.77           4.86                    6.23                    10.28
Native      50     2.76           4.56                    5.77                    11.03
Docker      100    2.23           2.69                    2.91                    4.17
Native      100    2.00           2.48                    2.75                    4.73
Docker      150    2.25           2.91                    3.34                    6.72
Native      150    2.09           2.59                    2.92                    6.19
Docker      200    2.42           3.39                    4.57                    11.12
Native      200    2.16           2.86                    3.53                    9.58
Docker      225    2.53           3.85                    6.05                    12.12
Docker      250    2.56           4.41                    7.50                    14.18
Native      250    2.30           3.66                    6.17                    13.37
Native      275    2.21           4.27                    7.61                    13.76
Native      300    2.49           6.31                    9.76                    16.87

As expected, based on the Taurus results, the Zipkin data showed that the median response time for every scenario was higher when Docker was used. The largest performance difference was between the service times of the user service, which is the service that initiates the trace and depends on the other two services.

The geolocation service only serves one purpose. It retrieves information about an IP address and returns the country it belongs to. It does not depend on any other service. As can be seen in Table 4.11 and Table 4.12, increasing the RPS value between the load tests did not impact the service times in the same way it did for the other two services. The number of different IP addresses to retrieve from the database is limited. Once there are enough different requests, all the data should be stored in cache, which decreases the time taken to perform the lookup. The median values for the geolocation service are very small, giving the impression that Docker has minimal impact. The difference between the median values was at worst 25.04% for the single server setup and 11.35% for the multiple servers setup.

4.3 Summary

Using Docker to run the microservice system had an obvious performance impact. The system performed worse on every load test for the Docker setup when compared to the native setup. The overhead was at its minimum when the system was under the least load and may be considered somewhat acceptable. Comparison of the service time median values, provided by Taurus, showed that at its minimum the overhead of using Docker was 8.7% and at its maximum it was 62.5% for the multiple server system. The same comparison for the single server setup showed the overhead to be 24.92% at its minimum, increasing up to 59.09%. These calculations exclude the extreme cases when the Docker system had reached its maximum capacity but the native system had not. In those cases the overhead was 479 247.62% for the single server setup and 177 282.14% for the multiple servers setup. It is therefore clear that using Docker to run a microservice system has a negative impact on the performance and, as the load on the system increases, so does the Docker-related overhead.


Chapter 5

Discussions

This chapter presents a further discussion of the findings presented in the previous chapter with the goal of understanding the origin of the performance difference.

5.1 Time measurements

Strategic timing measurements were performed to gain further insight into the processing times. The measurements showed that the main delays in the system happened when a service had to wait for a response from another service, and that the simple non-network related logic of each service did not matter as much. This could be due to multiple factors with regard to network communication.

By measuring the time an HTTP call from one service to the next took as perceived by the caller, and comparing it to how long the responding service took to complete the request, it was possible to see how much overhead was incurred by calling the service rather than locally implementing the service within a monolith.

There were two cases where a service had to call another service: the user service calls the blacklist service and the blacklist service calls the geolocation service. Table 5.1 shows the measured waiting time from the user service's perspective when a single server was used. Table 5.2 shows the measured processing time for the blacklist service requests when using a single server. Table 5.3 and Table 5.4 show the same measurements for the multiple servers setup.


Table 5.1: Measured waiting time for the blacklist service when using a single server

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     24             84                      144                     144
Native      50     19             54                      104                     104
Docker      100    25             313                     635                     635
Native      100    16             33                      142                     142
Docker      125    49 676         51 979                  52 271                  52 767
Native      125    15             69                      113                     421
Docker      150    51 767         55 585.6                56 106.72               56 106.72
Native      150    40 661         43 291                  44 151.13               44 151.13

Table 5.2: Measured request time for the blacklist service when using a single server

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     12             29.6                    37                      60.56
Native      50     10             24                      31                      54
Docker      100    11             25                      35                      102
Native      100    8              15                      19                      54
Docker      125    22             71                      182                     476
Native      125    7              28                      46                      126
Docker      150    26             159                     344.85                  717
Native      150    132            594                     816                     1271

Table 5.3: Measured waiting time for the blacklist service when using multiple servers

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     17             41                      71                      71
Native      50     15             39                      77                      77
Docker      100    8              14                      45                      45
Native      100    7              19                      45                      45
Docker      150    11             48                      106                     106
Native      150    8              37                      74                      74
Docker      200    17             121                     216                     216
Native      200    11             36                      127                     127
Docker      225    56             144                     197                     545
Docker      250    24 680         26 446                  26 829                  26 829
Native      250    20             154                     385                     385
Native      275    97             285                     418                     702
Native      300    21 540         24 478                  25 546                  25 546


Table 5.4: Measured request time for the blacklist service when using multiple servers

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     8              20                      25                      47
Native      50     8              18                      24                      54
Docker      100    4              6                       8                       25
Native      100    4              5                       9                       26
Docker      150    5              11                      20                      59
Native      150    4              8                       17                      43
Docker      200    7              23                      56                      124
Native      200    5              11                      15                      69
Docker      225    10             53                      95                      201
Docker      250    24             191                     256                     476
Native      250    8              35                      70                      147
Native      275    12             109                     158.0                   318
Native      300    111            2212                    3442.75                 6895

Comparison of the measurements for the blacklist service shows that there can be a large difference between how long the user service has to wait for a response and how long it takes the blacklist service to actually handle the request. The waiting time is incurred because of the microservice architecture, but the waiting time is also longer in every case where Docker is used compared to the non-Docker setup. As mentioned earlier, the waiting time was the main reason for requests taking a long time.

Table 5.5 and Table 5.6 show the measured times when the blacklist service is waiting for the geolocation service to respond. The waiting time was longer for the Docker setup except for one case, when the single server system was loaded with 150 requests per second. The measured processing times from the geolocation service were significantly lower than the waiting times, with median values around 1 ms and the highest 99th percentile value from all the scenarios being only 9 ms.

Table 5.5: Measured waiting time for the geolocation service when using a single server

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     10             25                      32                      49
Native      50     9              22                      30                      49
Docker      100    9              22                      30                      66
Native      100    7              14                      17                      41
Docker      125    18             53                      109                     270
Native      125    6              25                      39                      87
Docker      150    22             97                      191                     390
Native      150    86             323                     440                     698


Table 5.6: Measured waiting time for the geolocation service when using multiple servers

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     8              20                      25                      47
Native      50     8              18                      24                      54
Docker      100    4              6                       8                       25
Native      100    4              5                       9                       26
Docker      150    5              11                      20                      59
Native      150    4              8                       17                      43
Docker      200    7              23                      56                      124
Native      200    5              11                      15                      69
Docker      225    8              34                      55                      112
Docker      250    24             191                     256                     476
Native      250    8              35                      70                      147
Native      275    10             62                      90                      171
Native      300    111            2212                    3442.75                 6895

5.2 Network calls

In order to analyze the network cost further, and the Docker-related overhead in terms of network cost, a time measurement using timestamps from the services was added and a second round of load tests was performed. When a service calls another service it stores the timestamp for when the call was made. When a service receives a request from another service it also stores the timestamp for the time of receiving the request. The timestamps are sent as a part of the time measurement message to RabbitMQ. In order to calculate the elapsed time for each call, the sending timestamp was subtracted from the receiving timestamp. Chrony[40] was used to synchronize the clocks of the servers.
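The elapsed-time calculation described above can be sketched as follows. This is a hypothetical illustration (the header name and function names are not from the thesis code), and it assumes the caller's and receiver's clocks are synchronized, as Chrony provides in this setup:

```python
import time

def make_request_headers():
    # Caller side: attach the send timestamp to the outgoing request
    return {"X-Sent-At": str(time.time())}

def network_time_ms(headers, received_at=None):
    """Receiver side: subtract the sending timestamp from the receiving
    timestamp to obtain the time the request spent on the network (ms).
    Only meaningful when both hosts' clocks are synchronized."""
    if received_at is None:
        received_at = time.time()
    sent_at = float(headers["X-Sent-At"])
    return (received_at - sent_at) * 1000.0
```

In the thesis setup the resulting values were published as part of the time measurement message to RabbitMQ rather than computed inline like this.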

Tables 5.7 and 5.8 show the results for the network calls from the user service to the blacklist service, and Tables 5.9 and 5.10 show the results for the network calls from the blacklist service to the geolocation service.

Table 5.7: Measured network request time for the blacklist service when using a single server

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     6              24                      33                      57
Native      50     4              13                      18                      36
Docker      100    2              16                      25                      42
Native      100    2              8                       9                       22.48
Docker      125    21             54                      80                      184
Native      125    3              14                      25                      55
Docker      150    24             64                      92.20                   200
Native      150    41             164                     245                     405


Table 5.8: Measured network request time for the blacklist service when using multiple servers

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     3              9                       11                      17
Native      50     2              8                       10                      15
Docker      100    1              4                       6                       11
Native      100    1              1                       3                       7
Docker      150    2              8                       10                      23
Native      150    1              5                       7                       16
Docker      200    4              13                      18                      38
Native      200    1              8                       10                      24
Docker      225    6              18                      26                      54
Docker      250    15             105                     404                     743
Native      250    4              15                      21                      42
Native      275    7              25                      36                      68
Native      300    15             64                      92                      181

Table 5.9: Measured network request time for the geolocation service when using a single server

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     5              16                      21                      34
Native      50     4              14                      20                      34
Docker      100    1              11                      16                      29
Native      100    2              7                       9                       20
Docker      125    10             25                      32                      48
Native      125    2              11                      20                      43
Docker      150    10             28                      35                      52
Native      150    25             66                      86                      149

Table 5.10: Measured network request time for the geolocation service when using multiple servers

Scenario    RPS    Median (ms)    90th percentile (ms)    95th percentile (ms)    99th percentile (ms)
Docker      50     3              10                      13                      22
Native      50     2              8                       11                      18
Docker      100    1              2                       4                       13
Native      100    1              1                       1                       9
Docker      150    1              7                       10                      50
Native      150    1              3                       5                       11
Docker      200    2              9                       12                      23
Native      200    1              6                       8                       13
Docker      225    3              10                      12                      20
Docker      250    6              58                      84                      139
Native      250    2              9                       12                      19
Native      275    3              12                      15                      24
Native      300    5              13                      17                      26


The Docker-related overhead is shown in Tables 5.11 and 5.12. The network calls took less time when using Docker during the 150 RPS load test on the single server setup. Both the native and Docker systems reached their maximum capacity under that load. The median value was also lower for Docker when the blacklist service called the geolocation service on the single server setup under 100 RPS load. In all other cases, the native solution performed as well as or better than the Docker solution.

Table 5.11: Measured Docker overhead of network calls to the blacklist service

Setup               RPS    Overhead (%)
Single server       50     50
Multiple servers    50     50
Single server       100    0
Multiple servers    100    0
Single server       125    600
Single server       150    −41.46
Multiple servers    150    100
Multiple servers    200    300
Multiple servers    250    275

Table 5.12: Measured Docker overhead of network calls to the geolocation service

Setup               RPS    Overhead (%)
Single server       50     25
Multiple servers    50     50
Single server       100    −50
Multiple servers    100    0
Single server       125    400
Single server       150    −60
Multiple servers    150    25
Multiple servers    200    100
Multiple servers    250    200

5.3 Nmon

Nmon was used to monitor the state of the system during the load tests. Due to the nature of the testing method that was used, the results do not necessarily provide much insight unless something is very wrong. The data gathered showed that disk writes and RAM usage were far from being a limiting factor in all of the scenarios, independent of the load. As expected, the main difference in resource usage was between the single server setup and the multiple servers setup. The CPU was the limiting factor when the system hit its maximum capacity. Nmon showed that when each of the services ran on its own server, the CPU usage was at most around 85% up until the point the CPU was maxed out, which happened at 250 RPS for the Docker setup and 300 RPS for the native setup. However, in the case of the single server setup, the CPU usage was always close to 100%. This could explain the vast difference between the performance of the two setups. The CPU maxed out sooner on the single server and became a bottleneck, causing the system to handle considerably fewer requests. The performance decrease happened at 125 RPS for the Docker setup and 150 RPS for the native setup. Finally, there was a negligible difference in resource usage between the Docker setup and the native setup, except for the 250 RPS load test, where the Docker setup's performance drop happened.

5.4 Resource contention

The default Docker settings allow the containers full access to the host's resources. When containers on the same host are under a heavy load they may start competing for resources, which causes further decreases in performance. These default settings were not changed, but in the future it would be possible to strategically limit each container's access to certain resources; doing so would require a full understanding of each container's needs. Otherwise, the limitation could become a bottleneck in itself. Simply allocating equal resources to each container might have a further negative impact on the container under the heaviest load. With that said, as Nmon showed, the CPU was always the main bottleneck on the single server setup, as the CPU was simply not powerful enough to handle the load above 125 requests per second when using Docker and 150 requests per second without Docker.

5.5 Previous research

Previous research had shown that Docker containers have a low negative impact on performance when compared to their bare-metal hosts, and Docker has been labeled a lightweight virtualization solution. However, it was shown that Docker did have a negative impact on network performance. Microservices had also been shown to perform worse than monolithic applications due to the network calls required between services to handle the same workload. It is therefore logical to expect microservices to perform worse when they run inside Docker containers, as Docker will have an impact on the heavy network traffic. Recent studies have indeed supported those suspicions by comparing the performance of simple services running with and without Docker.

This project built upon previous findings with the goal of providing further insight into how Docker performs in an enterprise-like setting. A small microservice system was built that handled a realistic workflow, and the performance of the system, with and without Docker, was evaluated. Previous research had mostly used simple workloads, such as simple messaging between two services or a benchmarking tool that targets a certain resource, such as the CPU or the network. Even though Docker had been shown to have a negative impact on the performance of microservices, it was hard to fully understand the impact on a production system. The laboratory system for this project had a more complicated workload, based on real-world examples, which provided more insight into how Docker fares in a production system. To mimic a realistic production system further, various technologies were used, such as a database and a message broker, and the system ran on virtual machines in the cloud, as is common practice.


Chapter 6

Conclusions and future work

The purpose of this thesis project was to measure the potential overhead incurred when using container technology, such as Docker, to run microservices in a large-scale system. This chapter presents the final conclusions, discusses the limitations of the work, and presents thoughts on potential future work.

6.1 Conclusions

Previous work had shown that microservices are likely to perform worse than the monolithic version of the same service due to the increased number of network calls between services. Docker had also been shown to be a very lightweight container solution with a minimal performance overhead. The observed overhead mainly came from network communications and disk usage. However, not much work had been done to investigate the performance of a containerized microservice system running on virtual machines, despite this being a method widely used by enterprises. Based on previous findings it was hypothesized that such a system would always perform worse than its containerless version.

A microservice test system, consisting of three services, was built. The performance was measured for four different scenarios: all services running on the same virtual server (with and without Docker) and each service running on its own virtual server (with and without Docker). The performance of these alternatives was compared to see how much of an impact Docker had. Based on the load test results it was clear that Docker can have a large negative performance impact on systems with many network calls between services. The Docker system performed worse in all tested scenarios and the performance difference became worse as the load on the system increased. Further investigation into the performance overhead showed, as expected, that network calls were indeed the main culprits for the slowness of the system; not only calls between the services but also some database calls. When the system reached its maximum load capacity, the database calls made by the user service slowed down, while the heavy load did not seem to affect the database calls from the blacklist and geolocation services as much. Based on the time measurements from the user service, roughly 2/3 of the time was spent waiting for a reply from the blacklist service and 1/3 was due to a database call.

It is important to note, though, that these results do not necessarily show how Docker would work in a large-scale production system. Additionally, one of the main reasons for using Docker is the flexibility it brings, as it allows services to be quickly deployed and taken down based on external factors, such as the load on the system. During peak load on the system, extra containers can be deployed quickly, given that sufficient hardware resources are available, and the load can be spread amongst the replicated services. However, doing so requires a load balancer for the microservices, which may introduce further networking overhead based on the load balancing strategy[41]. Thanks to container orchestrator technology, the deployment can happen automatically. As the load test results showed, the overhead was at its minimum during the lighter load tests.

From an operational standpoint, using Docker can therefore provide many benefits, but as usual it is important to understand the requirements of the system before making technical decisions, and also to fully recognize Docker's strengths and weaknesses. If the system needs to be dynamic, Docker is a good option due to the many container orchestrator solutions available that enable containers to be started and stopped automatically in a matter of seconds. However, the system designers must be aware of the potential overhead of network calls between services. Identifying services that become bottlenecks and replicating them to share the load becomes important. When dealing with a smaller system that is not expected to handle a heavy load, Docker can also be used to make the development and deployment process easier.

When scalability is not an option, perhaps due to resource limitations, or is simply not needed, one should be careful before choosing Docker if performance is of the utmost importance. The load test results showed that Docker negatively impacts the system's performance, especially during high load; hence one has to choose whether performance or an easier operational process is more important.

6.2 Limitations and future work

This section discusses the limitations of this project and suggests some future work that could prove valuable.

6.2.1 Different hosting environments

There are many ways to run a production system and multiple aspects to consider when deciding where and how to host a system. For this project, a decision was made to use AWS EC2 instances and install Docker on virtual machines to run the containerized services. AWS was chosen due to its large market share, its simplicity, and my own familiarity with the platform. Investigating different cloud providers, such as Google Cloud Platform and Azure, might be interesting in the future. All three cloud providers offer container solutions; however, these were not investigated in this work. With such solutions, instead of setting up virtual machines and Docker on top of them, the user focuses on the containers. As the literature study had shown, running containers on top of bare-metal machines introduces minimal overhead. Therefore, it would be interesting to continue this work by setting up a bare-metal production environment to host the containerized microservices and measure the impact on performance. Another option would be to skip the cloud and investigate a virtualization platform, such as VMware, that could be hosted on-premises.

6.2.2 Container scheduler

The project was carried out in a rather raw fashion, setting up the services and running them manually. This would quickly become cumbersome in a large-scale system and negate some of the biggest advantages of using containers. Therefore, using a container scheduler is the preferred approach. Since the project was carried out without this fundamental part of a containerized microservice system, it would be interesting to continue the work using a container scheduler to see whether it has an impact on performance. Kubernetes [42], one of the most popular container schedulers today, uses a concept called pods to group one or more containers together as a deployable unit. The containers in a pod share storage and networking and are always co-located and co-scheduled. This allows the containers to communicate using inter-process communication and they can find each other using localhost. This might yield a positive impact on performance when tightly coupled services that need to work together are co-located in a pod.

6.2.3 Grouping containers together

Overall comparison of the results of the load tests showed that running the services in containers on the same system produced the worst performance. The limitations of the server the containers ran on were deemed to be the main culprit for the large drop in performance. It would be interesting to further investigate the co-location of containers and see how the system would fare on a better server (i.e., one with sufficient resources). Diving deeper and strategically limiting the resources each container has access to might reduce the chance of resource contention and hence provide further benefits. Resource cost was not a focus of this project, but it proved to be cheaper to run the containers on a single machine when the cost for a given level of throughput is considered. When using a pay-as-you-go payment model at a cloud provider, you only pay for the resources used. Therefore, it would be interesting to see whether scaling vertically or horizontally provides greater benefits.
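As a sketch of how such per-container limits could be expressed, Docker supports capping CPU and memory per container; the service names, images, and limit values below are hypothetical, not measured recommendations from this work:

```yaml
# Hypothetical docker-compose file capping each service's resources
# to reduce contention when co-locating containers on one host.
version: "2.4"
services:
  orders:
    image: example/orders:1.0      # hypothetical image
    cpus: 1.0                      # at most one CPU core
    mem_limit: 512m                # at most 512 MB of RAM
  inventory:
    image: example/inventory:1.0   # hypothetical image
    cpus: 0.5
    mem_limit: 256m
```

With explicit limits, a misbehaving or heavily loaded service cannot starve its neighbors, at the price of leaving some capacity idle when the limited service could have used more.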


CHAPTER 6. CONCLUSIONS AND FUTURE WORK

6.2.4 Network configuration

It is possible to change Docker’s network configuration. However, it was decided to use Docker’s private bridge network, as it is the default network setting and provides the most isolation. It is also possible to disable this isolation and use the host’s network. Therefore, it would be interesting to run the same evaluations using that network setting, as it should increase the performance of the system.
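For illustration (the service name and image are hypothetical), switching a container from the default bridge network to the host’s network stack is a one-line configuration change:

```yaml
# Hypothetical docker-compose snippet: host networking removes the
# bridge/NAT layer, trading network isolation for lower overhead.
version: "2.4"
services:
  orders:
    image: example/orders:1.0   # hypothetical image
    network_mode: host          # container shares the host's network
                                # namespace; port publishing is not needed
```

The same effect can be achieved on the command line with `docker run --network host`. The trade-off is that the container’s ports are bound directly on the host, so port conflicts between containers must be managed manually.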

6.3 Reflections

Microservices and containerization have become a popular solution for building web services. As with all new technology that gains popularity fast, a hype cycle tends to build up, and people often feel they are missing out if they do not embrace the new trend. It is an expensive and difficult process to start working with new technology, and the decision to do so should not be made without considering all relevant factors, such as the performance of the system. The project’s results showed that microservices performed worse when running inside containers. Microservices and containers are therefore not a solution that comes without a cost, and the need for them must be examined carefully before starting the process of implementing them. That being said, containerized microservices do have the potential to reduce operational costs. They simplify the process of developing, testing, and deploying in production, which can result in fewer costly mistakes along the way. System management becomes easier with more automation, resulting in less manpower required to run the systems. The simple and fast deployments of containerized microservices enable companies to meet their customers’ demands more quickly, as features and bug fixes can be rolled out at a faster pace. This may provide companies with a competitive edge in today’s markets.

How to run the services is also an important decision. Deciding to run the services on dedicated servers, on-premises, requires the correct environment to be set up and managed. Not only does this bring costs in terms of manpower, but also the cost of running the servers, such as powering and cooling them. In order to be able to quickly scale up, sufficient hardware resources will also have to be available. This means that the on-premises hardware cannot be fully utilized, and in the case of maxing out the environment’s capacity, more hardware must be purchased and set up. This can be avoided by using cloud services. The hardware is hosted and managed in the cloud provider’s data centers, and the operators do not have to worry about the environment on a hardware level. Scaling up and down becomes a matter of ordering resources from the cloud provider, which happens almost instantly. The cloud providers also offer pay-as-you-go payment models, meaning that customers only pay for the resources they are using. In the end, this is a matter of owning or renting the underlying hardware used to run the services, and a thorough cost analysis must be performed in order to make the decision best suited for one’s needs.


References

[1] D. Merkel, “Docker: Lightweight Linux containers for consistent development and deployment,” Linux J., vol. 2014, no. 239, Mar. 2014. [Online]. Available: http://dl.acm.org/citation.cfm?id=2600239.2600241

[2] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers,” in 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2015. doi: 10.1109/ISPASS.2015.7095802 pp. 171–172.

[3] Linux Containers. (2018). Linux containers. Available: https://linuxcontainers.org/, (accessed 07-09-2018).

[4] Red Hat, Inc. (2018). rkt. Available: https://coreos.com/rkt/, (accessed 07-09-2018).

[5] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, “Hypertext transfer protocol – HTTP/1.1,” Tech. Rep., 1999.

[6] S. Newman, Building Microservices, 1st ed. O’Reilly Media, Inc., 2015. ISBN 1491950358, 9781491950357

[7] J. Thönes, “Microservices,” IEEE Software, vol. 32, no. 1, pp. 116–116, Jan 2015. doi: 10.1109/MS.2015.11

[8] B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, “Borg, Omega, and Kubernetes,” ACM Queue, vol. 14, pp. 70–93, 2016. [Online]. Available: http://queue.acm.org/detail.cfm?id=2898444

[9] P. Mell, T. Grance et al., “The NIST definition of cloud computing,” National Institute of Standards and Technology, vol. 53, no. 6, p. 50, 2009.

[10] Gartner, Inc. (2018). Gartner. Available: https://www.gartner.com/, (accessed 07-09-2018).

[11] Microsoft. (2018). ASP.NET Core. Available: https://docs.microsoft.com/en-us/aspnet/core/?view=aspnetcore-2.1, (accessed 07-09-2018).

[12] Couchbase. (2018). NoSQL Engagement Database. Available: https://www.couchbase.com/, (accessed 07-09-2018).

[13] Pivotal Software, Inc. (2018). RabbitMQ. Available: https://www.rabbitmq.com/, (accessed 07-09-2018).

[14] OASIS. (2018). AMQP. Available: https://www.amqp.org, (accessed 07-09-2018).

[15] STOMP. (2018). STOMP. Available: http://stomp.github.io/, (accessed 07-09-2018).

[16] A. Banks and R. Gupta, “MQTT version 3.1.1,” OASIS Standard, vol. 29, 2014.

[17] Blazemeter. (2018). Taurus. Available: https://gettaurus.org/, (accessed 07-09-2018).

[18] Apache Software Foundation. (2018). Apache JMeter. Available: https://jmeter.apache.org/, (accessed 07-09-2018).

[19] Gatling Corp. (2018). Gatling. Available: https://gatling.io/, (accessed 07-09-2018).

[20] SeleniumHQ. (2018). What is Selenium? Available: https://www.seleniumhq.org, (accessed 07-09-2018).

[21] Open Zipkin. (2018). Zipkin. Available: https://zipkin.io/, (accessed 07-09-2018).

[22] B. H. Sigelman, L. A. Barroso, M. Burrows, P. Stephenson, M. Plakal, D. Beaver, S. Jaspan, and C. Shanbhag, “Dapper, a large-scale distributed systems tracing infrastructure,” Google, Inc., Tech. Rep., 2010. [Online]. Available: https://research.google.com/archive/papers/dapper-2010-1.pdf

[23] Open Zipkin. (2018). Zipkin4Net. Available: https://github.com/openzipkin/zipkin4net/, (accessed 07-09-2018).

[24] Sourceforge.net. (2017). nmon for Linux. Available: http://nmon.sourceforge.net/pmwiki.php, (accessed 07-09-2018).

[25] E. Casalicchio and V. Perciballi, “Measuring Docker performance: What a mess!!!” in Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ser. ICPE ’17 Companion. New York, NY, USA: ACM, 2017. doi: 10.1145/3053600.3053605. ISBN 978-1-4503-4899-7 pp. 11–16. [Online]. Available: http://doi.acm.org/10.1145/3053600.3053605

[26] Z. Li, M. Kihl, Q. Lu, and J. A. Andersson, “Performance overhead comparison between hypervisor and container based virtualization,” in 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), March 2017. doi: 10.1109/AINA.2017.79. ISSN 1550-445X pp. 955–962.

[27] P. Sharma, L. Chaufournier, P. Shenoy, and Y. C. Tay, “Containers and virtual machines at scale: A comparative study,” in Proceedings of the 17th International Middleware Conference, ser. Middleware ’16. New York, NY, USA: ACM, 2016. doi: 10.1145/2988336.2988337. ISBN 978-1-4503-4300-8 pp. 1:1–1:13. [Online]. Available: http://doi.acm.org/10.1145/2988336.2988337

[28] J. Shetty, S. Upadhaya, H. Rajarajeshwari, G. Shobha, and J. Chandra, “An empirical performance evaluation of Docker container, OpenStack virtual machine and bare metal server,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 7, no. 1, pp. 205–213, 2017.

[29] I. Mavridis and H. Karatza, “Performance and overhead study of containers running on top of virtual machines,” in 2017 IEEE 19th Conference on Business Informatics (CBI), vol. 02, July 2017. doi: 10.1109/CBI.2017.69 pp. 32–38.

[30] T. Salah, M. J. Zemerly, C. Y. Yeun, M. Al-Qutayri, and Y. Al-Hammadi, “Performance comparison between container-based and VM-based services,” in 2017 20th Conference on Innovations in Clouds, Internet and Networks (ICIN), March 2017. doi: 10.1109/ICIN.2017.7899408 pp. 185–190.

[31] H. Rosenberg, “Performance of enterprise web applications in Docker containers on VMware vSphere 6.5,” 2017. [Online]. Available: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/docker-vsphere65-weathervane-perf.pdf

[32] P. D. Francesco, I. Malavolta, and P. Lago, “Research on architecting microservices: Trends, focus, and potential for industrial adoption,” in 2017 IEEE International Conference on Software Architecture (ICSA), April 2017. doi: 10.1109/ICSA.2017.24 pp. 21–30.

[33] C. Heger, A. van Hoorn, M. Mann, and D. Okanović, “Application performance management: State of the art and challenges for the future,” in Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering. ACM, 2017. doi: 10.1145/3030207.3053674 pp. 429–432.

[34] R. Heinrich, A. van Hoorn, H. Knoche, F. Li, L. E. Lwakatare, C. Pahl, S. Schulte, and J. Wettinger, “Performance engineering for microservices: research challenges and directions,” in Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion. ACM, 2017. doi: 10.1145/3053600.3053653 pp. 223–226.

[35] M. Gribaudo, M. Iacono, and D. Manini, “Performance evaluation of massively distributed microservices based applications,” in 31st European Conference on Modelling and Simulation, ECMS 2017. European Council for Modelling and Simulation, 2017, pp. 598–604.

[36] T. Ueda, T. Nakaike, and M. Ohara, “Workload characterization for microservices,” in 2016 IEEE International Symposium on Workload Characterization (IISWC), Sept 2016. doi: 10.1109/IISWC.2016.7581269 pp. 1–10.

[37] N. Kratzke, “About microservices, containers and their underestimated impact on network performance,” CoRR, vol. abs/1710.04049, 2017. [Online]. Available: http://arxiv.org/abs/1710.04049

[38] M. Amaral, J. Polo, D. Carrera, I. Mohomed, M. Unuvar, and M. Steinder, “Performance evaluation of microservices architectures using containers,” in 2015 IEEE 14th International Symposium on Network Computing and Applications, Sept 2015. doi: 10.1109/NCA.2015.49 pp. 27–34.

[39] Amazon Web Services, Inc. (2018). Amazon Web Services (AWS) – Cloud Computing Services. Available: https://aws.amazon.com/, (accessed 07-09-2018).

[40] Chrony. (2017). chrony – Introduction. Available: https://chrony.tuxfamily.org/, (accessed 25-09-2018).

[41] Y. Niu, F. Liu, and Z. Li, “Load balancing across microservices,” in Proceedings of IEEE INFOCOM 2018, Honolulu, HI, USA, April 15–19, 2018.

[42] The Kubernetes Authors. (2018). Kubernetes. Available: https://kubernetes.io/, (accessed 07-09-2018).


TRITA-EECS-EX-2018:599

www.kth.se