Institutionen för datavetenskap
Department of Computer and Information Science

Final thesis

Database analysis and managing large data sets in a trading environment

by Per Månsson

LIU-IDA/LITH-EX-A--14/009--SE
2014-02-10

Linköpings universitet, SE-581 83 Linköping, Sweden




Linköping University
Department of Computer and Information Science

Final Thesis

Database analysis and managing large data sets in a trading environment

by

Per Månsson

LIU-IDA/LITH-EX-A--14/009--SE

2014-02-10

Supervisor: Patrick Lambrix
Examiner: Fang Wei-Kleiner


Abstract

Start-up companies today tend to find a need to scale up quickly and smoothly, to cover rapidly increasing demand for the services they create. It is also always a necessity to save money by finding a cost-efficient solution which can meet the demands of the company.

This report uses Amazon Web Services for infrastructure. Hosting databases on the Elastic Compute Cloud, on the Relational Database Service and on Amazon DynamoDB for NoSQL storage is compared, benchmarked and evaluated.


Contents

1 Introduction
  1.1 Background
  1.2 Purpose
    1.2.1 Requirements
    1.2.2 Desires
  1.3 NoSQL and SQL
  1.4 Related Work
  1.5 Disclaimer
  1.6 Data sets
    1.6.1 Core
    1.6.2 Token
    1.6.3 Payment
    1.6.4 Log
    1.6.5 Search

2 Research
  2.1 The Cloud
  2.2 Amazon Web Services
    2.2.1 Instance Sizes
    2.2.2 Amazon Relational Database Service
  2.3 Database Technologies
    2.3.1 Relational Databases
    2.3.2 Graph Databases
    2.3.3 Key/Value stores
    2.3.4 Document stores
  2.4 Database Systems
    2.4.1 PostgreSQL
    2.4.2 Cassandra
    2.4.3 Accumulo
    2.4.4 Neo4j
    2.4.5 Riak
    2.4.6 MongoDB
    2.4.7 Other solutions
    2.4.8 Amazon Relational Database Service
    2.4.9 Amazon DynamoDB
    2.4.10 Amazon Redshift
    2.4.11 Amazon ElastiCache
    2.4.12 Amazon SimpleDB
  2.5 Database systems selected for further evaluation
  2.6 Prototypes
    2.6.1 Cassandra
    2.6.2 MongoDB
    2.6.3 PostgreSQL
  2.7 Database usage plan
    2.7.1 Core
    2.7.2 Token and Payment
    2.7.3 Log
  2.8 Data Generation
    2.8.1 Core
    2.8.2 Log

3 Benchmarks
  3.1 Benchmark setup
  3.2 Core
    3.2.1 Tests
  3.3 Log
    3.3.1 Tests
  3.4 On Cacheability and Latency
  3.5 Benchmark Results
  3.6 Core
    3.6.1 Price for performance
  3.7 Log

4 Analysis
  4.1 Discussion
  4.2 Recommendation
    4.2.1 Token and Payment
    4.2.2 Log
    4.2.3 Core

5 Summary
  5.1 Future work



Chapter 1

Introduction

1.1 Background

Kulipa AB is a Swedish startup company creating a trading solution for mobile purchases.

Kulipa is running a trading system on Amazon Web Services. The databases responsible for storing the data they manage are the focus of this thesis work.

Prior to this report, everything was stored in several different PostgreSQL databases, with a little in Amazon SimpleDB. Kulipa themselves had set up and were maintaining the Postgres databases.

Concerns regarding the scalability of Postgres, as well as simplicity and maintainability as Kulipa grows, make them want to evaluate the alternatives for data storage. These data storage systems should all run on the Amazon cloud, either as a hosted database service or on the computing service.

1.2 Purpose

The goal of this work is to present a solution which ensures future scalability, flexibility, availability and maintainability for the involved databases.

The following specifies what is required for a database system to be acceptable at Kulipa.

1.2.1 Requirements

1. It shall be possible to take and restore backups of the system without downtime.

2. The system shall handle minor failures, for example single server crashes, without downtime.


3. The system shall be performant and scalable enough to cover the performance demands of Kulipa for the foreseeable future.

1.2.2 Desires

1. The system is desired to scale down to low price levels, so that Kulipa can start using the system as soon as possible.

2. The system is desired to scale elastically, so that servers can be added to the system to help with scaling without downtime.

3. The system is desired to be easily managed with regards to handling node failures, backups, etc.

1.3 NoSQL and SQL

In traditional relational databases a primary concern is the ACID property. ACID stands for Atomicity, Consistency, Isolation and Durability, which in turn means that a transaction in a traditional relational database will be atomic and independent of other simultaneous transactions in the database. This is a fairly strict requirement.

NoSQL was created from the idea that by sacrificing the ACID property one can gain other advantages, mainly in terms of scalability and raw speed [28]. Since the beginnings of NoSQL databases they have evolved mainly in the direction of the old relational databases, without sacrificing their advantages. For example, FoundationDB provides ACID transactions in a multi-node cluster [11]. Other databases guarantee isolation and atomicity at lower levels, for example row level in Cassandra [9].

1.4 Related Work

In Survey on NoSQL Database [34], Jing Han, Haihong E and Guan Le survey several NoSQL databases and describe them by evaluating their pros and cons. The databases compared are Redis, Tokyo Cabinet, Flare, Cassandra, Hypertable, MongoDB and CouchDB. The conclusion reached is that to select your NoSQL database, you need to consider the business model, the demand for ACID transactions and other requirements. They clarify that each NoSQL database has its own strengths and weaknesses.

In NoSQL Evaluation: A Use Case Oriented Survey [35], Robin Hecht and Stefan Jablonski compare the data models, query possibilities, concurrency controls, partitioning and replication opportunities of several NoSQL databases. They mainly focus on which type of database technology one would want, such as key/value stores, document stores or graph databases. Since NoSQL databases tend to differ greatly in their characteristics, they help decide which ones are good at what. They go fairly deeply into features such as MapReduce support as a query possibility.

In Evaluating Caching and Storage Options on the Amazon Web Service Cloud [29], David Chiu and Gagan Agrawal evaluate data storage solutions on Amazon Web Services. They mainly compare storing on S3 versus EC2. Some pricing evaluation and network accessibility between the instance types are also compared, and they do a cost evaluation of all their solutions. They conclude that S3 is the most efficient storage option and that m1.xlarge is the fastest instance in their study.

In A comparison between several NoSQL databases with comments and notes [37], Bogdan George Tudorica and Cristian Bucur compare Cassandra, HBase, Sherpa and MySQL with regards to read latency during varying workloads, focusing on different percentages of reads and writes. They show that many databases have problems scaling past a certain point, except Cassandra, which seems to decline relatively gracefully. They conclude that NoSQL databases behave differently in similar situations and cannot be used interchangeably.

In Scalable SQL and NoSQL Data Stores [28], Rick Cattell examines a number of SQL and NoSQL storage solutions which are scalable over many servers. It serves as a good introduction to several of the databases available. The databases introduced include Riak, Redis, SimpleDB, MongoDB, HBase and Cassandra. It provides use cases for different kinds of NoSQL databases, and remarks on the variable nature of the data storage landscape and the moving target this presents.

In Eventual Consistency: How soon is eventual? An evaluation of Amazon S3's consistency behavior [27], David Bermbach and Stefan Tai create a method to evaluate the consistency behavior of data storage systems and apply it to Amazon S3, especially with regards to the rate at which data becomes stale.

1.5 Disclaimer

The database landscape changes very quickly. NoSQL as a concept is still fairly young and the databases are adding new features very quickly. For example, Cassandra was updated from version 1.3 to 2.0 during the course of this project, and a lot of features were added in the upgrade, invalidating claims which were true before it.

1.6 Data sets

Five data sets are the focus of this report, named Core, Token, Payment, Log and Search. There is also a future interest in creating a system for analysis of accumulated data, for purposes such as market research.


1.6.1 Core

Core contains information about users, stores, customers, organizations and several other subparts of the system. These are highly relational: several stores belong to the same organization, cashiers work in stores, and so on. This database also stores the orders placed in the system. Orders track which user bought what, when, and where. A simplified overview of Core can be seen in Figure 1.1.

1.6.2 Token

The token database stores authentication tokens for logging in with devices. The tokens are relatively relationless and are only used to record and verify logged-in clients.

1.6.3 Payment

Kulipa does not handle money directly, but uses third-party providers for this. These payment services use credential systems for payments, and those credentials are stored in this database. Like the tokens, they are relationless.

1.6.4 Log

The log database is responsible for storing the logs generated by all other services in the system. This can amount to large amounts of data. The logs need to be searchable, and logs from the last 10 days should be quickly accessible.

1.6.5 Search

The search database is used to search for nearby stores, products in stores, and so on.


Figure 1.1: A simplified version of the Core database.


Chapter 2

Research

In this chapter I write about Amazon Web Services, database technologies and available database systems. I make a plan for which types of databases are interesting for which data sets, and describe how I created prototypes for the interesting database systems. How test data is generated is also covered.

2.1 The Cloud

The Cloud is a term which in practice means a service where you can provision (rent) computing power or some other resource. The most basic form of cloud computing is Infrastructure as a Service (IaaS), where you provision things like storage space and virtually hosted computers, on which you build your system. There are also higher levels, Platform as a Service (PaaS) and Software as a Service (SaaS), where you provision things like databases or even games. The Cloud is an alternative to purchasing physical computers and housing them in a data center.

The five essential characteristics of cloud computing are, as defined in The NIST Definition of Cloud Computing [36], as follows:

Self service. A customer can provision the resources they need, when they need them, without speaking to an employee of the provider.

Broad network access. Capabilities are available over the network through standard mechanisms such as a simple website.

Resource Pooling. The provider uses a multi-tenant model where the hardware is shared through virtualization. The process is transparent to the end user, who has no control over the locality of resources.

Rapid elasticity. Resources can be provisioned and released rapidly and elastically, sometimes even automatically.


Measured service. The utilization of resources can be monitored and reported, providing transparency to both the end user and the provider.

Kulipa uses Amazon Web Services, which is a Cloud provider. The main use is the IaaS offering, where they provision virtual computation instances and other things such as load balancers.

2.2 Amazon Web Services

Amazon Web Services (AWS) is a cloud computing platform consisting of many remote computing services.

AWS is a global service split into regions. Kulipa AB has all its services in the EU-West-1 region, which is the only European region and is located in Ireland. There are also four regions in North America, one in Brazil, two in Asia and one in Australia.

The EU-West-1 region consists of three Availability Zones (AZs). These are physical server centers located about one millisecond of network latency apart. They are separated to provide availability even if one of the AZs were to fail.

The computing service in AWS is called Amazon Elastic Compute Cloud (EC2). In this service you provision servers, which are virtual servers run in the cloud. There are many server types, from the cheap micro instance up to high-end servers with dedicated hardware.

One can provision hard drives on a Storage Area Network and connect them to EC2 instances. This is called Elastic Block Store (EBS). It is useful for storing data which needs to persist, since most of the storage space on the virtual servers is ephemeral storage, meaning that if the instance is shut off for any reason, the data is lost.

There is also a storage service called Simple Storage Service (S3). This service allows you to upload data and pay elastically for what you use. It has additional features such as static website hosting.

2.2.1 Instance Sizes

The provisionable instances vary from the very low computing power of the m1.micro instance up to very high-performance, highly specialized instances such as the c3.8xlarge compute-optimized instance, and the cg1.4xlarge with two NVIDIA Tesla graphics cards for GPU computing.

The instance sizes used in this work are the ones listed in Table 2.1. Throughout this report the names m1.micro and just micro are used interchangeably, since all the instance sizes are from the m1 series.

    Instance Type   Memory     EC2 Compute Units   Instance Storage
    m1.micro        613 MiB    2                   None
    m1.small        1.7 GiB    1                   160 GiB
    m1.medium       3.75 GiB   2                   410 GiB
    m1.large        7.5 GiB    4                   850 GiB
    m1.xlarge       15 GiB     8                   1690 GiB

Table 2.1: AWS instance performance comparison table

The m1.micro instance type has 2 EC2 Compute Units, which is faster than the m1.small type. However, if the processor is used for more than short bursts of computation it gets severely throttled to something very low, approximately 10% but not exactly specified [16].

In addition to the higher memory and computing power, the larger instances are also advertised to get generally better performance in accessing EBS, the network and so on. Instances m1.large and above have an EBS-optimized option available. The m1.micro instance type has Low network performance, and the m1.xlarge has High; exactly what this means is not specified.

    Instance Type   Hourly price   Price per 30 days
    m1.micro        $0.020         $14.40
    m1.small        $0.065         $46.80
    m1.medium       $0.130         $93.60
    m1.large        $0.260         $187.20
    m1.xlarge       $0.520         $374.40

Table 2.2: AWS instance price comparison table

The prices for the instances at the time of writing can be seen in Table 2.2. The micro instance is comparatively very cheap, as it works differently from the other sizes. The small instance is 3.25 times the price of the micro instance. For every increase in size after that, the instance price doubles.
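The monthly figures in Table 2.2 follow directly from the hourly rates, assuming a 30-day month of 720 hours. A small sketch of the arithmetic, using the rates listed in the table:

```python
# Monthly cost = hourly rate x 24 hours x 30 days (720 hours).
# Hourly on-demand rates as listed in Table 2.2, at the time of writing.
HOURLY_RATES = {
    "m1.micro": 0.020,
    "m1.small": 0.065,
    "m1.medium": 0.130,
    "m1.large": 0.260,
    "m1.xlarge": 0.520,
}

def monthly_cost(instance_type: str, hours: int = 720) -> float:
    """Return the cost of running one instance for `hours` hours."""
    return round(HOURLY_RATES[instance_type] * hours, 2)

for name in HOURLY_RATES:
    print(name, monthly_cost(name))
```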

2.2.2 Amazon Relational Database Service

The Amazon Relational Database Service (RDS) allows one to host a database as a service. This means one specifies the instance size and, on the larger types, the IO requirements. This is essentially equivalent to hosting your own database on EC2, except that the setup, backups and failover in case of failure are handled for you. It should be noted that the prices for the hosted services are slightly higher than the corresponding setup of EC2 instances. This is interpreted as maintenance overhead, and amounts to roughly 10% of the cost of the hosted RDS service.


2.3 Database Technologies

Databases differ in what technologies they use for implementation. Many databases start out in a single category but end up overlapping other categories as the database matures. The result is that, especially among the NoSQL databases, there is a lot of overlap. For example, Riak is a key/value store with extra features enabling it to act as a graph database in some situations.

2.3.1 Relational Databases

Relational databases are the classic answer to data storage, and are used by all of Kulipa's systems today. Emphasis is always on consistency, with ACID transactions being the norm.

In a relational database the data is organized into tables. The tables are strongly typed, so every row in a table contains the same fields. The set of specifications of all tables is called the schema. Data is commonly accessed through a Primary Key, which is a set of fields that uniquely identifies a row. It is also possible to look up rows based on almost any other field or constraint. Interconnections between tables are handled by Foreign Keys, where a field in one table refers to a key in another, most commonly the Primary Key.

Examples of relational databases are PostgreSQL, MySQL, MSSQL, OracleDB and many more.

This type of technology should be very suitable for the Core database, since it is relational with many interconnections.
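The relational concepts above can be made concrete with a tiny example. The sketch below builds a heavily simplified fragment of a Core-like schema in SQLite; the table and column names are invented for illustration and are not taken from Kulipa's actual schema:

```python
import sqlite3

# In-memory database for illustration; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# An organization owns several stores: a classic one-to-many relation
# expressed with a Foreign Key referring to a Primary Key.
conn.execute(
    "CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE store (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    org_id INTEGER NOT NULL REFERENCES organization(id))""")

conn.execute("INSERT INTO organization VALUES (1, 'Acme')")
conn.execute("INSERT INTO store VALUES (1, 'Acme Downtown', 1)")
conn.execute("INSERT INTO store VALUES (2, 'Acme Mall', 1)")

# The interconnection is traversed with a join.
rows = conn.execute("""SELECT s.name FROM store s
    JOIN organization o ON s.org_id = o.id
    WHERE o.name = 'Acme' ORDER BY s.id""").fetchall()
print([r[0] for r in rows])
```

The same join-based access pattern carries over to PostgreSQL unchanged, which is what makes the relational model a natural fit for the interconnected Core data.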

2.3.2 Graph Databases

Graph databases store data as a graph, with nodes and relations between them. This gives access times which scale with locality rather than with the total size of the database. Graph databases are part of the NoSQL movement and are schemaless.

A node can contain data, often organized in a key/value structure without a predefined schema. Connections between nodes are created dynamically and contain data the same way as nodes do.

Examples of graph databases are Neo4j and Titan.

Graph databases could possibly be used for Core, due to the relational nature of the stored data.

2.3.3 Key/Value stores

Key/value stores hold values which you look up using a primary key. They are based on the idea that if you get really fast access to reading and writing on the primary key, you can construct whatever features you need on top of that. A key/value store is approximately the same as a distributed hash table.


When you look up an entry in a key/value store you get the whole row back, as a binary blob of data. Some key/value stores understand that a row can have columns and can access only part of these rows, or use them in queries and indexes.

Examples of key/value stores are Cassandra and DynamoDB.

Key/value stores are interesting mainly for the Token and Payment databases, but also for Log.

2.3.4 Document stores

Document stores store objects in the form of JSON or similar. These objects can be nested and queried on any part of the object. The database is schemaless, and not all objects need to contain the same fields. Document stores often have a primary key for unique access, similarly to a key/value store, but put more emphasis on the ability to store arbitrary objects and query on individual parts of them.

Examples of document stores are MongoDB and Couchbase.

The ability to query for secondary attributes makes this technology interesting for the Log database.

2.4 Database Systems

In this section I detail the various database solutions available. I go through their options for reliability and backup and how the machines are set up. I also write about Amazon's solutions for databases, RDS and DynamoDB. Characteristics such as consistency versus availability, and how scaling works, are also covered.

The databases I decided to focus more on are covered in more detail in this chapter.

The ones I focus less on each have one flaw or another which I discovered when reading about them. Things such as an inability to do backups, or simply a data model which does not easily fit any of the data sets involved, disqualified a database from further research and testing.

2.4.1 PostgreSQL

PostgreSQL is the database system used for most database needs at Kulipa today. It is a fully relational, ACID-compliant database with excellent support for transactions, replication and backups [20].

2.4.1.1 Replication

Replication is handled via Write-Ahead Log (WAL) shipping. This variant of replication means shipping the WALs from the master server to one or more slave servers, which are thereby kept up to date. The slaves can then serve read requests, but not write requests. They can lag behind the master server due to delays in the WAL shipping.

2.4.1.2 Backup

PostgreSQL has several solutions for backing up the database. The simplest is to use pg_dump, a tool which dumps the database to SQL form. Restoring a backup is then simply a matter of feeding the dump to the PostgreSQL client, psql.

A more complex, but also more flexible, solution is on-line backup with point-in-time recovery. This requires more setup and involves configuring WAL archiving. Since the WALs are already used in replication, this is probably the best solution for Kulipa.
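A pg_dump-based backup boils down to two shell commands: dump to a file, then replay the file with psql. The sketch below only assembles those commands (database name and paths are placeholders), since actually running them requires a live PostgreSQL server:

```python
import shlex

def backup_commands(dbname: str, dump_path: str):
    """Return (dump, restore) shell command strings for a pg_dump backup."""
    # pg_dump writes the database out as SQL; feeding that file back to
    # psql restores it, exactly as described in the text above.
    dump = ["pg_dump", "--file", dump_path, dbname]
    restore = ["psql", "--file", dump_path, dbname]
    return shlex.join(dump), shlex.join(restore)

# "kulipa_core" and the backup path are invented placeholder names.
dump_cmd, restore_cmd = backup_commands("kulipa_core", "/backups/core.sql")
print(dump_cmd)
print(restore_cmd)
```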

2.4.1.3 Setup

To set up a PostgreSQL system which can handle server failures, you need at least two servers, deployed in different AZs. This is the level of replication you get if you use the hosted Amazon RDS. For even more redundancy and read capacity, a third server could be used in a third AZ.

It is also possible to set up a read replica, which works the same way as a replica used for availability purposes but is instead used for read-only queries. One cannot, however, use a single replica both for availability purposes and as a read replica: if the replica is used for reading and the primary server goes down, the replica would need to act as both the read replica and the primary, which by definition it cannot handle.

2.4.1.4 Characteristics

Since PostgreSQL is a fully ACID database it prioritizes consistency over availability. In the case of a network partition, the master server would continue operating as usual, but the slave servers would be rendered unusable.

Scaling PostgreSQL is mostly about using more powerful and more expensive servers. Putting a caching layer in front of a Postgres server is almost always a good thing, for example memcached [15], either hosted on EC2 or via Amazon ElastiCache [3]. If this does not suffice, the database content would have to be sharded into smaller databases, which would be problematic due to the relational nature of Core.
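Such a caching layer typically follows the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. A sketch with a plain dict standing in for memcached and another for the database (a real memcached client's API differs):

```python
cache = {}                           # stands in for memcached
DB = {"user:1": {"name": "Per"}}     # stands in for PostgreSQL
db_reads = 0                         # counts database round trips

def get_user(key: str):
    global db_reads
    if key in cache:                 # cache hit: no database round trip
        return cache[key]
    db_reads += 1                    # cache miss: read from the database
    value = DB[key]
    cache[key] = value               # populate the cache for next time
    return value

get_user("user:1")
get_user("user:1")
print(db_reads)
```

Repeated reads of hot rows are then served entirely from the cache, which is why this offloads the Postgres server so effectively.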

2.4.2 Cassandra

Cassandra is a key/value store. In a Cassandra cluster, each instance is called a node. All nodes in a Cassandra cluster are equivalent; there are no master nodes or slave nodes [4]. This makes it fairly simple to add new nodes to a cluster, as they simply acquire the appropriate amount of space and the servers collaborate to make sure the data is synchronized and replicated.


When an item is stored in a Cassandra database, it is given a token. When this project started, each server was given a starting token, meaning it would store every token from that specific one up to that of the next server in the ring. This has since changed to a virtual node system, where the servers divide up a set of virtual tokens, which effectively divides the token space between the servers.
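The token assignment above is an instance of consistent hashing: each key hashes to a token, and the node owning the first token at or after it (wrapping around the ring) stores the key. A purely illustrative sketch with invented node names, not Cassandra's actual partitioner:

```python
import bisect
import hashlib

RING_SIZE = 2 ** 16

def token(key: str) -> int:
    """Hash a key onto a position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

# Each node owns several virtual tokens, spreading its share of the
# token space around the ring.
ring = sorted((token(f"{node}-vnode-{i}"), node)
              for node in ("node-a", "node-b", "node-c")
              for i in range(8))
tokens = [t for t, _ in ring]

def owner(key: str) -> str:
    """Find the node owning the first token at or after the key's token."""
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[i][1]

print(owner("order:1234"))
```

Adding a node only inserts new virtual tokens into the ring, so just the keys falling in those slices move, which is what makes growing the cluster cheap.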

It is recommended to store Cassandra data on the ephemeral storage rather than on EBS, due to the lower latency and higher speed [9]. This is in contrast to most other databases, where you can resurrect a node after stopping it. With Cassandra, to start a node you usually need to make sure it is empty anyway.

Cassandra is well established on Amazon EC2. It is used on EC2 by big actors such as Netflix [31]. Netflix has made a couple of tools for managing Cassandra clusters on AWS, as follows:

Priam. Priam is a co-process for Cassandra and is run alongside Cassandra on every node. It provides backup/recovery, token management, configuration management and an HTTP API for managing nodes [21].

Chaos Monkey. Tests the fault tolerance of the system by randomly killing nodes in production. This is to make sure that in case of an accidental node crash, the system will behave as expected [6]. This tool is not specific to Cassandra, but it shows a dedication to absolute availability in the face of server failures. The fact that they use it speaks well, I think, for Cassandra.

2.4.2.1 Backup

Using Priam, Cassandra backups are simple [21]: all data is moved to a specified S3 bucket. If Priam is not used, backups are still fairly simple to do manually. A snapshot can be created by running nodetool snapshot simultaneously on all nodes; the snapshot can then be stored elsewhere [9]. It is also possible to enable incremental backups.

2.4.2.2 Setup

Cassandra needs at least 3 servers to be able to handle a node failure. In this scenario all data needs to be replicated on all three nodes. Cassandra can also be used with more servers, and is indeed designed to scale horizontally, meaning that adding more servers should increase the performance of the database by a proportional amount. Netflix scaled a Cassandra cluster up to 300 nodes with linear scaling characteristics [30], as can be seen in Figure 2.1.


Figure 2.1: Netflix scaling test graph. Note the very linear scaling characteristics. Figure by Adrian Cockcroft, taken from the Netflix Tech Blog [30].

2.4.2.3 Characteristics

Cassandra is tuneable regarding availability versus consistency. For example, if some user data is read, it might not be a problem if the data is a few minutes out of sync. Order data shared by a customer and a store, however, needs to always be up to date. This is managed at the query level: you specify a consistency level of one for an eventually consistent read, and to get real consistency you ask for a majority approval on the read.

To get consistent reads, so-called quorum operations are used, meaning a majority of the replicas must agree on a value. This way, even if some nodes are yet to be updated, a majority will agree and therefore be correct.
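The majority rule behind a quorum read can be sketched in a few lines. This is an illustration of the idea only, with a hypothetical replies mapping, not Cassandra's actual read path:

```python
def quorum(replication_factor):
    # a majority of the replicas must agree
    return replication_factor // 2 + 1

def quorum_read(replies, replication_factor):
    """Return a value once a majority of replicas agree on it.
    `replies` maps replica name -> the value that replica returned."""
    needed = quorum(replication_factor)
    counts = {}
    for value in replies.values():
        counts[value] = counts.get(value, 0) + 1
        if counts[value] >= needed:
            return value
    return None  # no majority: the read cannot complete at this level

print(quorum(3))  # 2: with replication factor 3, two replicas must agree
print(quorum_read({"n1": "v2", "n2": "v2", "n3": "v1"}, 3))  # v2
```

Note how the stale value v1 on one replica is outvoted by the two up-to-date replicas.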

2.4.3 Accumulo

Apache Accumulo is a sorted, distributed key/value store [1]. It was created in 2007 by the American National Security Agency, and was contributed to the Apache Foundation in 2011.

Accumulo has cell-level security, meaning that when you connect to the database with a certain role specified, for example admin, you can only see values in the database with a visibility specifier for admin. Visibility is specified as, for example, admin|user to allow both admin and user, or admin|(user&store) to allow admins and connections with both the user and store specifiers.
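The semantics of such visibility expressions can be sketched with a toy evaluator. This is an illustration only, not Accumulo's actual parser; it handles just labels, &, | and parentheses:

```python
import re

def is_visible(expression, authorizations):
    """Evaluate a simplified Accumulo-style visibility expression,
    e.g. "admin|(user&store)", against a set of authorizations."""
    def label_to_bool(match):
        return str(match.group(0) in authorizations)
    # Replace each label with True/False based on the connection's
    # authorizations, then map & and | onto Python's boolean operators.
    expr = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", label_to_bool, expression)
    expr = expr.replace("&", " and ").replace("|", " or ")
    # eval is acceptable here: expr now only contains True/False/and/or/()
    return eval(expr)

print(is_visible("admin|(user&store)", {"user", "store"}))  # True
print(is_visible("admin|(user&store)", {"user"}))           # False
```

A cell tagged admin|(user&store) is thus hidden from a connection holding only the user authorization, matching the description above.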

2.4.4 Neo4j

Neo4j is a graph database, which means that it stores data as a graph with nodes and relations between nodes [18]. For example, in Core, the users and stores would be nodes, and the relations between them would be the edges in the graph. This graph is queried by traversing these relations, which does not require lookups in large tables as in an SQL database.

Neo4j has support for both full and incremental backup on running clusters. There is also support for report instances, which run ad-hoc reports in production without disturbing the rest of the cluster.

All writes to the system have to go through a master node. This means that if writes are too slow, a more powerful machine is needed; adding more nodes does not help. Additional slave nodes do, however, help with read-only queries.

If the master server crashes or otherwise becomes unavailable, the remaining slave servers will perform an election to automatically determine the new master.

2.4.4.1 Setup

A high availability cluster in Neo4j requires at least 3 servers.

2.4.5 Riak

Riak is a key/value store. In addition to primary key lookups, there are secondary indexes and link walking [23].

Link walking lets Riak act like a graph database. Links are created between objects and, for example, querying friends of friends is possible.
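The friends-of-friends idea can be sketched as a two-hop traversal over stored links. This is an illustration of the concept, not Riak's actual API; the data is made up:

```python
def friends_of_friends(links, start):
    """Two-hop traversal over link data: collect everything reachable
    through a direct link's own links, excluding direct links and self."""
    direct = set(links.get(start, []))
    two_hops = set()
    for friend in direct:
        two_hops.update(links.get(friend, []))
    return two_hops - direct - {start}

links = {"alice": ["bob"], "bob": ["carol", "alice"], "carol": []}
print(friends_of_friends(links, "alice"))  # {'carol'}
```

In Riak the links would be stored as metadata on the objects themselves and walked by the server, but the traversal shape is the same.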

Riak is eventually consistent, meaning that not all clients are guaranteed to see the same values from the database at any given moment.

Conflicting writes are resolved using vector clocks, a tool for determining in which order nodes have seen writes. Some conflicts are automatically resolvable; otherwise the application layer becomes responsible for choosing how to resolve them.
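The core vector clock operations can be sketched as follows. This is a minimal illustration of the mechanism, not Riak's implementation; node names are made up:

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a, b):
    # a descends b if a has seen at least every event that b has seen,
    # i.e. a is a later version of the same history
    return all(a.get(node, 0) >= count for node, count in b.items())

def concurrent(a, b):
    # neither clock descends the other: conflicting siblings that the
    # application layer must resolve
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "node1")
v2 = increment(v1, "node2")   # a later version of v1
v3 = increment(v1, "node3")   # written concurrently with v2
print(descends(v2, v1))    # True: v2 supersedes v1, no conflict
print(concurrent(v2, v3))  # True: siblings, conflict to resolve
```

When one clock descends the other, the older value can be discarded automatically; the concurrent case is exactly the one pushed up to the application.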

2.4.6 MongoDB

MongoDB is a document store. Documents are stored in a binary JSON form called BSON [17].

Data replication in MongoDB is handled with replica sets. These are clusters of servers that replicate amongst one another and ensure automated failover. One server is the primary and all others are secondary. All writes to a replica set are directed to the primary, and the secondary servers replicate from the primary asynchronously.

If a single replica set is not performant enough, a sharded cluster is used instead. Each shard contains a subset of a collection and is deployed as a replica set.

2.4.6.1 Setup

The minimal setup for a redundant cluster is a single replica set. A replica set is two or more MongoDB servers.


If sharding is needed, the minimal setup becomes much larger, as follows:

3 x config servers Configuration servers for the cluster

2 x mongos mongos is a routing service for MongoDB shard configurations.

shards The shards are replica sets containing the actual data.

2.4.7 Other solutions

These are solutions I decided not to research more deeply after a cursory glance.

2.4.7.1 Couchbase

Couchbase, formerly known as Membase, is a document store. It evolved from memcached, adding disk persistence and replication [7].

2.4.7.2 VoltDB

VoltDB is an in-memory SQL database designed for high-performance transactions, for example in a trading environment [26].

2.4.7.3 Kyoto Tycoon

Kyoto Tycoon is a key/value store which uses Kyoto Cabinet as a backend and is accessed over HTTP [14].

2.4.7.4 Hypertable

Hypertable is a Google BigTable clone [13].

2.4.7.5 HBase

HBase is a Google BigTable clone [12].

2.4.7.6 CouchDB

CouchDB is a document store with the ability to go offline [8].

2.4.7.7 Redis

Redis is an in-memory database with a focus on several data types and queries over them. There is support for hash tables, lists and more [22].

2.4.7.8 Scalaris

Scalaris is a key/value store with ACID properties for multi-key transactions [24].


2.4.7.9 ElasticSearch

ElasticSearch is a database which focuses on full-text search. This is not really something of interest for Kulipa [10].

2.4.7.10 OrientDB

OrientDB is a graph database [19].

2.4.7.11 Titan

Titan is a graph database layer which uses another database as a backend [25].

2.4.8 Amazon Relational Database Service

Amazon Relational Database Service (RDS) is a service in which one provisions a database instance. Instances are either single or deployed in two separate AZs, to provide redundancy and failover.

Things like backup, replication, failover and host replacement are handled automatically by Amazon, which makes using this system simple.

You can choose between MySQL, PostgreSQL, Oracle DB and Microsoft SQL Server for the RDS instances. Of these, MySQL and PostgreSQL are evaluated, as they have free licenses.

2.4.9 Amazon DynamoDB

Amazon DynamoDB is a managed key/value store. No servers or hardware are provisioned. Instead, you provision the number of IO operations per second (IOPS) you need. You then pay for the provisioned IOPS and the amount of data you store.

IOPS are provisioned in terms of read units per second and write units per second. A unit in this context means an operation on 1 kB or less. If you settle for eventually consistent reads, you can do two reads per unit instead of one.
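The capacity arithmetic described above can be sketched as a small calculation, assuming the unit model as the text states it (one unit per 1 kB operation, eventually consistent reads at half cost); the exact rounding rules of the real service are not reproduced here:

```python
import math

def read_units(reads_per_sec, item_size_kb, eventually_consistent=False):
    """Read units needed per second: one unit covers a strongly consistent
    read of an item of 1 kB or less; an eventually consistent read costs
    half a unit (two reads per unit)."""
    per_read = math.ceil(item_size_kb)
    return reads_per_sec * (per_read / 2 if eventually_consistent else per_read)

def write_units(writes_per_sec, item_size_kb):
    # one write unit covers writing an item of 1 kB or less
    return writes_per_sec * math.ceil(item_size_kb)

print(read_units(100, 1))                              # 100
print(read_units(100, 1, eventually_consistent=True))  # 50.0
print(write_units(100, 1))                             # 100
```

Settling for eventual consistency thus halves the read capacity that has to be provisioned, and paid for, at a given query rate.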

It is possible to scale the provisioned IOPS both upwards and downwards without any downtime.

2.4.9.1 Keys

DynamoDB allows you to specify two columns for your primary key: the hash key and the range key. A query will return the results ordered by the range key. Note that a range key is not required; it is allowed to simply use the hash key as the primary key.

On their homepage an example forum database layout is provided. The example uses Forum Name as hash key and Subject as range key. That way a query can easily ask for all posts in a given forum and order them by subject name.

As well as the primary key, DynamoDB allows you to query on secondary indexes. These have to be specified at table creation time. These secondary indexes are ways to provide secondary range keys to the database. You will still need the hash key to query, but you can, for example, order the database by date as well as subject this way.

2.4.9.2 Setup

The Token and Payment databases are good fits for DynamoDB, seeing as they would only use the hash key on the ID column as primary key.

Log is more problematic. For the last-10-days-for-a-service query you would want the service column as hash key and the date column as range key. For the connection time query, however, it is natural to use the owner column as hash key and the date column as range key. This disparity means you would either need to work around it and consolidate the hash/range keys into something DynamoDB-specific, or create a separate log database for each type of log. Neither alternative seems attractive, so I decided against using DynamoDB for Log. It is still evaluated for the purposes of Payment and Token, however.

2.4.10 Amazon Redshift

Amazon Redshift is an analytics-oriented database designed to work with huge data sets. It is possibly a good solution for a future analytics database, as it is designed for data warehousing rather than live databases.

2.4.11 Amazon Elasticache

Amazon Elasticache is a cache service designed to reside between a proper database and the user. This can drastically improve performance for an RDBMS, or any other system where lateral scaling is needed.

2.4.12 Amazon SimpleDB

Amazon SimpleDB is a key/value store primarily designed for ease of use. It does not allow data sets larger than 10 GB or more than 25 IOPS. It is possible to query on all columns in a SimpleDB database.

2.5 Database systems selected for further evaluation

The systems I decided to evaluate and benchmark are the following:


EC2 hosted PostgreSQL The configuration Kulipa is running today, with added replication for availability purposes.

RDS hosted PostgreSQL The equivalent system to the EC2 hosting, only hosted in RDS instead.

RDS hosted MySQL An alternative to using PostgreSQL is MySQL.

Cassandra The NoSQL database all other NoSQL databases compare themselves against, used by Netflix with great success on EC2.

MongoDB Recommended by people at Amazon, advertised for simplicity of use.

DynamoDB Amazon's NoSQL database.

2.6 Prototypes

I use a local virtual machine provisioning tool called Vagrant to set up virtual prototype clusters on my local computer, to test how it feels to start and configure instances so that they run properly and communicate with each other. The systems which were selected for evaluation but are not discussed in this section are not manually managed, and therefore do not require any setting up.

2.6.1 Cassandra

For Cassandra I set up a cluster and give each server an initial token, so that each server is given an equal share of the token space. I also give each server the IP address of the first server I set up in the cluster, so that each server can find the others using the gossip protocol [5]. An example setup is seen in Figure 2.2.

Figure 2.2: Cassandra example node setup. All nodes receive the same seed to the initial node. The solid lines are the connections established through the seed, and the dotted line is a connection established through the gossip protocol.


I create a so-called keyspace, a namespace which contains the data tables, for storing the test data, and set the replication factor to 3 to make sure the data is replicated on three nodes.

2.6.2 MongoDB

I set up a single replica set of MongoDB, not a sharded cluster. I let the first node initiate the replica set using rs.initiate() from the mongo console, and once the other servers are started they are added from the master server using rs.add(node_ip), also from the mongo console. An example MongoDB setup can be seen in Figure 2.3.

Figure 2.3: MongoDB example setup. The connection between the nodes is explicitly established using rs.add('10.0.1.132') from the primary node.

Seeing as everything is funneled through the master server, two servers are enough for the purposes of this work.

2.6.3 PostgreSQL

I create a single configuration for both the master server and the slaves, since options regarding only the master are ignored on the slaves and vice versa. This makes it simple to have a slave take over in case the master fails, since it is configured equivalently.

In the configuration file I activate WAL logging and ensure that slave servers will be kept as hot standbys, which means that they are ready to replace the master at any time. These settings are appropriately mirrored for the slave servers.

On the slave servers, I create a recovery.conf file, which lets them know that they are slaves, and which master to connect to.
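For a PostgreSQL version of the kind used here (the 9.x series), the settings described above might look roughly like this. This is a hedged sketch: the host address and replication user are placeholders, and real configuration files contain many more options:

```
# postgresql.conf (shared by master and slaves; master-only settings
# are ignored on slaves and vice versa)
wal_level = hot_standby        # WAL logging suitable for standbys
max_wal_senders = 3            # allow slaves to stream WAL from the master
hot_standby = on               # keep slaves ready to replace the master

# recovery.conf (slaves only)
standby_mode = 'on'
primary_conninfo = 'host=10.0.1.10 port=5432 user=replicator'
```

Because the same postgresql.conf works for both roles, promoting a slave requires no configuration rewrite, only removing the recovery.conf.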


The master server is started first. To create a slave, a backup is taken from the master and recovered on the slave. When the slave subsequently starts, it connects to the master server, is brought up to date, and starts the replication process.

An example PostgreSQL setup can be seen in Figure 2.4.

Figure 2.4: PostgreSQL example setup. The slave server connects to the master using the primary_conninfo data in the recovery.conf file.

2.7 Database usage plan

2.7.1 Core

Since Core stores a data set with many interconnections, a relational database is an excellent fit. Either the PostgreSQL solution, the hosted RDS system with MySQL or one of the graph databases would be the best solution for this data.

If the scaling demands grow too large, single, relatively independent tables such as Order can be sharded out to a database of their own.

2.7.2 Token and Payment

These are very similar systems, since they both hold completely non-relational data. They would preferably be stored in a key/value store such as DynamoDB or Cassandra.

2.7.3 Log

Log is the database that accumulates data the fastest, since all operations in the system are logged. The requirement for fast log access is fast access to approximately the 10 million latest log messages, so recent logs need to be stored accordingly. As the data is almost completely non-relational, a NoSQL store is reasonable.

Seeing as the fast-access requirement is 10 million log messages of 1 kB each, this means an active data set of 10 GB. This is precisely the upper limit for a data set stored in DynamoDB. If indexes are added, their sizes count towards the limit, meaning that DynamoDB cannot host Log in a single table if the data reaches the projected size.

Since one of Cassandra's strengths is fast writes [9], and the Log workload is very write-heavy, Cassandra seems like a good choice.

An alternative solution is to use a log framework such as Scribe to write log messages to log files, and then use more ad-hoc methods for querying, such as grep and perl. This solution is not covered in this report, however.

2.8 Data Generation

Since all the non-relational data is quite similar, I decided to only generate the log data and test all the non-relational access patterns against this data. This leaves me with two data sets, Core and Log.

2.8.1 Core

Core is a fairly large system with many interactions. I have extracted what I believe is the most important part of the system, namely the tables User, Store, Organization, Product, Order, Device and Terminal. A diagram of this is shown in Figure 1.1. This enables me to test the important relational queries.

The Core generation creates the relations properly, but much of the data is simply randomly generated text. Some things, for example phone numbers and names, are generated from a data set.

The generation system starts off with a number of users and generates the other properties from that. A user has one or more devices, for approximately every 100 users there is a store, for every 2 stores there is an organization, and so on. It assumes users make many orders, on average 100 orders per user. The total size of the generated data amounted to slightly above 5 GB.
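The ratios above can be sketched as a small helper. This is illustrative only; the real generator also creates devices, products and terminals, and draws names and phone numbers from data sets:

```python
def generated_counts(num_users):
    """Entity counts implied by the generation ratios described above."""
    stores = num_users // 100          # one store per ~100 users
    organizations = stores // 2        # one organization per 2 stores
    orders = num_users * 100           # on average 100 orders per user
    return {"users": num_users, "stores": stores,
            "organizations": organizations, "orders": orders}

print(generated_counts(10000))
# {'users': 10000, 'stores': 100, 'organizations': 50, 'orders': 1000000}
```

The order table clearly dominates the generated volume, which is why the data set grows to several gigabytes even for a modest user count.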

2.8.2 Log

As with the Core data, the log data is mostly random. However, part of the access patterns of Log require a specific structure for part of the data, so I make sure to generate this properly. This is further covered in the next chapter, in Section 3.3.1.

As the rows of the tables are independent, they are simply generated one at a time. The total size of the generated data amounts to almost 12 GB.


Chapter 3

Benchmarks

In this chapter I describe the setup used for benchmarking and the tests devised for benchmarking the databases.

3.1 Benchmark setup

For the benchmark setup I create a Virtual Private Cloud (VPC), with subnets in all three AZs. For the EC2-run databases this allows the instances not to be open to the internet, which is an attractive security feature.

For querying I set up three EC2 instances, one in each AZ, with all the benchmark scripts on them. I then use parallel-ssh to run the scripts on all these instances at once and gather the data. This creates a workload for the databases which ensures that the queries are not all on the local network. I run all scripts at parallel levels of three and nine, which is one and three connections at a time, respectively, per instance.

3.2 Core

To benchmark the Core system, the data was generated and put into the test setups, namely the EC2 Postgres database and the RDS hosted Postgres and MySQL databases.

Since all these systems use SQL, the tests are simply SQL queries which are timed. There are some differences in syntax between Postgres and MySQL. For example, in a count(*) query in Postgres the resulting column is automatically named count, while in MySQL you have to write count(*) as count for the same effect. MySQL will happily do sum on a VARCHAR column, while in Postgres a cast statement needs to be inserted. One of the columns in Kulipa is named order, which conflicts with the order by statement in SQL. In Postgres this is handled by quoting the name, "order", while in MySQL it is instead surrounded by backticks, `order`. Such differences may require some queries to be written differently than engineers at Kulipa are used to.

For the Postgres systems, I use the instance sizes from Table 2.1. For the AWS-hosted RDS, I use the corresponding sizes of database variants.

3.2.1 Tests

These are the test queries run for each setup. Each test is a self-contained SQL script. I use different scripts for Postgres and MySQL to handle the differences in syntax. They have identical semantics, however, and should not result in any performance differences.

Get monthly top sellers This query gets the top selling products for all stores in an organization. This is the heaviest analytical test query for Core. It would not be expected to run very often due to the long query time, but could still provide useful information.

Average transaction value for frequent customers This extracts customers who are frequent, meaning they are repeat customers at the specified store, and extracts the average transaction values for the purchases of these customers between the dates specified.

Popular products among frequent customers Among the frequent customers between the two dates, this query answers which products are the most popular.

Popular products among non-frequent customers The same idea as above, only for non-frequent customers instead of frequent ones. The idea is to find out if the products attracting new customers are the same ones as the products which keep customers coming back.

Sales by cashier This query lists the number of sales each cashier makes in the specified time frame.

3.3 Log

For the non-relational data I generate the log data and put it into Cassandra, MongoDB and Amazon DynamoDB, as well as PostgreSQL.

All the EC2-hosted databases use the instance sizes from Table 2.1. For DynamoDB I tried to match the price level and run the benchmarks at a price point approximately equivalent to the EC2 instances. This is not entirely trivial, as you pay elastically for the amount of space you use in DynamoDB, while the instances have fixed storage space. The amount of IOPS I provisioned to match each instance size is listed in Table 3.1. The price for 10 read operations or 50 write operations is $0.00735 per hour. I estimated that the amount of write operations should be about 25% of the total amount of operations. If the difference between the price listed in the table and the instance costs in Table 2.2 is taken as the storage cost, I believe I undershoot the storage costs a bit. This needs to be kept in mind when comparing the results.

Instance type matched    Write capacity    Read capacity    IOPS total price
m1.small                  50                150              $0.059
m1.medium                100                300              $0.118
m1.large                 200                600              $0.235
m1.xlarge                400               1200              $0.470

Table 3.1: DynamoDB provisioned IOPS matching instance sizes

When I moved on from prototypes to the benchmark servers, Cassandra had released an updated version making the initial token assignment irrelevant, so it was removed. Instead, a system of virtual tokens is used, distributing tokens randomly across the cluster. This makes setting up a Cassandra cluster significantly simpler.

For Cassandra the minimal setup is three instances. I run all the tests in this configuration for each instance size.

Compared to Cassandra, MongoDB is very comfortable to use. The Python code for the connection time query is simply

client.test.log.find({'action': 'connection.lost', 'owner': owner})

which can then be iterated over. The objects returned are structured so that they can be accessed simply as obj['meta_data']['connection_lost']. Compare this with Cassandra, where an indexed expression has to be explicitly constructed and the objects are not naturally nested.

3.3.1 Tests

These are the test queries run for each setup. For each test and database a Python script is made. I chose Python because appropriate drivers exist for all databases, and Kulipa uses a lot of Python in their codebase.

A bug in the index usage in Cassandra makes the Get single row by ID query time out. It also affects the replication ability, so all indexes have to be recreated when replicating a node. This bug is fixed in the latest version of Cassandra, but the version I ran the tests on was affected. This unfortunately means I have no data for that test for Cassandra.

Get all log lines for this service from the last 10 days A simple query for getting interesting logs regarding a service.


Get a single row by ID The benchmark test for single-row access, useful for the Token and Payment databases. For this test I fetch a random set of 1000 rows from the database over the same connection.

Get login times This query fetches logs of the type connection.lost, which contain an internal field connection_established. The two timestamps are subtracted to find the total time the user was connected. Since this test uses a particular subfield, care is taken to generate it properly in the data generation.
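The computation behind this query can be sketched as follows. The field names and timestamp values here are illustrative; the exact log schema is not reproduced in this report:

```python
def connection_duration(log_entry):
    """Seconds a user was connected, for a 'connection.lost' log entry
    that carries the connection-established time in its meta data."""
    lost_at = log_entry["timestamp"]
    established_at = log_entry["meta_data"]["connection_established"]
    return lost_at - established_at

entry = {"action": "connection.lost", "owner": 42,
         "timestamp": 1391990000,
         "meta_data": {"connection_established": 1391986400}}
print(connection_duration(entry))  # 3600: connected for one hour
```

This is also why the query's key design is awkward in DynamoDB: it naturally wants the owner as hash key, unlike the per-service queries.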

3.4 On Cacheability and Latency

Since most data tends to be fairly cacheable, the benchmarks are not written to spread the queries over the entire data set, but rather over a small part of it. For example, it is unlikely that every user of the system tries accessing every store at once; it is more likely that a subset of users use a subset of the stores. Later orders are more interesting than old orders, and so on.

The time taken for connecting to the database is included in most benchmarks. This is because I am looking for the real total time taken for a typical query, not just the time spent inside the database. If the benchmarks were comparing different ways of obtaining the same results, the connection time could be discarded.

3.5 Benchmark Results

After all the tests are run and the data collected, the data is averaged and plotted in the following charts. The bars represent the average and the error bars are plus/minus one standard deviation. In some queries the standard deviation actually extends below 0; this does not mean some queries took negative time, but rather that the standard deviation is larger than the average. If you look at the actual data where this happens, you find that the timings form groups of some very quick runs and some very slow ones. This is the contrast between the micro instance running at full speed versus when it gets throttled.
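A toy example of how such a bimodal timing distribution produces error bars that dip below zero; the numbers are made up:

```python
from statistics import mean, stdev

# hypothetical micro-instance timings in seconds:
# fast while unthrottled, then very slow once throttled
timings = [0.5, 0.6, 0.5, 14.0, 15.5]
avg = mean(timings)
sd = stdev(timings)
print(avg - sd < 0)  # True: the lower error bar extends below zero
```

The mean sits between the two clusters and the spread exceeds it, which is exactly the pattern seen for the throttled micro instances.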

Most scripts were run 100 times per query instance, meaning 300 times total for a parallel level of three and 900 times for nine. The slowest ones were occasionally terminated early when a stable result was deemed to have been achieved.

3.6 Core

Average transaction value for frequent customers (Figure 3.1) As can be seen in Figure 3.1, the micro instances seem to do very well on the 3 parallel benchmark, actually faster than the small instances, followed by an expected successive reduction in time taken by the larger instances. On the 9 parallel benchmark the micro instances become very unstable, with a very large standard deviation.

Figure 3.1: Average transaction value for frequent customers benchmark in seconds. Panels (a) 3 parallel and (b) 9 parallel; x-axis: instance size from micro to xlarge; series: rds postgres, postgres, rds mysql.

The EC2-hosted Postgres database seems to do very well, but with a strange hiccup in that the large instance runs slower than the medium instance.

Figure 3.2: Most popular by month benchmark in seconds. Panels (a) 3 parallel and (b) 9 parallel; x-axis: instance size from micro to xlarge; series: rds postgres, postgres, rds mysql.

Most Popular by Month (Figure 3.2) The micro instances are very slow and have a very high standard deviation in this benchmark, as seen in Figure 3.2. The other instances behave as predicted, approximately doubling performance for each instance size.

Orders per Cashier (Figure 3.3) Here RDS Postgres seems to underperform while MySQL performs very well. Note that here, too, the micro instances are faster than the small instances.

Figure 3.3: Orders per cashier benchmark in seconds. Panels (a) 3 parallel and (b) 9 parallel; x-axis: instance size from micro to xlarge; series: rds postgres, postgres, rds mysql.

Figure 3.4: Rare buyers buy what benchmark in seconds. Panels (a) 3 parallel and (b) 9 parallel; x-axis: instance size from micro to xlarge; series: rds postgres, postgres, rds mysql.

Rare Buyers Buy What (Figure 3.4) In Figure 3.4 the micro level of RDS MySQL seems extremely unstable, while otherwise performing very well. RDS Postgres follows a more predictable curve than the EC2 one.

Store Total Sales (Figure 3.5) In Figure 3.5 we see that RDS Postgres seems very slow and unstable. RDS MySQL also acts unstable for the micro instance. For the others, EC2 Postgres has a hiccup at the large level, but otherwise seems predictable. RDS MySQL completely crushes the other databases for this query.

Top 10 Products for Frequent Customers (Figure 3.6) For thisbenchmark in Figure 3.6 the databases seem to act approximately the same

27

Page 35: Institutionen för datavetenskap - DiVA portal695293/FULLTEXT01.pdf · 2014-02-10 · Institutionen för datavetenskap Department of Computer and Information Science Final thesis

“masterthesiseng” — 2014/2/10 — 11:41 — page 28 — #36

3.6. CORE CHAPTER 3. BENCHMARKS

micro small medium large xlarge2

0

2

4

6

8

10

12

rdspostgrespostgresrdsmysql

(a) 3 parallel

micro small medium large xlarge0

5

10

15

20

25

30

35

rdspostgrespostgresrdsmysql

(b) 9 parallel

Figure 3.5: store total sales benchmark in seconds

[Figure: benchmark times over instance sizes micro to xlarge; series: RDS Postgres, Postgres, RDS MySQL. (a) 3 parallel, (b) 9 parallel.]

Figure 3.6: top 10 products for frequent customers benchmark in seconds

as the previous benchmarks. RDS MySQL is a lot faster than the alternatives here as well.
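The exact benchmark queries are defined elsewhere in the thesis; the sketch below only illustrates the plausible shape of a "top 10 products for frequent customers" query. The schema, the data, and the frequency threshold (at least 3 orders) are all assumptions made for the example, not the thesis's actual benchmark definition.

```python
import sqlite3

# Hypothetical schema standing in for the generated Core data set.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, quantity INTEGER);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 2)])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(1, 10, 2), (2, 10, 1), (3, 11, 5), (4, 12, 1)])

# "Frequent customer" here means >= 3 orders; the real cutoff is unknown.
top_products = conn.execute("""
    SELECT oi.product_id, SUM(oi.quantity) AS sold
    FROM order_items oi
    JOIN orders o ON o.id = oi.order_id
    WHERE o.customer_id IN (
        SELECT customer_id FROM orders
        GROUP BY customer_id HAVING COUNT(*) >= 3
    )
    GROUP BY oi.product_id
    ORDER BY sold DESC
    LIMIT 10
""").fetchall()
# -> [(11, 5), (10, 3)]: only customer 1 qualifies as frequent here
```

A query of this shape joins two tables and aggregates twice, which is why it exercises the relational engines more than the raw-access benchmarks do.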

3.6.1 Price for performance

Since this report aims to optimize the price for the necessary performance, it is interesting to plot the data in relation to the price for the instance size used. Two representative benchmark results were chosen and the resulting data multiplied by the price of the database instance to get a price-per-performance figure.

The result of this can be seen in Figure 3.7. Since the instance price doubles for every step of increased size, the expected result would be a doubling in performance as well, which would show up as a flat curve; this is approximately what is seen in the graph. The large instance of EC2 Postgres is clearly performing uncharacteristically well. This is discussed further in the Discussion section, 4.1.
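The weighting described above can be sketched in a few lines. Both the hourly prices and the benchmark times below are made-up placeholders chosen to illustrate the "flat curve" expectation, not the thesis's measured values or AWS's actual prices.

```python
# Placeholder prices: each size step doubles the hourly cost.
hourly_price = {"micro": 0.02, "small": 0.04, "medium": 0.08,
                "large": 0.16, "xlarge": 0.32}

# Placeholder benchmark wall times in seconds (lower is faster).
bench_seconds = {"micro": 3.0, "small": 2.0, "medium": 1.0,
                 "large": 0.5, "xlarge": 0.25}

# Price-for-performance: benchmark time weighted by instance price.
# If performance doubles with every price doubling, this stays flat.
price_perf = {size: bench_seconds[size] * hourly_price[size]
              for size in hourly_price}
```

With these numbers every size from small to xlarge lands on the same weighted value, which is the flat characteristic the report expects; a dip below the line (as for the large EC2 Postgres instance) means unusually good value for money.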


[Figure: price-weighted benchmark results over instance sizes micro to xlarge; series: Postgres, RDS MySQL, RDS Postgres. (a) The most popular by month benchmark, (b) The orders per cashier benchmark.]

Figure 3.7: Representative benchmarks weighed with instance costs. Lower means a lower price for performance.

A clear trend in this graph is that RDS MySQL is significantly cheaper per unit of performance, in addition to having better absolute performance. The notable exception is the micro instances, where EC2-hosted Postgres was superior for the most popular by month benchmark and RDS Postgres was the better choice for the orders per cashier benchmark.

3.7 Log

When inserting data into MongoDB on the small instance size, the insertion became unresponsive after about 80% of the insert script, reducing the insertion rate to less than 1%. I aborted the insertion and ran some test queries on the database. It turned out that queries which ought to take less than a second took more than 45 seconds to run. I abandoned my efforts to run MongoDB on small instances. On every other instance size, MongoDB ran all the benchmarks.

Raw Access by ID (Figure 3.8) Since this is such a simple benchmark, all the databases completed it without trouble. The only missing measurement is the MongoDB small instance.

For the 3 parallel benchmark, DynamoDB did not improve its performance when capacity was increased, and Cassandra only improved when going from small to medium; the higher sizes are basically the same. For the 9 parallel benchmark the same effect can be spotted, only between medium and large for Cassandra. DynamoDB also sees an improvement when going from small to medium, but then flattens out. These results indicate that 9 computers running the Python drivers cannot saturate the databases at the higher instance sizes.
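The "N parallel" benchmarks described above can be pictured as N clients each issuing queries sequentially, with the benchmark result being the wall time of the slowest client. This is a minimal sketch of that structure; the stub `query_by_id` stands in for a real database call, and the client count and query counts are placeholders, not the thesis's actual driver code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def query_by_id(record_id):
    """Stub for a real database lookup; pretend 1 ms of round-trip latency."""
    time.sleep(0.001)
    return record_id

def run_client(ids):
    # Each client issues its queries one at a time, so per-query latency
    # bounds its throughput no matter how powerful the server is.
    start = time.time()
    for i in ids:
        query_by_id(i)
    return time.time() - start

def run_benchmark(n_clients, queries_per_client):
    work = [range(queries_per_client)] * n_clients
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        # Benchmark wall time is the slowest client's elapsed time.
        return max(pool.map(run_client, work))

elapsed = run_benchmark(n_clients=3, queries_per_client=50)
```

Because each client waits for one response before sending the next query, adding server capacity stops helping once the clients themselves become the bottleneck, which is consistent with the saturation effect noted above.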


The differences in time between the databases are interpreted as the difference in latency for a query. MongoDB seems to have the lowest latency of the tested databases.

[Figure: benchmark times over instance sizes small to xlarge; series: DynamoDB, MongoDB, Postgres, Cassandra. (a) 3 parallel, (b) 9 parallel.]

Figure 3.8: Raw Access by ID in seconds

Get last 10 days for a service (Figure 3.9) Only MongoDB and Postgres were able to run this benchmark. Upgrading to a larger instance does not seem to improve speeds much on this benchmark either, which suggests that it is fast enough to be dominated by latency rather than database performance.
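A query of this kind is a simple indexed range scan, which helps explain why it is latency-bound. The sketch below shows one plausible shape of it in SQL; the log schema, column names, and data are assumptions for illustration, not the thesis's actual Log data set.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical log schema with a composite index on (service, timestamp).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (service TEXT, ts TEXT, message TEXT)")
conn.execute("CREATE INDEX log_service_ts ON log (service, ts)")

now = datetime(2014, 1, 31)
rows = [("auth", (now - timedelta(days=d)).isoformat(), "entry")
        for d in (1, 5, 20)]
rows.append(("payments", (now - timedelta(days=2)).isoformat(), "entry"))
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)

# "Last 10 days for a service" is a single range scan over the index,
# so the work per query is tiny and round-trip latency dominates.
cutoff = (now - timedelta(days=10)).isoformat()
recent = conn.execute(
    "SELECT ts, message FROM log WHERE service = ? AND ts >= ? ORDER BY ts",
    ("auth", cutoff)).fetchall()
```

Only the two "auth" entries within the window come back; the 20-day-old entry and the other service's entries are excluded by the index range.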

[Figure: benchmark times over instance sizes small to xlarge; series: DynamoDB, MongoDB, Postgres, Cassandra. (a) 3 parallel, (b) 9 parallel.]

Figure 3.9: Get last 10 days for a service benchmark in seconds

Get login times (Figure 3.10) In this benchmark MongoDB performs terribly. It is so bad that it makes me think there is some issue such as a broken index, because a difference in access times this large is not seen anywhere else.


For Cassandra and Postgres, performance seems to be capped by latency rather than by database performance.

[Figure: benchmark times over instance sizes small to xlarge; series: DynamoDB, MongoDB, Postgres, Cassandra. (a) 3 parallel, (b) 9 parallel.]

Figure 3.10: Get login times for user in seconds


Chapter 4

Analysis

In which the benchmark results are discussed and a recommendation is presented.

4.1 Discussion

For the most part, the databases behaved as expected, at least as far as the sizes from small up to xlarge were concerned.

Generally the micro instances were surprisingly fast; for example, in Figure 3.1 they are faster than the small instances. They clearly dropped off when the level of parallelism increased, and were much less predictable than the normal instances. This is probably because of the advertised throttling of CPU performance when the workload increases. This is also visible in the Most popular by month benchmark (Figure 3.2), where they seem to get throttled even at the lower level of parallelism.

The large version of the EC2 Postgres seems to act strangely in many benchmarks; for example, in the Store Total Sales benchmark in Figure 3.5 it seems slower than the medium instance. It is possible that there were some network connectivity issues, since the effect is not apparent on the analytical most popular by month benchmark. This is probably not a general issue with the large instances, but rather a problem with the specific one I ran the benchmark on. To be really sure one could rerun the benchmarks on a new large instance.

Hosting PostgreSQL on EC2 versus using the Amazon-hosted RDS version seems to be fairly equivalent performance-wise. With the exception of the large instances, the EC2 version tends to be slightly faster. If a PostgreSQL system were to be chosen, the EC2 version is thus both slightly cheaper and slightly faster. This must be weighed against the simplicity of RDS, where setup and maintenance are much easier.

In most benchmarks RDS MySQL was significantly faster than either of the Postgres solutions. For example, in the Rare Buyers Buy What benchmark in Figure 3.4, MySQL completes the test in about 50% to 70% of the time of the RDS Postgres solution. This weighs heavily in favor of MySQL. The RDS MySQL instances are also cheaper than the RDS PostgreSQL ones. Both, however, are more expensive than manual EC2 hosting.

Looking at the log tests, the difference between Postgres and the NoSQL databases was smaller than I expected. Otherwise, I think the NoSQL databases mostly lived up to their hype, with a general increase in performance for a specific set of tasks. Unfortunately most of the tests did not stress the databases quite as much as I had hoped.

With the results from Log, one can conclude that there is no significant difference in latency between the different instance types. In the Raw Access by ID benchmark (Figure 3.8) there was no apparent difference at all between instance sizes medium to xlarge. Since the xlarge instance certainly has more computational power, the similarity in performance must be attributed to network latency. The differences in network performance characteristics advertised for the instance types might concern throughput only, or might not be as significant as one might think.
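The attribution to latency follows from simple arithmetic: when a client issues queries sequentially, the wall time divided by the query count gives the per-query round trip, and no amount of server capacity can push that below the network round-trip time. The numbers below are made-up placeholders, not measured values.

```python
# Hypothetical benchmark outcome: one client, 1000 sequential queries,
# finishing in 2.0 seconds of wall time.
queries_per_client = 1000
wall_seconds = 2.0

# Per-query round trip implied by the wall time.
latency_ms = wall_seconds / queries_per_client * 1000.0
# 2 ms per round trip: a bigger instance shortens server-side work,
# but cannot reduce the network portion of these 2 ms.
```

If upgrading from medium to xlarge leaves `wall_seconds` unchanged, the implied per-query time is already dominated by the network term, which is exactly the pattern seen in Figure 3.8.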

4.2 Recommendation

All the solutions tested live up to the requirements presented in the introduction chapter. Most of the desires are fulfilled as well. In conclusion, all systems are acceptable. This recommendation presents what I believe to be the most desirable solution.

The desired solution should optimize these criteria:

• Simplicity of transferral from the old system,

• Scalability for the foreseeable future, and

• Performance per dollar.

Each of the Kulipa data sets is examined in turn.

4.2.1 Token and Payment

For storing any type of non-relational data and accessing it by row ID, any of the databases examined can certainly fulfill the requirements. Judging primarily by the Raw Access benchmark, all the databases ran into latency issues on my setup.

If a database were chosen for lowest latency, MongoDB would be a good choice. It is also really nice to interact with from Python and the other languages used at Kulipa.

If simplicity of setup is deemed more important, DynamoDB is a superior choice. It is not quite as nice to interact with, but very usable. It is also significantly simpler to scale either upwards or downwards. You can even do things like scale down overnight, or significantly scale up during popular times, both of which are very bothersome to do in MongoDB.
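Scaling DynamoDB up or down is a single API call against the table's provisioned throughput. The sketch below uses boto3, the current AWS SDK for Python (the thesis-era tooling would have used an older SDK); the table name and capacity numbers are placeholders.

```python
def throughput_update(table_name, read_units, write_units):
    """Build the UpdateTable arguments for a provisioned-capacity change."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }

def scale_table(table_name, read_units, write_units):
    """Apply the capacity change; requires AWS credentials to actually run."""
    import boto3  # imported lazily so the helper above works without AWS
    client = boto3.client("dynamodb")
    return client.update_table(**throughput_update(
        table_name, read_units, write_units))

# e.g. scale the (hypothetical) "tokens" table down for the night:
params = throughput_update("tokens", read_units=5, write_units=5)
```

A cron job calling `scale_table` with low values at night and high values before peak hours is all the "bothersome" MongoDB resharding work reduces to on DynamoDB.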

If a 5 ms latency is acceptable, I would recommend DynamoDB for both Token and Payment. Otherwise MongoDB seems an excellent choice.

4.2.2 Log

For the log database, Cassandra unfortunately was not available for one of the tests, and MongoDB performed very poorly on the other. Still, some conclusions can be drawn if we assume that the MongoDB issue is fixable, giving performance equivalent to the first test, and extrapolate both tests on the assumption that the databases would perform approximately equally relative to Postgres on both.

Both NoSQL databases are faster than Postgres in this instance.

Since Cassandra is optimized for writing, it seems like a good choice for a log database. Seeing as it is used heavily on EC2 by Netflix, I think it is the best choice.

A problem with the Cassandra solution is that it should use ephemeral storage. This means it has to use at least three instances of the small type, which is more than what is used at Kulipa today. Another problem is the indexing bug discussed in 3.3.1. To overcome the bug one could either use the current version from the developers of Cassandra or wait until the Ubuntu repositories pull in a fixed version. A good compromise may be to either try to set up Cassandra on micro instances for the time being and ignore the bug, or to continue to use the current Postgres solution until the system becomes large enough to utilize three small instances and the bug is fixed.

One thing not done in this report is testing for write performance in addition to read performance. This is mostly relevant for the log database, where a large percentage of the load is presumably writes. However, this is fairly thoroughly covered in A comparison between several NoSQL databases with comments and notes [37], where they find that Cassandra does not have a problem with scaling up writes.

4.2.3 Core

When examining the Core benchmarks, MySQL is generally the fastest of the solutions. It is the solution which performs best, both in absolute terms and per dollar. However, MySQL has fewer protections against programmer mistakes, some examples of which can be seen in [32], and a history of security flaws, such as allowing a user to connect with any password if attempted enough times [33]. These flaws are not insurmountable by any means, since Kulipa uses an ORM layer to abstract away most details about the database, and it should not be possible to connect to the database from outside of the VPC anyway. If these flaws are considered acceptable, MySQL is a good choice.


If the increased robustness of Postgres is deemed necessary, the choice between RDS Postgres and EC2 Postgres remains. The performance seems to be approximately equivalent, without a clear advantage to either. The price for RDS is approximately 30% higher than for EC2. This must be weighed against the complexity of maintaining an EC2-hosted cluster.

An advantage of EC2 hosting is flexibility in backups; for example, there can be problems with RDS backups if you want to move out of RDS [2]. This counts against both Postgres and MySQL when hosted on RDS.

All in all, I think EC2 Postgres seems the way to go. It is the cheapest solution, as well as the most flexible, although it needs the most manual maintenance. It is also what Kulipa is running today, which makes the migration costs very low and does not require engineers at Kulipa to learn the quirks of another database.

The only thing necessary for the transition would be to add a replica server and start the replication stream to it.
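For the PostgreSQL versions current at the time of the thesis (the 9.x series), setting up such a streaming replica comes down to a handful of configuration lines. This is a minimal sketch; the hostnames, user, and password are placeholders, and in PostgreSQL 12 and later the recovery.conf settings moved into postgresql.conf.

```
# primary: postgresql.conf
wal_level = hot_standby        # ship enough WAL for a live standby
max_wal_senders = 3            # allow replication connections

# primary: pg_hba.conf -- let the replica connect for replication
host  replication  replicator  10.0.0.2/32  md5

# replica: recovery.conf (PostgreSQL 9.x)
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 user=replicator password=secret'
```

The replica is then seeded from a base backup of the primary and follows the WAL stream from there, which is the "replication stream" referred to above.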


Chapter 5

Summary

In this report a number of database solutions were considered for a number of data sets. The data sets considered were the highly relational Core, the purely non-relational Token and Payment databases, and the semi-relational, write-heavy Log database.

The data sets were grouped based on whether they were sets of relational data, which left us with two data sets on which to run benchmarks. It was decided to test the relational data in relational databases and the non-relational data in both relational and non-relational NoSQL databases.

After researching several database solutions, a few were decided to be the most interesting ones. All the data sets in the current system at Kulipa are stored in relational Postgres databases. For the relational data, it was decided to compare the current system of EC2-hosted Postgres databases against Amazon RDS-hosted Postgres as well as MySQL databases. For the non-relational data, the current Postgres solution was compared against Cassandra, MongoDB and Amazon DynamoDB.

For the benchmarks, both relational and non-relational data sets were randomly generated and inserted into the corresponding database solutions. A set of benchmarks grounded in business logic from Kulipa was produced. These benchmarks were then run over all database systems with the generated data.

After the database solutions were tested, the results were analyzed and a recommendation for a scalable future database solution was presented. The recommendation took into consideration several factors, including simplicity of setup and maintenance as well as performance and price. The recommended systems were to use the current solution, EC2-hosted Postgres, for the relational data, to use DynamoDB for the purely non-relational data sets Token and Auth, and finally to use Cassandra for the semi-relational, write-heavy Log database.


5.1 Future work

Since the field of NoSQL databases is so young, there are always opportunities to run database benchmarks with newer versions and also with new databases. Several databases were decided against for various reasons, many of them for lack of maturity, for example having no real system in place for backups. When these come to maturity they can be tested properly.

The results from this report are probably fairly specific to companies using Amazon Web Services for hosting. The database benchmarks used were chosen from business requirements for Kulipa. If someone is interested in doing equivalent benchmarks, a set of tests should be constructed from the requirements that the interested party places on the database.

Another thing to note is that the prices for almost everything on AWS have been lowered several times over the last few years. Naturally, if the prices for hosting are lowered, so are the prices for the databases.

Unfortunately I did not really touch on the search database in this report. It seemed very different from the other databases, so I left it alone and then ran out of time.

The future concern for an analysis database was also left fairly untouched. This is mainly because Kulipa is small enough that large-scale analysis of data seems very far away, and can probably be covered for a long time by simply running analysis jobs on the Core database at low-traffic times, for example at night. When this becomes too heavy a task for Core to handle, this point can be revisited and something like Amazon Redshift could be analyzed more in depth.


Bibliography

[1] Accumulo. http://accumulo.apache.org/, Mar. 2013.

[2] Amazon discussion forums about downloading backups. https://forums.aws.amazon.com/message.jspa?messageID=239119#jive-message-242280, Dec. 2013.

[3] Amazon elasticache. http://aws.amazon.com/elasticache/, Dec.2013.

[4] Cassandra. http://cassandra.apache.org/, Mar. 2013.

[5] Cassandra gossiper. http://wiki.apache.org/cassandra/ArchitectureGossip, Dec. 2013.

[6] Chaos monkey wiki on github. https://github.com/Netflix/SimianArmy/wiki/Chaos-Home, Dec. 2013.

[7] Couchbase. http://www.couchbase.org/membase, Mar. 2013.

[8] Couchdb. http://couchdb.apache.org/, Mar. 2013.

[9] Datastax architecture planning for ec2. http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html, Dec. 2013.

[10] Elasticsearch. http://www.elasticsearch.org/, Mar. 2013.

[11] Foundation db. http://foundationdb.com, Mar. 2013.

[12] Hbase. http://hbase.apache.org/, Mar. 2013.

[13] Hypertable. http://hypertable.org/, Mar. 2013.

[14] Kyoto tycoon. http://fallabs.com/kyototycoon/, Mar. 2013.

[15] Memcached. http://memcached.org/about, Dec. 2013.

[16] Micro instances throttling explanation. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts_micro_instances.html,Dec. 2013.


[17] Mongodb. http://www.mongodb.org/, Mar. 2013.

[18] Neo4j. http://neo4j.org/, Mar. 2013.

[19] Orientdb. http://www.orientdb.org, Dec. 2013.

[20] Postgresql website. http://www.postgresql.org/, Dec. 2013.

[21] Priam wiki on github. https://github.com/Netflix/Priam/wiki,Dec. 2013.

[22] Redis. http://redis.io/, Mar. 2013.

[23] Riak. http://basho.com/riak/, Mar. 2013.

[24] Scalaris. https://code.google.com/p/scalaris/, Mar. 2013.

[25] Titan. http://thinkaurelius.github.io/titan/, Dec. 2013.

[26] Voltdb. http://voltdb.com/, Mar. 2013.

[27] D. Bermbach and S. Tai. Eventual consistency: How soon is eventual? an evaluation of amazon s3's consistency behavior. In Proceedings of the 6th Workshop on Middleware for Service Oriented Computing, MW4SOC '11, pages 1:1–1:6, New York, NY, USA, 2011. ACM.

[28] R. Cattell. Scalable sql and nosql data stores. SIGMOD Rec., 39(4):12–27, May 2011.

[29] D. Chiu and G. Agrawal. Evaluating caching and storage options on the amazon web services cloud. In Grid Computing (GRID), 2010 11th IEEE/ACM International Conference on, pages 17–24, Oct.

[30] A. Cockcroft. Netflix blog, benchmarking scalability on aws. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html, Dec. 2013.

[31] A. N. Cockcroft. C* 2012: Cassandra performance and scalability on aws. https://www.youtube.com/watch?v=Wo-zkUH1R8A, Aug. 2012.

[32] R. N. C. Conery. Five things you didn't know about postgresql. http://vimeo.com/43536445, June 2012.

[33] S. Gallagher. Security flaw in mysql, mariadb allows access with any password—just keep submitting it. http://arstechnica.com/information-technology/2012/06/security-flaw-in-mysql-mariadb-allows-access-with-any-password-just-keep-submitting-it/, June 2012.

[34] J. Han, E. Haihong, G. Le, and J. Du. Survey on nosql database. In Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on, pages 363–366, Oct.


[35] R. Hecht and S. Jablonski. Nosql evaluation: A use case oriented survey. In Cloud and Service Computing (CSC), 2011 International Conference on, pages 336–341, Dec.

[36] P. Mell and T. Grance. The nist definition of cloud computing. http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf, Sept. 2011.

[37] B. Tudorica and C. Bucur. A comparison between several nosql databases with comments and notes. In Roedunet International Conference (RoEduNet), 2011 10th, pages 1–5, 2011.



In English

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Per Månsson