Birkbeck, University of London
School of Computer Science and Information Systems
MSc Computer Science
Project Report
Document-Oriented Persistence with CouchDB
Supervisor: Nigel Martin
Author: Michael Lenahan
This project report is substantially the result of my own work, expressed in my own words, except where explicitly indicated in the text. I give my permission for it to be submitted to the JISC Plagiarism Detection Service.
September 2010
Abstract
This project investigates an innovative database technology, CouchDB, by way of assessing its use in applications derived from the British Council Activity Mapping project. The project's aim is to examine the advantages and disadvantages of CouchDB from the perspective of a web applications developer.
CouchDB combines a web server with a data storage mechanism. Data is stored in the form of denormalised documents and queried through map-reduce functions, which result in the creation of indexed views.
The project considers the suitability of CouchDB as a data store and web development platform in support of an existing relational database application, with an assessment of the strengths of both approaches.
Alles im richtigen Maß (Everything in the right measure)
Acknowledgments
With thanks to my supervisor, Nigel Martin, for timely, patient and good-humoured advice - you really helped me to stay on track.
Thanks also to my colleagues and ex-colleagues at the British Council, in particular Terry Pyle, Kshipra Singhvi, Phil Street, Spero Blassoples, Michael Sadler, Roger Moran and Masoud Ahanchian.
Thank you, Su Peneycad, for your encouragement and wonderful skill at spotting typos.
I had the great fortune to be doing this project during the summer of 2010, when the (first) 'NoSql Summer' was in full swing. Thanks to the organisers of the London meetups, Neil Robbins and Makoto Inoue, for providing the framework for such interesting discussions.
Finally, thank you to the active and friendly CouchDB community - J. Chris Anderson, Volker Mische, Jason Smith and many others - for your help on the couchdb-users mailing list, and to Damien Katz for starting the whole thing.
Contents
1 Introduction
    1.1 Motivation
    1.2 Does 'one size' still 'fit all'?
    1.3 Organisation
2 Background
    2.1 The Relational Model
    2.2 Relational Database Management Systems
    2.3 Normalisation
    2.4 SQL
    2.5 Transactional Guarantees
    2.6 The 'NoSql' Movement
    2.7 'NoSql' databases at large scale
    2.8 Brewer's 'CAP' Theorem
    2.9 Reducing the Impedance Mismatch
    2.10 Benefits of 'NoSql'
    2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
    2.12 The Activity Mapping Project
    2.13 Raising Awareness of British Council Impact in the UK
    2.14 Using CouchDB to Serve Mapping Data
3 Introduction to CouchDB
    3.1 'Of the Web'
    3.2 Some History
    3.3 Document-Oriented
    3.4 Erlang
    3.5 How Ubuntu uses CouchDB
    3.6 Desktop Couch, Python and Quickly
4 Using CouchDB
    4.1 Motivation
    4.2 Hosted CouchDB Service Providers
    4.3 Installing CouchDB
    4.4 Inserting a Document into a CouchDB Database
    4.5 Deleting a Document
    4.6 Updating a Document
    4.7 Adding Attachments
    4.8 Replication
    4.9 Querying a CouchDB Database using Map-Reduce
5 Serving HTML from CouchDB
    5.1 Bulk upload of JSON documents
    5.2 A CouchDB Design Document
    5.3 A more complex 'show' function
    5.4 Views and Lists
    5.5 Lessons Learned
6 Serving GeoRSS using CouchApp
    6.1 Introduction to CouchApp
    6.2 Sofa - a Blogging Application
    6.3 Developing with CouchApp
    6.4 Deployment
    6.5 Using templates
7 Critical Assessment and Conclusion
    7.1 Why might a developer choose CouchDB?
    7.2 CouchDB's challenges
    7.3 Conclusion
A Design Document d1
B GeoRSS on Sofa
C georss.js
D georss.html
E Bash Script to Upload Country Flag Files
Section 1
Introduction
The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB's suitability as a web development platform.
1.1 Motivation
The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of 'key-value store' products which have emerged recently to deal with the challenges of very large data volumes in online environments.
The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as 'NoSql'¹) database products.
These 'NoSql' database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios non-relational systems are starting to take their place.
1.2 Does 'one size' still 'fit all'?
One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:
¹ Pronounced 'No-See-Quel'.
"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.
[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."
Why does this matter? Does 'one size fit all'? Up until recent years the answer has always been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.
1.3 Organisation
The sections are organised in the following way:
• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project.
• in Section 3, I introduce the open-source, document-oriented database CouchDB.
• in Section 4, I provide an overview of CouchDB's application programming interface.
• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code.
• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps.
• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.
Section 2
Background
In recent years a new generation of data stores has emerged which approaches data persistence in a different way to the established relational model.
In this section I briefly discuss this relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd's relational model reads as follows:
"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to show relations as tables, tuples as rows and attributes as columns.
The main advantage of the relational database is its precise and mathematically proven organisation of data. In this context it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is 'relational' or not is more a matter of how applications use the database - a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
2.2 Relational Database Management Systems
The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful, and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let's call it 'Staff') containing staff records:
    Name     Nationality
    Bob      Australia
    Jane     Sweden
All is well until the system encounters a staff member with multiple nationalities:
    Name     Nationality
    Bob      Australia
    Jane     Sweden
    Alice    United Kingdom, South Africa
This second table breaks the rules of normalisation, and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a 'Nationality' table and a 'StaffNationality' table, linked with primary and foreign keys.
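The three-table layout described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the table names come from the text, while the column names (StaffId, NationalityId, Name) and the helper function names are assumptions made for the example.

```python
import sqlite3


def build_normalised_staff_db():
    """Create the Staff / Nationality / StaffNationality layout in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Staff (
            StaffId INTEGER PRIMARY KEY,
            Name    TEXT NOT NULL
        );
        CREATE TABLE Nationality (
            NationalityId INTEGER PRIMARY KEY,
            Name          TEXT NOT NULL UNIQUE
        );
        -- Link table: one row per (staff member, nationality) pair.
        CREATE TABLE StaffNationality (
            StaffId       INTEGER NOT NULL REFERENCES Staff(StaffId),
            NationalityId INTEGER NOT NULL REFERENCES Nationality(NationalityId),
            PRIMARY KEY (StaffId, NationalityId)
        );
    """)
    staff = {
        "Bob": ["Australia"],
        "Jane": ["Sweden"],
        "Alice": ["United Kingdom", "South Africa"],
    }
    for name, nationalities in staff.items():
        staff_id = conn.execute(
            "INSERT INTO Staff (Name) VALUES (?)", (name,)).lastrowid
        for nationality in nationalities:
            # Each nationality is stored exactly once, then referenced by key.
            conn.execute(
                "INSERT OR IGNORE INTO Nationality (Name) VALUES (?)",
                (nationality,))
            nationality_id = conn.execute(
                "SELECT NationalityId FROM Nationality WHERE Name = ?",
                (nationality,)).fetchone()[0]
            conn.execute(
                "INSERT INTO StaffNationality VALUES (?, ?)",
                (staff_id, nationality_id))
    return conn


def count_by_nationality(conn):
    """Return {nationality: number of staff} by joining through the link table."""
    rows = conn.execute("""
        SELECT n.Name, COUNT(*)
        FROM StaffNationality sn
        JOIN Nationality n ON n.NationalityId = sn.NationalityId
        GROUP BY n.Name
    """).fetchall()
    return dict(rows)
```

With this layout Alice contributes one row to the link table per nationality, so every cell holds an atomic value, at the cost of two extra tables and a join.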
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice's record could be stored as follows:
    {
        "name": "Alice",
        "nationality": [
            "United Kingdom",
            "South Africa"
        ]
    }
This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:
    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;
For this to work, the table needs to store atomic values. In our example above, this query would result in 'United Kingdom, South Africa' being considered a distinct nationality.
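The effect can be reproduced with a small sketch, again using Python's sqlite3 module. It assumes the two nationalities are squeezed into a single text cell separated by a comma (the separator is an assumption), and the function name is hypothetical.

```python
import sqlite3


def count_unnormalised():
    """Run the GROUP BY query against a Staff table with a non-atomic cell."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Staff (Name TEXT, Nationality TEXT)")
    conn.executemany(
        "INSERT INTO Staff VALUES (?, ?)",
        [
            ("Bob", "Australia"),
            ("Jane", "Sweden"),
            # Two nationalities stored as one string: not an atomic value.
            ("Alice", "United Kingdom, South Africa"),
        ],
    )
    rows = conn.execute(
        "SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality"
    ).fetchall()
    conn.close()
    return dict(rows)
```

The query groups on the whole string, so the result contains a key for the combined value and none for 'United Kingdom' or 'South Africa' on their own.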
In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
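As a preview of the idea, the nationality count can be expressed as a map step and a reduce step. The sketch below is a plain Python simulation of the technique and not CouchDB's API (CouchDB's own view functions are written in JavaScript); all function names here are hypothetical.

```python
from collections import defaultdict


def map_nationalities(doc):
    """Map step: emit one (key, value) pair per nationality in a document."""
    nationality = doc.get("nationality", [])
    if isinstance(nationality, str):
        # Tolerate documents that store a single nationality as a string.
        nationality = [nationality]
    for item in nationality:
        yield item, 1


def reduce_count(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


STAFF_DOCS = [
    {"name": "Bob", "nationality": "Australia"},
    {"name": "Jane", "nationality": "Sweden"},
    {"name": "Alice", "nationality": ["United Kingdom", "South Africa"]},
]
```

Calling build_view(STAFF_DOCS, map_nationalities, reduce_count) counts Alice once under each of her nationalities, without any change to the shape of the other documents.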
2.5 Transactional Guarantees
A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
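The account transfer can be sketched with Python's sqlite3 module, where a connection used as a context manager commits on success and rolls back if an exception is raised. The Account table, the non-negative balance rule and the function names are assumptions made for this illustration.

```python
import sqlite3


def open_bank():
    """Create an in-memory bank with two accounts and a no-overdraft rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Account ("
        "Name TEXT PRIMARY KEY, "
        "Balance INTEGER NOT NULL CHECK (Balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO Account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move `amount` between accounts: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = ?",
                (amount, target),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is undone by the rollback.
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account name: balance}."""
    return dict(conn.execute("SELECT Name, Balance FROM Account"))
```

The credit is deliberately written before the debit so that a failed debit demonstrates the rollback: after a rejected transfer, neither balance has changed.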
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.
2.6 The 'NoSql' Movement
'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL etc.)¹.
'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].
2.7 'NoSql' databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that in many application domains it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example, if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
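The idea of replicas that diverge briefly and then converge can be illustrated with a toy model. The sketch below assumes a 'last writer wins' rule based on a logical timestamp; this rule, the Replica class and the synchronise function are assumptions chosen for brevity, and are not a description of how Facebook or CouchDB actually resolve conflicts.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica holding key -> (logical timestamp, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def synchronise(a, b):
    """Exchange entries so both replicas keep the newest version of each key."""
    for key in set(a.data) | set(b.data):
        newest = max(
            (replica.data[key] for replica in (a, b) if key in replica.data),
            key=lambda entry: entry[0],
        )
        a.data[key] = newest
        b.data[key] = newest


london = Replica("london")
singapore = Replica("singapore")
london.write("alice:status", "At the airport", timestamp=1)
stale_read = singapore.read("alice:status")   # Bob reads before replication
synchronise(london, singapore)
fresh_read = singapore.read("alice:status")   # Bob reads after replication
```

Between the write and the synchronisation Bob's replica simply has no entry for Alice's status; once the replicas have exchanged entries, both return the same value.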
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
"Occasionally, an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."
2.8 Brewer's 'CAP' Theorem
The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees simultaneously:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that 'eventual' consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years a set of Object-Relational Mapping (ORM) tools has emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain².
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application's object model.
A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely, and radically simplifies application development [36].
2.10 Benefits of 'NoSql'
As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law - disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.
• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format, rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms³.
³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
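To illustrate the point that an ordinary HTTP library is all a client needs, the sketch below builds (without sending) the request that would store a JSON document, using only Python's standard library. It assumes a CouchDB server on its default port 5984 on the local machine; the database name 'staff', the document id 'alice' and the function name are hypothetical.

```python
import json
import urllib.request

# Assumption: a local CouchDB instance listening on its default port.
COUCHDB_URL = "http://localhost:5984"


def build_put_document(database, doc_id, document):
    """Build (but do not send) the HTTP PUT request that stores a document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{COUCHDB_URL}/{database}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Usage, assuming the 'staff' database already exists on the server:
#   request = build_put_document("staff", "alice", {"name": "Alice"})
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

No database driver or mapping layer is involved: the document is serialised to JSON and addressed by a URL made up of the database name and the document id.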
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
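From the application's side, a missing field is handled by asking whether the key is present rather than by testing for NULL. The following sketch holds the two example documents as Python dictionaries (with some fields of the first omitted for brevity); the helper function names are hypothetical.

```python
# The two 'addressbook' documents, as an application might hold them
# after decoding the JSON.
ADDRESS_BOOK = [
    {
        "firstName": "John",
        "lastName": "Smith",
        "address": {"city": "New York", "postalCode": "10021"},
    },
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia",
        },
    },
]


def postal_code(doc):
    """Return the postal code, or None when the document simply has none."""
    return doc.get("address", {}).get("postalCode")


def display_name(doc):
    """Pick a label from whichever naming fields the document carries."""
    if "company" in doc:
        return doc["company"]
    return f'{doc.get("firstName", "")} {doc.get("lastName", "")}'.strip()
```

The price of this flexibility is that checks which a relational schema would enforce, such as which fields must be present, become the responsibility of the application code.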
In Section 4 I will demonstrate how these documents are uploaded to, and retrieved from, a CouchDB server.
34 Erlang
Damien Katz chose to develop CouchDB using Erlang The Erlang pro-gramming language was developed by Ericsson to support distributed fault-tolerant non-stop telecoms applications A key element in its reliability isthat it is a functional language which avoids mutable data There is noshared memory state between processes so that if a process in Erlang en-counters a problem that process can be shut down without affecting therest of the system Compiled Erlang code has been known to run for yearsin telecoms switches with zero downtime [14] [24]
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places, and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use 'cURL', which is a command-line tool for getting or sending files using URL syntax [51] (see footnote 1).
To create a database called 'addressbook' (see footnote 2) on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
Footnote 1: If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
Footnote 2: Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
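Because any HTTP client can talk to CouchDB, the same request can be described in JavaScript. The sketch below only builds the request description and does not send it; the function name buildInsertRequest and the local host address are assumptions chosen for the illustration.

```javascript
// Build the equivalent of the cURL POST above as a plain request description.
function buildInsertRequest(baseUrl, database, doc) {
  return {
    url: `${baseUrl}/${database}`,
    options: {
      method: "POST",                                   // corresponds to -X POST
      headers: { "Content-Type": "application/json" },  // corresponds to -H
      body: JSON.stringify(doc),                        // corresponds to -d @file
    },
  };
}

const request = buildInsertRequest("http://127.0.0.1:5984", "addressbook", {
  firstName: "John",
  lastName: "Smith",
});
console.log(request.url);
console.log(request.options.body);
```

In an environment that provides fetch (a browser, or a recent Node.js), the request could then be sent with fetch(request.url, request.options).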
All data in CouchDB is stored in JSON format, together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database (the URL is quoted so that the shell does not interpret the question mark):
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json, quoting the revision value of the John Smith document shown in Section 4.4:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
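The revision check described above can be illustrated with a small in-memory model. This is a simplified sketch under stated assumptions, not CouchDB's implementation: the class name TinyStore is hypothetical, and the revision suffix "-demo" stands in for the hash that CouchDB generates.

```javascript
// A toy document store that refuses writes quoting a stale revision.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  // Create the document, or update it if `rev` matches the stored revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    // The number before the hyphen counts how often the document has been written.
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const newRev = `${generation}-demo`;
    this.docs.set(id, { ...body, _id: id, _rev: newRev });
    return { ok: true, id, rev: newRev };
  }
}

const store = new TinyStore();
const created = store.put("john-smith", { age: 25 });
const updated = store.put("john-smith", { age: 26 }, created.rev);
const stale = store.put("john-smith", { age: 27 }, created.rev);
console.log(created.rev, updated.rev, stale.error);
```

The third call quotes the first revision after the document has already moved on to the second, so it is rejected in the same way that CouchDB rejects an update against an 'old' version.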
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ (see footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
Footnote 3: Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
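The quoting differences between shells disappear when the request body is built programmatically. The following sketch, in which the function name buildReplicationRequest is an assumption made for the example, constructs the same body as the cURL commands above and leaves the escaping to JSON.stringify. It only describes the request and does not send it.

```javascript
// Describe a replication request from a remote source to a local target.
function buildReplicationRequest(localHost, source, target) {
  return {
    url: `${localHost}/_replicate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // JSON.stringify produces correctly quoted JSON on every platform.
      body: JSON.stringify({ source, target }),
    },
  };
}

const replication = buildReplicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(replication.url);
console.log(replication.options.body);
```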
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered on to CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation of the intermediate values is required.
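The two stages can be sketched outside CouchDB with a small in-memory runner. This is an illustration only: runView is a hypothetical helper, emit is passed in as an argument (in CouchDB it is a built-in function), and the three documents are made-up data.

```javascript
// Run a map function over every document, then optionally reduce the emitted rows.
function runView(docs, map, reduce) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) {
    map(doc, emit); // 'map' stage: one call per document
  }
  if (!reduce) {
    return rows;
  }
  // 'reduce' stage: combine the intermediate keys and values into one result
  return reduce(rows.map((r) => r.key), rows.map((r) => r.value));
}

const docs = [
  { _id: "A", con: 100 },
  { _id: "B", con: 250 },
  { _id: "C", con: 50 },
];

const mapped = runView(docs, (doc, emit) => emit(doc._id, doc.con));
const total = runView(
  docs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => values.reduce((a, b) => a + b, 0)
);
console.log(mapped);
console.log(total);
```

With a map function alone the result is one row per emitted value; adding the reduce function collapses those rows into a single figure.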
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]
Figure 4.5: WHERE clause in CouchDB
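The reason a key range behaves like a WHERE clause is that the rows of a view are kept sorted by the emitted key. The sketch below imitates this with a sorted array. It is an assumption-laden illustration: buildIndex and queryRange are hypothetical helpers, a simple filter stands in for CouchDB's B-Tree lookup, and apart from the Aberavon figure taken from the document above, the constituencies are made-up data.

```javascript
// Emit (doc.con, doc._id) for every document and keep the rows sorted by key.
function buildIndex(docs) {
  const rows = docs.map((doc) => ({ key: doc.con, value: doc._id }));
  return rows.sort((a, b) => a.key - b.key);
}

// Select the rows whose key lies between startKey and endKey (inclusive).
function queryRange(index, startKey, endKey) {
  return index.filter((row) => row.key >= startKey && row.key <= endKey);
}

const index = buildIndex([
  { _id: "Aberavon", con: 3062 },
  { _id: "Seat X", con: 1500 }, // hypothetical constituency
  { _id: "Seat Y", con: 1999 }, // hypothetical constituency
  { _id: "Seat Z", con: 800 },  // hypothetical constituency
]);
const selected = queryRange(index, 1000, 2000);
console.log(selected);
```

Because the rows are already ordered by key, a real index only has to locate the start of the range and read forward until the end key is passed.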
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval. This makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time: every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # Alternative host for a local CouchDB instance:
    # host="http://127.0.0.1:5984"
    host="http://username:password@mick.couchone.com"
    database="universities"
    folder="universities"
    fileextension=".json"
    FILES="$folder/*"
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s#$folder/##g")
        echo "$filename"
        # strip the file extension to obtain the document name
        docname=$(echo "$filename" | sed -e "s#$fileextension##g")
        echo "$docname"
        url="$host/$database/$docname"
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful url. (POST is used when we wish CouchDB to assign a random ID to the document.)
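The step in which the script derives the document name and target URL from each file path can be sketched in JavaScript as follows. The function name documentUrl and the local host address are assumptions made for the illustration; Node's path.basename does the work that the two sed commands do in the script.

```javascript
const path = require("path");

// Derive the target URL for a document from the path of its JSON file.
function documentUrl(host, database, filepath) {
  // Drop the folder and the .json extension to obtain the document name.
  const docname = path.basename(filepath, ".json");
  return `${host}/${database}/${docname}`;
}

const url = documentUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json"
);
console.log(url);
```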
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.
    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) {
                return '<h1>' + doc._id + '</h1>' +
                    '<p>latitude: ' + doc.latitude + '</p>' +
                    '<p>longitude: ' + doc.longitude + '</p>';
            }"
        }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters: doc and req. doc refers to the document being rendered; req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
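Since a 'show' function is ordinary JavaScript, it can be tried out locally before it is uploaded. The sketch below calls a function with the same body as s1 directly; the coordinates in the sample document are made-up values, and the empty request object is an assumption, as s1 does not use it.

```javascript
// The same logic as the s1 'show' function, defined as a normal function.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// A sample document with hypothetical coordinates.
const sampleDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const html = s1(sampleDoc, {});
console.log(html);
```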
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
         -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
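The reading of the URL given in the list above can be expressed as a short parsing function. This is a sketch only: parseShowUrl is a hypothetical helper and it assumes the URL has exactly the shape shown in the list.

```javascript
// Split a 'show' URL into the parts described in the list above.
function parseShowUrl(showUrl) {
  const { origin, pathname } = new URL(showUrl);
  const [database, designMarker, designName, showMarker, showName, docId] =
    pathname.split("/").filter(Boolean);
  return {
    host: origin,
    database,
    designDocument: `${designMarker}/${designName}`,
    showFunction: `${showMarker}/${showName}`,
    documentId: decodeURIComponent(docId), // %20 becomes a space
  };
}

const parts = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
console.log(parts);
```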
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
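The escaping burden comes from two layers of quoting: the HTML sits inside a JavaScript string, and the function source sits inside a JSON string. The sketch below lets JSON.stringify apply both layers, which makes the nesting visible; the HTML fragment and the example.org address are made-up values for the illustration.

```javascript
// HTML containing double quotes, as real markup usually does.
const fragment = '<a href="http://example.org">link</a>';

// Layer 1: embed the HTML as a string literal inside the function source.
const showSource = "function(doc, req) { return " + JSON.stringify(fragment) + "; }";

// Layer 2: embed the function source as a string inside the design document.
const designDocJson = JSON.stringify({ _id: "_design/d1", shows: { s1: showSource } });
console.log(designDocJson);

// Reading the document back recovers the function source unchanged.
const recovered = JSON.parse(designDocJson).shows.s1;
const rebuilt = new Function("return " + recovered)();
console.log(rebuilt({}, {}));
```

Each double quote in the HTML ends up preceded by three backslashes in the stored JSON, which is why writing such documents by hand is error-prone.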
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
    {
        "_id": "_design/d1",
        "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
        "shows": { ... },
        "lists": { ... },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
    {
        "_id": "_design/d1",
        "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
        "shows": { ... },
        "lists": {
            "l1": "function(head, req) {
                var row;
                start({ \"headers\": { \"Content-Type\": \"text/html\" } });
                while (row = getRow()) {
                    send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
                }
            }"
        },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
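The interplay of start, getRow and send can be tried out with a small stand-in for the CouchDB runtime. This is a sketch under stated assumptions: runList is a hypothetical harness, the three built-in functions are passed to the 'list' function as an object rather than being global, and the rows are sample data in the shape that the view v1 would return.

```javascript
// A minimal harness that feeds view rows to a 'list' function.
function runList(listFn, rows) {
  const output = [];
  const queue = rows.slice();
  let headers = null;
  listFn({
    start: (response) => { headers = response.headers; }, // called once
    send: (chunk) => { output.push(chunk); },             // called per row
    getRow: () => queue.shift() || null,                  // null when exhausted
  });
  return { headers, body: output.join("") };
}

// A simplified version of l1 that wraps each row value in a paragraph.
const l1 = ({ start, send, getRow }) => {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p>" + row.value + "</p>");
  }
};

const result = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "Aberystwyth University", value: "Aberystwyth University" },
]);
console.log(result.headers, result.body);
```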
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document bc.
To do this we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js is a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
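The correspondence between the folder structure and the design document can be sketched as a function that folds a set of file paths into a nested object. This is a simplified illustration of the idea, not CouchApp's actual implementation: buildDesignDoc is a hypothetical name, the files are supplied as an in-memory map rather than read from disk, and the function bodies are shortened examples.

```javascript
// Fold file paths such as "views/v1/map.js" into a nested design document.
function buildDesignDoc(name, files) {
  const doc = { _id: `_design/${name}` };
  for (const [filePath, source] of Object.entries(files)) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = filePath.replace(/\.js$/, "").split("/");
    let node = doc;
    for (const part of parts.slice(0, -1)) {
      if (!node[part]) {
        node[part] = {};
      }
      node = node[part];
    }
    node[parts[parts.length - 1]] = source;
  }
  return doc;
}

const designDoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return doc._id; }",
});
console.log(JSON.stringify(designDoc, null, 2));
```

Each folder becomes a nested object and each file becomes a string-valued field, which is the shape of the hand-written design documents in Section 5.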
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
        "env": {
            "default": {
                "db": "http://username:password@mick.couchone.com/universities"
            }
        }
    }
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
        emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it using mustache.js to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
  {{#has_programmes}}
    {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
        {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
    {{/programmes}}
  {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
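The looping behaviour described above can be illustrated with a deliberately tiny renderer. This is not mustache.js: render is a hypothetical function written for this sketch, it supports only simple variables and non-recursive sections, and the second programme in the sample data is invented. It is meant only to show how array sections repeat their body and boolean sections switch it on or off.

```javascript
// Minimal Mustache-style renderer (illustrative only; not mustache.js).
// Supports {{name}} variables and {{#name}}...{{/name}} sections.
function render(template, view) {
  // Expand sections first; the backreference \1 pairs each opening tag
  // with the closing tag of the same name.
  const withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (match, name, inner) => {
      const value = view[name];
      if (Array.isArray(value)) {
        // Render the inner template once per array element.
        return value.map(item => render(inner, item)).join("");
      }
      // Truthy non-array values render the inner block once.
      return value ? render(inner, view) : "";
    }
  );
  // Then substitute the remaining simple variables.
  return withSections.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    view[name] === undefined ? "" : String(view[name])
  );
}

const template =
  "{{#programmes}}<p>{{name}}{{#has_countries}}{{#countries}} [{{country}}]" +
  "{{/countries}}{{/has_countries}}</p>{{/programmes}}";

const out = render(template, {
  programmes: [
    { name: "Chevening Programme", has_countries: true,
      countries: [{ country: "id" }, { country: "mz" }] },
    { name: "Example Programme", has_countries: false, countries: [] }
  ]
});
console.log(out);
```

The has_countries flag plays the same role as in the template above: it lets the template skip the country block entirely when a programme lists no countries.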
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
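As a rough illustration of how little is involved, replication is triggered by a single HTTP request to CouchDB's _replicate endpoint. The sketch below only builds a description of that request and does not send it; the helper name replicationRequest and the host example.org are assumptions for illustration.

```javascript
// Sketch: describe a CouchDB replication request without sending it.
// The host name is a placeholder; replicationRequest is a hypothetical helper.
function replicationRequest(source, target, continuous) {
  const body = { source: source, target: target };
  // Continuous replication keeps the target in step with later changes.
  if (continuous) body.continuous = true;
  return {
    method: "POST",
    path: "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  };
}

// Pull a remote database into a local database of the same name.
const pull = replicationRequest(
  "http://example.org:5984/universities",
  "universities",
  false
);
console.log(pull.method, pull.path, pull.body);
```

Because design documents are ordinary documents, the same request copies application code along with the data.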
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
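The correspondence between document operations and HTTP verbs can be sketched as follows. The helper documentRequest and its operation names are assumptions for illustration; it only describes the request and does not send it. Updates and deletes carry the current revision, here as a rev query parameter.

```javascript
// Sketch: map document operations onto the HTTP requests CouchDB expects.
// documentRequest is a hypothetical helper; nothing is sent over the network.
function documentRequest(op, db, id, rev) {
  // Document ids may contain spaces, so they are percent-encoded.
  const path = "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id);
  switch (op) {
    case "read":   return { method: "GET", path: path };
    case "create": return { method: "PUT", path: path };
    case "update": return { method: "PUT", path: path + "?rev=" + rev };
    case "delete": return { method: "DELETE", path: path + "?rev=" + rev };
    default: throw new Error("unknown operation: " + op);
  }
}

const read = documentRequest("read", "universities", "University of Aberdeen");
const del = documentRequest("delete", "countries", "de", "1-abc");
console.log(read, del);
```

The encoded path matches the %20 form seen in the document URLs elsewhere in this report.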
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
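To make the contrast concrete, the sketch below expresses a simple grouped count, which in SQL might read SELECT region, COUNT(*) FROM institutions GROUP BY region, as a map function and a reduce function. The in-memory groupQuery helper is hypothetical and much simpler than CouchDB's real view engine (which also passes a rereduce flag and stores the results in an index); the third sample document is invented.

```javascript
// Sketch: a grouped count written as map and reduce functions,
// simulated in memory rather than run by CouchDB.
const map = (doc, emit) => emit(doc.region, 1);
const reduce = (keys, values) => values.reduce((a, b) => a + b, 0);

// Hypothetical stand-in for querying a view with grouping enabled.
function groupQuery(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  // Apply the reduce function once per distinct key.
  for (const [key, values] of groups) result[key] = reduce([key], values);
  return result;
}

const counts = groupQuery([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Example College", region: "London" }
]);
console.log(counts);
```

The query logic is the same as in SQL, but it has to be decided in advance and saved in a design document rather than typed ad hoc.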
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
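The escaping burden can be seen, and avoided, in a few lines. In this sketch the show function is written as ordinary JavaScript and JSON.stringify performs the backslash-escaping when the design document is serialised. The class attribute is an invented detail, included only so that the HTML contains double-quotes; this is an illustration of the problem, not the approach used in this project.

```javascript
// Sketch: let JSON.stringify do the escaping instead of doing it by hand.
// The class="title" attribute is hypothetical, added to force some quotes.
const s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
};

// A design document stores each function as a string.
const ddoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
const json = JSON.stringify(ddoc);
console.log(json);
```

In the serialised output every double-quote inside the function body appears as \", which is exactly what would otherwise have to be typed manually.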
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
               xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:

          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login.
$("#account").evently({
  loggedIn : function(e,r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
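The per-row transformation in the listing can be tried outside CouchDB by isolating it as a plain function. The sketch below is a cut-down restatement, not the file itself: toStashEntry is a hypothetical name, only a few of the fields are carried over, and the sample document is abbreviated.

```javascript
// Sketch of the per-row transformation, as a plain function that can be
// run without CouchDB. toStashEntry is a hypothetical name.
function toStashEntry(institution) {
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: institution.programmes ? true : false,
    programmes: (institution.programmes || []).map(programme => ({
      name: programme.name,
      url: programme.url,
      has_countries: programme.countries ? true : false,
      // Wrap each country code in an object so that the template can
      // refer to it by name as {{country}}.
      countries: (programme.countries || []).map(country => ({ country: country }))
    }))
  };
}

const entry = toStashEntry({
  _id: "University of St Andrews",
  latitude: "56.3412139443169",
  longitude: "-2.79301175308608",
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com",
      countries: ["id", "mz", "tj"] }
  ]
});
console.log(JSON.stringify(entry, null, 2));
```

The has_programmes and has_countries flags exist purely for the template's benefit: they let it test for the presence of a list before iterating over it.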
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": "15161",
  "postcode": "KY16 9AJ",
  "latitude": "56.3412139443169",
  "longitude": "-2.79301175308608",
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
  {{#has_programmes}}
    {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
        {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
    {{/programmes}}
  {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES="countries/png/*"

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
    echo $filepath
    # get the file name from the file path
    filename=$(echo $filepath | sed -e 's/countries\/png\///g')
    echo $filename
    docname=$(echo $filename | sed -e 's/\.png//g')
    echo $docname
    url=http://username:password@mick.couchone.com/countries/$docname/flag
    echo $url
    # put the attachment into CouchDB
    # this command creates the 'ie' record and puts the png as an attachment
    # under countries/ie/flag
    # curl -H "Content-Type:image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
    echo curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
    curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
done
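The two sed steps in the script simply derive a document name from a file path. For readers more comfortable with JavaScript, the same derivation can be sketched as follows; flagUrl is a hypothetical helper that only builds the URL and uploads nothing.

```javascript
// Sketch: derive the attachment URL for a flag file, mirroring the sed steps.
// flagUrl is a hypothetical helper; it performs no upload.
function flagUrl(baseUrl, filepath) {
  // "countries/png/de.png" -> "de.png"
  const filename = filepath.split("/").pop();
  // "de.png" -> "de"
  const docname = filename.replace(/\.png$/, "");
  return baseUrl + "/countries/" + docname + "/flag";
}

const url = flagUrl("http://mick.couchone.com", "countries/png/de.png");
console.log(url);
```

Each country code therefore becomes a document in the countries database, with the image stored as an attachment named flag.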
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
Abstract
This project investigates an innovative database technology, CouchDB, by way of assessing its use in applications derived from the British Council Activity Mapping project. The project’s aim is to examine the advantages and disadvantages of CouchDB from the perspective of a web applications developer.
CouchDB combines a web server with a data storage mechanism. Data is stored in the form of denormalised documents, and queried through map-reduce functions which result in the creation of indexed views.
The project considers the suitability of CouchDB as a data store and web development platform in support of an existing relational database application, with an assessment of the strengths of both approaches.
Alles im richtigen Maß
Acknowledgments
With thanks to my supervisor, Nigel Martin, for timely, patient and good-humoured advice - you really helped me to stay on track.
Thanks also to my colleagues and ex-colleagues at the British Council, in particular Terry Pyle, Kshipra Singhvi, Phil Street, Spero Blassoples, Michael Sadler, Roger Moran and Masoud Ahanchian.
Thank you, Su Peneycad, for your encouragement and wonderful skill at spotting typos.
I had the great fortune to be doing this project during the summer of 2010, when the (first) ‘NoSql Summer’ was in full swing. Thanks to the organisers of the London meetups, Neil Robbins and Makoto Inoue, for providing the framework for such interesting discussions.
Finally, thank you to the active and friendly CouchDB community - J. Chris Anderson, Volker Mische, Jason Smith and many others - for your help on the couchdb-users mailing list, and to Damien Katz for starting the whole thing.
Contents
1 Introduction
  1.1 Motivation
  1.2 Does ‘one size’ still ‘fit all’?
  1.3 Organisation
2 Background
  2.1 The Relational Model
  2.2 Relational Database Management Systems
  2.3 Normalisation
  2.4 SQL
  2.5 Transactional Guarantees
  2.6 The ‘NoSql’ Movement
  2.7 ‘NoSql’ databases at large scale
  2.8 Brewer’s ‘CAP’ Theorem
  2.9 Reducing the Impedance Mismatch
  2.10 Benefits of ‘NoSql’
  2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  2.12 The Activity Mapping Project
  2.13 Raising Awareness of British Council Impact in the UK
  2.14 Using CouchDB to Serve Mapping Data
3 Introduction to CouchDB
  3.1 ‘Of the Web’
  3.2 Some History
  3.3 Document-Oriented
  3.4 Erlang
  3.5 How Ubuntu uses CouchDB
  3.6 Desktop Couch, Python and Quickly
4 Using CouchDB
  4.1 Motivation
  4.2 Hosted CouchDB Service Providers
  4.3 Installing CouchDB
  4.4 Inserting a Document into a CouchDB Database
  4.5 Deleting a Document
  4.6 Updating a Document
  4.7 Adding Attachments
  4.8 Replication
  4.9 Querying a CouchDB Database using Map-Reduce
5 Serving HTML from CouchDB
  5.1 Bulk upload of JSON documents
  5.2 A CouchDB Design Document
  5.3 A more complex ‘show’ function
  5.4 Views and Lists
  5.5 Lessons Learned
6 Serving GeoRSS using CouchApp
  6.1 Introduction to CouchApp
  6.2 Sofa - a Blogging Application
  6.3 Developing with CouchApp
  6.4 Deployment
  6.5 Using templates
7 Critical Assessment and Conclusion
  7.1 Why might a developer choose CouchDB?
  7.2 CouchDB’s challenges
  7.3 Conclusion
A Design Document d1
B GeoRSS on Sofa
C georss.js
D georss.html
E Bash Script to Upload Country Flag Files
Section 1
Introduction
The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application, and use this as the test case from which to evaluate CouchDB’s suitability as a web development platform.
1.1 Motivation
The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of ‘key-value store’ products which have emerged recently to deal with the challenges of very large data volumes in online environments.
The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as ‘NoSql’¹) database products.
These ‘NoSql’ database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios non-relational systems are starting to take their place.
1.2 Does ‘one size’ still ‘fit all’?
One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:
¹ Pronounced ‘No-See-Quel’.
“The last 25 years of commercial DBMS development can be summed up in a single phrase: ‘One size fits all’. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) has been used to support many data-centric applications with widely varying characteristics and requirements.
[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser.”
Why does this matter? Does ‘one size fit all’? Up until recent years, the answer has always been ‘yes’: practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.
1.3 Organisation
The sections are organised in the following way:
• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project.
• in Section 3, I introduce the open-source document-oriented database CouchDB.
• in Section 4, I provide an overview of CouchDB’s application programming interface.
• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code.
• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps.
• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.
Section 2
Background
In recent years, a new generation of data stores has emerged which approaches data persistence in a different way to the established relational model.
In this section, I briefly discuss this relational model, and describe some of the recent challenges which have led to the development of the newer, alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd’s relational model reads as follows:
“The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains.”
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
The main advantage of the relational database is its precise and math-ematically proven organisation of data In this context it is importantto recognise that the relational model is a logical model Both relationaldatabases and non-relational databases (such as file systems or lsquoNoSqlrsquodatabases) commonly use B-Trees as their underlying data structure Seenfrom this perspective the question of whether a database is lsquorelationalrsquo ornot is more a matter of how applications use the database - a relationaldatabase represents its data to the user in the form of relations tuples andattributes (tables rows and columns) [49]
2.2 Relational Database Management Systems
The term ‘Relational Database Management System’ refers to the class of databases which follow Codd’s relational model. The most popular commercial (DB2, Oracle, Sybase/Sql Server) and open source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term ‘database’ has become associated with relational databases to such an extent that non-relational databases are often referred to as ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, ‘first normal form’, requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let’s call it ‘Staff’) containing staff records:
Name     Nationality
Bob      Australia
Jane     Sweden
All is well until the system encounters a staff member with multiple nationalities:
Name     Nationality
Bob      Australia
Jane     Sweden
Alice    United Kingdom, South Africa
This second table breaks the rules of normalisation and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a ‘Nationality’ table, with a ‘StaffNationality’ table linked with Primary and Foreign Keys.
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:
{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}
This record (or ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:
SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;
For this to work, the table needs to store atomic values. In our example above, this query would result in ‘United Kingdom, South Africa’ being considered a distinct nationality.
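The effect described above can be reproduced with a small sketch. This is an illustration only, using Python's built-in sqlite3 module rather than any of the database products named in this section; the three-table design is collapsed into a single hypothetical link table, StaffNationality, for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Flat table: Alice's two nationalities are squeezed into one cell.
cur.execute("CREATE TABLE Staff (Name TEXT, Nationality TEXT)")
cur.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("Bob", "Australia"), ("Jane", "Sweden"),
     ("Alice", "United Kingdom, South Africa")],
)
flat_counts = dict(cur.execute(
    "SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality"))

# Simplified normalised alternative: one row per (staff member, nationality).
cur.execute("CREATE TABLE StaffNationality (Name TEXT, Nationality TEXT)")
cur.executemany(
    "INSERT INTO StaffNationality VALUES (?, ?)",
    [("Bob", "Australia"), ("Jane", "Sweden"),
     ("Alice", "United Kingdom"), ("Alice", "South Africa")],
)
normalised_counts = dict(cur.execute(
    "SELECT Nationality, COUNT(*) FROM StaffNationality GROUP BY Nationality"))

print(flat_counts)
print(normalised_counts)
```

In the flat table the combined string is counted as a nationality in its own right, whereas the link table yields one count per country.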
In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.
2.5 Transactional Guarantees
A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
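The account-transfer example can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Python's sqlite3 module, a hypothetical Account table, and a CHECK constraint standing in for the business rule that a balance may not go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Name TEXT PRIMARY KEY, "
             "Balance INTEGER CHECK (Balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = ?",
                (amount, target))
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = ?",
                (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT Name, Balance FROM Account"))


ok = transfer(conn, "alice", "bob", 30)       # within Alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw Alice
print(ok, failed, balances(conn))
```

In the second call the credit to Bob is applied first and then undone when the debit violates the constraint, so no reader ever sees money created out of nothing.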
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.
2.6 The ‘NoSql’ Movement
‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL etc.).¹
‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].
2.7 ‘NoSql’ databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that in many application domains it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example: if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
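The idea of replicas that are briefly out of step but eventually converge can be illustrated with a toy model. This is a deliberately simplified sketch, not the replication algorithm of CouchDB or of any system mentioned here: the Replica class and its version counter are hypothetical, and concurrent conflicting writes are ignored.

```python
class Replica:
    """Toy replica holding one versioned value per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value):
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]

    def sync_from(self, other):
        """Pull any entries that are newer on the other replica."""
        for key, (version, value) in other.data.items():
            if version > self.data.get(key, (0, None))[0]:
                self.data[key] = (version, value)


london, singapore = Replica("london"), Replica("singapore")
london.write("alice:status", "At the theatre")

stale = singapore.read("alice:status")  # Singapore has not heard yet
singapore.sync_from(london)             # background replication catches up
fresh = singapore.read("alice:status")
print(stale, fresh)
```

Between the write and the synchronisation, Bob's replica simply returns what it last knew; after synchronisation both replicas hold identical data.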
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
“Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”
2.8 Brewer’s ‘CAP’ Theorem
The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.
A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
2.10 Benefits of ‘NoSql’
As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law - disk space has become so economical as to be almost free; memory has grown so large it is now measured in gigabytes, so that many databases can fit entirely in memory.
• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale; managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see, in the case of CouchDB queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
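The notion of a pre-computed map-reduce query can be sketched in a few lines. This is an illustration of the general idea only: CouchDB views are normally written in JavaScript and stored in design documents, whereas the function names below (map_doc, build_view, reduce_view) are hypothetical Python stand-ins.

```python
from collections import defaultdict

docs = [
    {"name": "Bob", "nationality": "Australia"},
    {"name": "Jane", "nationality": "Sweden"},
    {"name": "Alice", "nationality": ["United Kingdom", "South Africa"]},
]


def map_doc(doc):
    """Emit one (key, value) pair per nationality held by the document."""
    nationality = doc.get("nationality", [])
    if isinstance(nationality, str):
        nationality = [nationality]
    for country in nationality:
        yield country, 1


def build_view(documents):
    """Run the map step over all documents and keep the rows sorted by key."""
    return sorted(pair for doc in documents for pair in map_doc(doc))


def reduce_view(rows):
    """Sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


view = build_view(docs)     # computed ahead of time, like an index
counts = reduce_view(view)  # answers only the question the view was built for
print(view)
print(counts)
```

The sorted rows play the role of an index: the question "how many staff per nationality" is cheap to answer, but a different question requires a different map function and a new view.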
Very often, a ‘NoSql’ solution is used for standard reporting purposes where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
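How a client program might cope with fields that are present in one document and absent in another can be sketched as follows. This is an illustration only, using Python's standard json module on cut-down versions of the two documents above; the helper postal_code is a hypothetical name.

```python
import json

john = json.loads(
    '{"firstName": "John", '
    '"address": {"city": "New York", "postalCode": "10021"}}'
)
# A raw string keeps the \n escapes intact for the JSON parser to decode.
zambia = json.loads(
    r'{"company": "British Council Zambia", '
    r'"address": {"streetAddress": "Heroes Place\nCairo Road\nPO Box 34571", '
    r'"city": "Lusaka", "country": "Zambia"}}'
)


def postal_code(doc):
    """Return the postal code, or None when the document simply has none."""
    return doc.get("address", {}).get("postalCode")


street_lines = zambia["address"]["streetAddress"].split("\n")

print(postal_code(john))
print(postal_code(zambia))
print(street_lines)
```

No placeholder value is stored for the missing postal code; the application decides what absence means at the point of use.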
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
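Since any HTTP library can stand in for cURL, the same upload can be sketched in Python. This is a minimal sketch under stated assumptions: it uses only the standard library, the address http://localhost:5984/addressbook is a hypothetical local server (5984 being CouchDB's default port), and the request is built but deliberately not sent.

```python
import base64
import json
import urllib.request


def build_post_request(db_url, doc, username, password):
    """Build (but do not send) a POST that asks CouchDB to store `doc`."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        db_url,
        data=json.dumps(doc).encode("utf-8"),  # plays the role of -d
        headers={
            "Content-Type": "application/json",  # plays the role of -H
            "Authorization": f"Basic {token}",   # username:password in the URL
        },
        method="POST",                           # plays the role of -X POST
    )


request = build_post_request(
    "http://localhost:5984/addressbook",  # hypothetical local server
    {"firstName": "John", "lastName": "Smith", "age": 25},
    "username", "password",
)
# To actually send it: urllib.request.urlopen(request).read()
print(request.get_method(), request.full_url)
```

Each argument corresponds to one of the cURL switches listed above, which makes the mapping between the command line and a client library explicit.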
All data in CouchDB is stored in JSON format, together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 41 JSONLint
The entire addressbook database can also be viewed using lsquoFutonrsquo CouchDBrsquosmanagement console httpmickcouchonecom_utilsdatabasehtmladdressbook
Figure 42 Futon
In some scenarios we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:
    curl -H "Content-Type: application/json" \
      -X PUT \
      http://username:password@mick.couchone.com/addressbook/british-council-zambia \
      -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
      "_id": "british-council-zambia",
      "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia
The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ‘?’).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE \
      "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document (one whose id is already known), we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json. The rev value is the _rev of the John Smith document shown earlier in this section:
    curl -X PUT \
      "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
      -d @john-smith-v2.json
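The revision check that governs deletes and updates can be illustrated with a toy in-memory model. This is only a sketch of the idea, not CouchDB's implementation: the createStore, put and remove names are hypothetical, and the revision strings are a counter plus a random suffix rather than real CouchDB revision hashes.

```javascript
// Toy in-memory model of revision checking (an illustration, not CouchDB code).
function createStore() {
  const docs = new Map(); // id -> { rev, body }

  // Produce "1-xxxx", "2-xxxx", ... (hypothetical revision format)
  function nextRev(current) {
    const n = current ? parseInt(current.split("-")[0], 10) + 1 : 1;
    return n + "-" + Math.random().toString(16).slice(2, 10);
  }

  function put(id, body, rev) {
    const existing = docs.get(id);
    // An existing document may only be replaced by a client holding its latest rev
    if (existing && existing.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const newRev = nextRev(existing && existing.rev);
    docs.set(id, { rev: newRev, body: body });
    return { ok: true, id: id, rev: newRev };
  }

  function remove(id, rev) {
    const existing = docs.get(id);
    if (!existing) return { error: "not_found" };
    if (existing.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    docs.delete(id);
    return { ok: true, id: id };
  }

  return { put, remove };
}

const store = createStore();
const created = store.put("john-smith", { age: 25 });              // new document
const updated = store.put("john-smith", { age: 26 }, created.rev); // latest rev: accepted
const stale = store.put("john-smith", { age: 27 }, created.rev);   // old rev: rejected
console.log(created.ok, updated.ok, stale.error); // true true conflict
```

The third call fails because the client still holds the first revision, which is the situation the ‘Document update conflict’ error is designed to catch.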
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ ³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
      "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
      --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
³ Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
      -X POST http://127.0.0.1:5984/_replicate \
      -d '{"source":"http://username:password@mick.couchone.com/addressbook",
           "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
      -X POST http://127.0.0.1:5984/_replicate ^
      -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
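Hand-typing the JSON body and its escaped Windows form is error-prone. As a small sketch, the body can instead be generated with JSON.stringify; the URLs are the placeholders used in this section, and the windowsArg variable is only an illustrative name.

```javascript
// Build the JSON body for POST /_replicate rather than typing it by hand.
const body = JSON.stringify({
  source: "http://username:password@mick.couchone.com/addressbook",
  target: "http://127.0.0.1:5984/addressbook"
});

// For the Windows command line: wrap in double quotes and
// backslash-escape every inner double quote.
const windowsArg = '"' + body.replace(/"/g, '\\"') + '"';

console.log(body);
console.log(windowsArg);
```

The first line printed is suitable for use inside single quotes on a Unix shell; the second is the double-quoted, escaped form.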
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the ‘map’ stage ranges over the document collection and can emit key-value pairs for each document it encounters; the ‘reduce’ stage is used when a calculation over the intermediate values is required.
In a blog posting for this project ( http://klena02.wordpress.com/2010/02/01/couchdb/ ) I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
    [http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a SUM aggregate, optionally with GROUP BY, in SQL).
    [http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
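To make the two stages concrete, the following sketch runs the ‘Conservative Votes Total’ map and reduce functions over two documents in plain JavaScript. The runView helper is a hypothetical stand-in for CouchDB's view engine, emit is passed in explicitly rather than being a global, and the second constituency is invented for illustration.

```javascript
// Minimal stand-in for a view engine (an illustration only).
function runView(docs, mapFn, reduceFn) {
  const rows = [];
  // CouchDB supplies emit() to map functions; here it is passed as an argument.
  const emit = (key, value) => rows.push({ key: key, value: value });
  docs.forEach((doc) => mapFn(doc, emit));
  if (!reduceFn) return rows;
  const keys = rows.map((r) => r.key);
  const values = rows.map((r) => r.value);
  return [{ key: null, value: reduceFn(keys, values) }];
}

// CouchDB provides sum() to reduce functions; defined here to stay self-contained.
const sum = (values) => values.reduce((a, b) => a + b, 0);

const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077 },     // figures from the Aberavon document
  { _id: "Example East", con: 10000, lab: 9000 }  // hypothetical constituency
];

const total = runView(
  docs,
  (doc, emit) => emit(null, doc.con),  // map: one value per document
  (keys, values) => sum(values)        // reduce: add up the intermediate values
);
console.log(JSON.stringify({ rows: total })); // {"rows":[{"key":null,"value":13062}]}
```

Leaving out the reduce function returns the individual rows instead, which is what the first view, ‘Conservative Votes’, does.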
It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:
    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
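The effect of a key range on a view can be sketched as follows. The rows are hypothetical apart from Aberavon, and the range helper is an illustrative name; CouchDB answers such a request from its sorted index rather than by filtering every row as this sketch does.

```javascript
// Rows as a view returns them: sorted by the emitted key (the Conservative vote).
const rows = [
  { key: 850, value: "Constituency A" },   // hypothetical
  { key: 1200, value: "Constituency B" },  // hypothetical
  { key: 1999, value: "Constituency C" },  // hypothetical
  { key: 3062, value: "Aberavon" }
];

// Inclusive range selection, comparable to WHERE con BETWEEN 1000 AND 2000.
function range(sortedRows, startkey, endkey) {
  return sortedRows.filter((r) => r.key >= startkey && r.key <= endkey);
}

console.log(range(rows, 1000, 2000).map((r) => r.value));
// [ 'Constituency B', 'Constituency C' ]
```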
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every map-reduce function written against a document collection in CouchDB gets its own B-tree index. Once that index has been built, it is updated incrementally as new documents are added to the collection, rather than being rebuilt from scratch.
As a result, the entire system is optimised for fast indexed retrieval. It is not possible in CouchDB to write a view query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called ‘design documents’: documents which hold application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com I wrote the bash script below.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    #host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e "s/$folder\///g")
      echo "$filename"
      # strip the file extension to get the document name
      docname=$(echo "$filename" | sed -e "s/$fileextension//g")
      echo "$docname"
      url=$host/$database/$docname
      echo "$url"
      # put the document into CouchDB
      echo curl -X PUT "$url" -d @"$filepath"
      curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random id to the document.)
To put a single document to the web, the command used would be as follows:
    curl -X PUT \
      http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
      -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.
    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) {
          return '<h1>' + doc._id + '</h1>' +
                 '<p>latitude: ' + doc.latitude + '</p>' +
                 '<p>longitude: ' + doc.longitude + '</p>'
        }"
      }
    }
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
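A ‘show’ function is an ordinary JavaScript function, so it can be tried out away from CouchDB by calling it with a hand-made doc and req. In this sketch the coordinates are illustrative values only, and s1WithTitle is a hypothetical variant added to show req.query in use.

```javascript
// The s1 show function, written as a plain JavaScript function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant: take the heading from the query string if supplied,
// e.g. .../_show/s1WithTitle/Aston%20University?title=Hello
function s1WithTitle(doc, req) {
  const title = req.query.title || doc._id;
  return '<h1>' + title + '</h1>';
}

// Hand-made inputs; the coordinates are illustrative, not taken from the data set.
const doc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };

console.log(s1(doc, { query: {} }));
console.log(s1WithTitle(doc, { query: { title: "Hello" } }));
```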
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT \
      http://username:password@mick.couchone.com/universities/_design/d1 \
      -d @d1.json
To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
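One way to avoid escaping by hand, sketched here as a possibility rather than as the approach used in this project, is to write the function as ordinary JavaScript and let JSON.stringify produce the escaped string. The body of map1 below is a shortened, hypothetical stand-in for the real function in Appendix A.

```javascript
// Write the show function as ordinary JavaScript...
const map1 = function (doc, req) {
  return '<div id="map_canvas" data-lat="' + doc.latitude + '"></div>';
};

// ...and store its source text in the design document object.
const designDoc = {
  _id: "_design/d1",
  shows: { map1: map1.toString() }
};

// JSON.stringify escapes every double quote inside the function source.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

The printed JSON could then be saved to a file and uploaded with cURL in the usual way.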
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, which outputs the _id field for each document in the collection.
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc){emit(doc._id, doc._id)}"
        }
      }
    }
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
          var row;
          start({ \"headers\": { \"Content-Type\": \"text/html\" } });
          while (row = getRow()) {
            send('<p><a href=http://mick.couchone.com/universities/
              _design/default/_show/googlemap/' + row.value + '>' + row.value
              + '</a></p>');
          }
        }"
      },
      "views": {
        "v1": {
          "map": "function(doc){emit(doc._id, doc._id)}"
        }
      }
    }
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
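The interplay of start, getRow and send can be sketched with a small harness that plays the part of CouchDB. The runList helper and its api argument are hypothetical: in CouchDB the three functions are globals, the list function receives only head and req, and the shortened href used here is purely illustrative.

```javascript
// Stand-in for CouchDB's list runtime: supplies start(), send() and getRow().
function runList(listFn, rows) {
  const queue = rows.slice();
  const out = { headers: {}, body: "" };
  const api = {
    start: (resp) => { out.headers = resp.headers || {}; },
    send: (chunk) => { out.body += chunk; },
    getRow: () => queue.shift() || null   // null once the view rows run out
  };
  listFn(api, { total_rows: rows.length }, {});
  return out;
}

// A version of l1 with the runtime functions passed in explicitly.
function l1(api, head, req) {
  let row;
  api.start({ headers: { "Content-Type": "text/html" } });  // once, at the start
  while ((row = api.getRow())) {
    api.send('<p><a href="#' + row.value + '">' + row.value + '</a></p>'); // once per row
  }
}

const result = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" }
]);
console.log(result.headers["Content-Type"]);
console.log(result.body);
```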
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console ‘Futon’ to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
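The idea of merging a folder tree into a single design document can be sketched in a few lines. This is not CouchApp's code (CouchApp is written in Python and does considerably more); buildDesignDoc is a hypothetical helper, and the file contents are abbreviated examples based on the functions from Section 5.

```javascript
// files: a map of relative path -> file contents (a hypothetical example project).
function buildDesignDoc(name, files) {
  const doc = { _id: "_design/" + name };
  Object.keys(files).forEach((path) => {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = path.replace(/\.js$/, "").split("/");
    let node = doc;
    parts.slice(0, -1).forEach((p) => {
      node[p] = node[p] || {};
      node = node[p];
    });
    // The file's contents become a string value at the matching position
    node[parts[parts.length - 1]] = files[path];
  });
  return doc;
}

const designDoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }"
});
console.log(JSON.stringify(designDoc, null, 2));
```

Each folder becomes a nested object and each file becomes a string, which is the same structure as the hand-written design documents in Section 5.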
So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt; {{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag"
            alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
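The template expects a nested data structure with institutions, programmes and countries, plus has_programmes and has_countries flags. The actual packaging code is in Appendix C; the sketch below only illustrates the general shape of such data. The toTemplateData helper and the fields assumed on each document (programmes, countries, country) are assumptions made for this example, and the coordinates are illustrative.

```javascript
// Turn view rows into the nested structure a Mustache template iterates over.
// The document fields used here are assumptions, not the project's real schema.
function toTemplateData(rows) {
  return {
    institutions: rows.map((row) => {
      const doc = row.value;
      const programmes = (doc.programmes || []).map((p) => ({
        name: p.name,
        has_countries: (p.countries || []).length > 0,
        countries: p.countries || []
      }));
      return {
        id: doc._id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        has_programmes: programmes.length > 0,
        programmes: programmes
      };
    })
  };
}

// Hypothetical row, shaped like the output of the 'all' view (key = id, value = doc).
const data = toTemplateData([
  {
    key: "Aston University",
    value: {
      _id: "Aston University",
      latitude: 52.49,
      longitude: -1.89,
      programmes: [{ name: "Erasmus", countries: [{ country: "de" }, { country: "fr" }] }]
    }
  }
]);
console.log(JSON.stringify(data, null, 2));
```

The boolean flags are there because a template section such as has_countries is only rendered when its value is truthy, so an institution without programmes produces no empty markup.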
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/ as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-defined map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at a more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>' }"
      }
    }
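The escaping problem noted at the start of this appendix can be reduced by writing each show function as ordinary JavaScript and letting the language do the quoting. The following is a minimal sketch, assuming a JavaScript runtime such as Node.js outside CouchDB. The helper buildDesignDoc is hypothetical (it is not part of CouchDB or CouchApp), and the class attribute in s1 has been added here only to show a double quote being escaped. Function.prototype.toString and JSON.stringify are standard JavaScript.

```javascript
// Show functions written as normal JavaScript, with no manual escaping.
const shows = {
  s1: function (doc, req) {
    return '<h1 class="name">' + doc._id + '</h1>' +
           '<p>latitude: ' + doc.latitude + '</p>' +
           '<p>longitude: ' + doc.longitude + '</p>';
  }
};

// Hypothetical helper: converts each function to its source text,
// because a design document stores functions as JSON strings.
function buildDesignDoc(id, showFunctions) {
  const ddoc = { _id: id, shows: {} };
  Object.keys(showFunctions).forEach(function (name) {
    ddoc.shows[name] = showFunctions[name].toString();
  });
  return ddoc;
}

// JSON.stringify backslash-escapes the double quotes inside the source text.
const json = JSON.stringify(buildDesignDoc('_design/d1', shows), null, 2);
console.log(json);
```

The resulting JSON text could then be sent to the server in the same way as any other document. This is only one way to avoid hand-escaping; the CouchApp framework described in Section 6 addresses the same problem by packaging separate files.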
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- ... -->

    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : docid,
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View ‘all’, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return {
                          country : country
                        };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": [
            "id", "mz", "tj"
          ]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": [
            "bd", "za"
          ]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
    <title>British Council Activity Mapping</title>
    <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
    <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
    <generator>CouchApp on CouchDB</generator>
    <updated>2010-09-12</updated>
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
      {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
            {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
      {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
    </feed>
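For readers unfamiliar with Mustache sections, the nested iteration in the template can be approximated in plain JavaScript. The following is a minimal sketch only: renderEntry is a hypothetical stand-in written for this illustration, it does not use the Mustache library, and the sample object is an abbreviated version of the example document. Each section over a list corresponds to one map() call, and a missing list simply renders nothing.

```javascript
// Hypothetical stand-in for the <entry> part of the template.
function renderEntry(institution) {
  const programmes = institution.programmes || []; // no programmes: empty content
  const content = programmes.map(function (programme) {
    // Inner section: executes once per country in this programme.
    const flags = (programme.countries || []).map(function (country) {
      // Markup is entity-escaped because it sits inside <content type="html">.
      return '&lt;img src="http://mick.couchone.com/countries/' + country +
             '/flag" alt="' + country + '" title="' + country + '"/&gt;';
    }).join('');
    return '&lt;p&gt;' + programme.name + flags + '&lt;/p&gt;';
  }).join('');

  return '<entry>' +
         '<id>' + institution._id + '</id>' +
         '<title>' + institution._id + '</title>' +
         '<content type="html">' + content + '</content>' +
         '<point>' + institution.latitude + ' ' + institution.longitude + '</point>' +
         '</entry>';
}

// Abbreviated sample data, for illustration only.
const sample = {
  _id: 'University of St Andrews',
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: 'Chevening Programme', countries: ['id', 'mz', 'tj'] },
    { name: 'Commonwealth Scholarship and Fellowship Plan (CSFP)', countries: ['bd', 'za'] }
  ]
};

const entryXml = renderEntry(sample);
console.log(entryXml);
```

The advantage of the template approach over string concatenation like this is that the markup stays readable in its own file, while the list function only has to prepare the data.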
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to the ./countries/png folder
    # (beneath the current folder)
    FILES=./countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's|^\./countries/png/||')
      echo "$filename"
      # strip the .png extension to get the document name
      docname=$(echo "$filename" | sed -e 's|\.png$||')
      echo "$docname"
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo "$url"
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under /countries/ie/flag:
      # curl -H "Content-Type: image/png"
      #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
      curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
    done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
Contents
1 Introduction
  1.1 Motivation
  1.2 Does ‘one size’ still ‘fit all’?
  1.3 Organisation
2 Background
  2.1 The Relational Model
  2.2 Relational Database Management Systems
  2.3 Normalisation
  2.4 SQL
  2.5 Transactional Guarantees
  2.6 The ‘NoSql’ Movement
  2.7 ‘NoSql’ databases at large scale
  2.8 Brewer’s ‘CAP’ Theorem
  2.9 Reducing the Impedance Mismatch
  2.10 Benefits of ‘NoSql’
  2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  2.12 The Activity Mapping Project
  2.13 Raising Awareness of British Council Impact in the UK
  2.14 Using CouchDB to Serve Mapping Data
3 Introduction to CouchDB
  3.1 ‘Of the Web’
  3.2 Some History
  3.3 Document-Oriented
  3.4 Erlang
  3.5 How Ubuntu uses CouchDB
  3.6 Desktop Couch, Python and Quickly
4 Using CouchDB
  4.1 Motivation
  4.2 Hosted CouchDB Service Providers
  4.3 Installing CouchDB
  4.4 Inserting a Document into a CouchDB Database
  4.5 Deleting a Document
  4.6 Updating a Document
  4.7 Adding Attachments
  4.8 Replication
  4.9 Querying a CouchDB Database using Map-Reduce
5 Serving HTML from CouchDB
  5.1 Bulk upload of JSON documents
  5.2 A CouchDB Design Document
  5.3 A more complex ‘show’ function
  5.4 Views and Lists
  5.5 Lessons Learned
6 Serving GeoRSS using CouchApp
  6.1 Introduction to CouchApp
  6.2 Sofa - a Blogging Application
  6.3 Developing with CouchApp
  6.4 Deployment
  6.5 Using templates
7 Critical Assessment and Conclusion
  7.1 Why might a developer choose CouchDB?
  7.2 CouchDB’s challenges
  7.3 Conclusion
A Design Document d1
B GeoRSS on Sofa
C georss.js
D georss.html
E Bash Script to Upload Country Flag Files
Section 1
Introduction
The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB’s suitability as a web development platform.
1.1 Motivation
The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of ‘key-value store’ products which have emerged recently to deal with the challenges of very large data volumes in online environments.
The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as ‘NoSql’¹) database products.
These ‘NoSql’ database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios, non-relational systems are starting to take their place.
¹ Pronounced ‘No-See-Quel’.
1.2 Does ‘one size’ still ‘fit all’?
One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:
“The last 25 years of commercial DBMS development can be summed up in a single phrase: ‘One size fits all’. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.
[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser.”
Why does this matter? Does ‘one size fit all’? Up until recent years, the answer has always been ‘yes’: practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.
1.3 Organisation
The sections are organised in the following way:
• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project;
• in Section 3, I introduce the open-source, document-oriented database CouchDB;
• in Section 4, I provide an overview of CouchDB’s application programming interface;
• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code;
• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps;
• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.
Section 2
Background
In recent years a new generation of data stores has emerged which approaches data persistence in a different way to the established relational model.
In this section, I briefly discuss this relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd’s relational model reads as follows:
“The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains.”
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
The main advantage of the relational database is its precise and mathematically proven organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or ‘NoSql’ databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is ‘relational’ or not is more a matter of how applications use the database - a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
2.2 Relational Database Management Systems
The term ‘Relational Database Management System’ refers to the class of databases which follow Codd’s relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term ‘database’ has become associated with relational databases to such an extent that non-relational databases are often referred to as ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one, and only one, place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, ‘first normal form’, requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let’s call it ‘Staff’) containing staff records:

    Name    Nationality
    Bob     Australia
    Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

    Name    Nationality
    Bob     Australia
    Jane    Sweden
    Alice   United Kingdom, South Africa

This second table breaks the rules of normalisation, because Alice’s Nationality cell holds two values, and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a ‘Nationality’ table and a ‘StaffNationality’ table, linked with Primary and Foreign Keys.
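The normalised design can be pictured with a small in-memory sketch. The arrays below are hypothetical stand-ins for the three tables, and the key names (staffId, nationalityId) are assumptions made for this illustration; the function plays the role of a join across the keys.

```javascript
// Hypothetical stand-ins for the 'Staff' and 'Nationality' tables.
const staff = [
  { staffId: 1, name: 'Bob' },
  { staffId: 2, name: 'Jane' },
  { staffId: 3, name: 'Alice' }
];
const nationality = [
  { nationalityId: 1, name: 'Australia' },
  { nationalityId: 2, name: 'Sweden' },
  { nationalityId: 3, name: 'United Kingdom' },
  { nationalityId: 4, name: 'South Africa' }
];
// Link table: one row per (staff member, nationality) pair,
// so every cell still holds a single atomic value.
const staffNationality = [
  { staffId: 1, nationalityId: 1 },
  { staffId: 2, nationalityId: 2 },
  { staffId: 3, nationalityId: 3 },
  { staffId: 3, nationalityId: 4 }
];

// Equivalent of joining the three tables on their keys.
function nationalitiesOf(staffName) {
  const person = staff.find(function (s) { return s.name === staffName; });
  if (!person) return [];
  return staffNationality
    .filter(function (link) { return link.staffId === person.staffId; })
    .map(function (link) {
      return nationality.find(function (n) {
        return n.nationalityId === link.nationalityId;
      }).name;
    });
}

console.log(nationalitiesOf('Alice'));
```

Alice now occupies two rows of the link table rather than one multi-valued cell, which is what first normal form requires.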
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:

    {
      "name": "Alice",
      "nationality": [
        "United Kingdom",
        "South Africa"
      ]
    }

This record (or ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would result in ‘United Kingdom, South Africa’ being considered a distinct nationality.
In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.
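The effect of non-atomic values on grouping can be shown with a short sketch in plain JavaScript. Both functions and both data sets are hypothetical and written only for this illustration: the first mimics GROUP BY over a cell holding a comma-separated string, the second mimics a map step that emits one key per array element, followed by a summing reduce step.

```javascript
// Rows as they would appear in the non-normalised 'Staff' table.
const staffRows = [
  { name: 'Bob', nationality: 'Australia' },
  { name: 'Jane', nationality: 'Sweden' },
  { name: 'Alice', nationality: 'United Kingdom, South Africa' }
];

// The same staff stored as documents with an array-valued attribute.
const staffDocs = [
  { name: 'Bob', nationality: ['Australia'] },
  { name: 'Jane', nationality: ['Sweden'] },
  { name: 'Alice', nationality: ['United Kingdom', 'South Africa'] }
];

// Mimics GROUP BY: the whole cell is treated as one grouping key.
function countByCell(rows) {
  const counts = {};
  rows.forEach(function (row) {
    counts[row.nationality] = (counts[row.nationality] || 0) + 1;
  });
  return counts;
}

// Mimics map-reduce: emit one key per nationality, then sum per key.
function countByEmittedKey(docs) {
  const counts = {};
  docs.forEach(function (doc) {
    doc.nationality.forEach(function (n) {
      counts[n] = (counts[n] || 0) + 1;
    });
  });
  return counts;
}

console.log(countByCell(staffRows));
console.log(countByEmittedKey(staffDocs));
```

In the first result, ‘United Kingdom, South Africa’ appears as a nationality in its own right; in the second, each of Alice’s nationalities is counted separately.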
2.5 Transactional Guarantees
A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state; that at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
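The all-or-nothing behaviour of the money transfer can be sketched in a few lines. This is a hypothetical in-memory illustration of atomicity and roll-back only; the transfer function and the accounts object are invented for the example, and a real database management system achieves the same effect with locking and logging rather than by copying data.

```javascript
// Hypothetical transfer between in-memory accounts.
// Either both balances change, or neither does.
function transfer(accounts, from, to, amount) {
  const snapshot = Object.assign({}, accounts); // state to roll back to
  try {
    if (!(from in accounts) || !(to in accounts)) {
      throw new Error('unknown account');
    }
    accounts[from] -= amount; // debit
    if (accounts[from] < 0) {
      throw new Error('insufficient funds'); // failure after the debit
    }
    accounts[to] += amount; // credit
    return true;
  } catch (err) {
    // Roll back: restore every balance to its value before the transfer.
    Object.keys(snapshot).forEach(function (name) {
      accounts[name] = snapshot[name];
    });
    return false;
  }
}

const accounts = { alice: 100, bob: 50 };
console.log(transfer(accounts, 'alice', 'bob', 30), accounts);
console.log(transfer(accounts, 'alice', 'bob', 500), accounts);
```

In the second call the debit has already been applied when the failure is detected, so the roll-back is what keeps the total across both accounts unchanged.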
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.
2.6 The ‘NoSql’ Movement
‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL etc.)¹.
‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].
2.7 ‘NoSql’ databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example: if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
“Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”
2.8 Brewer’s ‘CAP’ Theorem
The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.
A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
2.10 Benefits of ‘NoSql’
As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law - disk space has become so economical as to be almost free; memory has grown so large it is now measured in Gigabytes, so that many databases can fit entirely in memory.
• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale: managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer in effect uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions: there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a ‘NoSql’ solution is used for standard reporting purposes where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting: the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
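The two points made about this document (a missing field is simply absent, and \n encodes a line break) can be checked with a few lines of JavaScript. This is only a minimal sketch using the standard JSON functions; the helper name formatAddress and its fixed field order are assumptions made for illustration and are not part of CouchDB.

```javascript
// The British Council Zambia document from the text, as a JavaScript object.
const zambia = {
  company: "British Council Zambia",
  address: {
    streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
    city: "Lusaka",
    country: "Zambia",
  },
};

// Serialise to a JSON string (the form in which a document is sent over HTTP),
// then parse it back. The line break travels as the two characters \ and n.
const wire = JSON.stringify(zambia);
const parsed = JSON.parse(wire);

// No NULL placeholder is needed: the postal code is simply not there.
const hasPostalCode = "postalCode" in parsed.address;

// The street address spans three printed lines once the \n escapes are decoded.
const addressLines = parsed.address.streetAddress.split("\n");

// Hypothetical helper: print whichever address fields a document actually has.
function formatAddress(address) {
  const fields = ["streetAddress", "city", "state", "postalCode", "country"];
  return fields
    .filter((field) => field in address)
    .map((field) => address[field])
    .join("\n");
}

console.log(hasPostalCode, addressLines.length);
console.log(formatAddress(parsed.address));
```

Because the application, not the database, decides which fields to expect, code that reads such documents has to test for a field's presence rather than compare it with NULL.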
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format, together with a unique key and revision/version number - so that in CouchDB, the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ‘?’).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json:
curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
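The revision check described in this section can be illustrated with a small in-memory simulation. This is a sketch of the general idea only (optimistic concurrency control) and is not CouchDB's implementation: the class TinyStore, its put method and the "N-sketch" revision format are all hypothetical, and a real CouchDB revision carries a hash rather than a fixed suffix.

```javascript
// In-memory stand-in for a database. NOT CouchDB's implementation.
class TinyStore {
  constructor() {
    this.docs = new Map(); // document id -> { rev, body }
  }

  // Hypothetical revision format "N-sketch", where N counts the updates so far.
  static nextRev(currentRev) {
    const generation = currentRev ? parseInt(currentRev.split("-")[0], 10) : 0;
    return `${generation + 1}-sketch`;
  }

  // Store a document. An existing document is only replaced when the caller
  // supplies the revision that is currently stored.
  put(id, body, rev) {
    const existing = this.docs.get(id);
    if (existing && existing.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const newRev = TinyStore.nextRev(existing ? existing.rev : null);
    this.docs.set(id, { rev: newRev, body });
    return { ok: true, id, rev: newRev };
  }
}

const store = new TinyStore();
const created = store.put("john-smith", { age: 25 });              // new document: no revision needed
const updated = store.put("john-smith", { age: 26 }, created.rev); // current revision: accepted
const stale = store.put("john-smith", { age: 27 }, created.rev);   // old revision: rejected

console.log(created, updated, stale);
```

The third call fails because another update has happened since the caller last read the document; the client is expected to fetch the latest revision and try again.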
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary (the ^ character continues a command on the next line in the Windows command prompt):
curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
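The quoting difficulties above arise only because the JSON body is typed by hand on a command line. As a minimal sketch, the same replication request can be described in JavaScript, letting JSON.stringify produce the body. The function name replicationRequest and the shape of the object it returns are assumptions for illustration; the object would still need to be handed to an HTTP library of your choice to be sent.

```javascript
// Build (but do not send) the HTTP request that asks a local CouchDB instance
// to pull a database from a remote source. Hypothetical helper for illustration.
function replicationRequest(localBase, sourceUrl, targetDatabase) {
  return {
    method: "POST",
    url: `${localBase}/_replicate`,
    headers: { "Content-Type": "application/json" },
    // JSON.stringify takes care of all quoting and escaping in the body.
    body: JSON.stringify({
      source: sourceUrl,
      target: `${localBase}/${targetDatabase}`,
    }),
  };
}

const request = replicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook", // placeholders, as in the text
  "addressbook"
);

console.log(request.url);
console.log(request.body);
```

The body printed by this sketch is the same JSON object that follows -d in the cURL commands above.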
4.9 Querying a CouchDB Database using Map-Reduce
So far, we have seen how data is entered on to CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation of the intermediate values is required.
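The two stages can be imitated in plain JavaScript, without any database, to make the flow of data concrete. This is a minimal sketch under stated assumptions: the sample documents and the field name votes are invented, runMap and runReduce are hypothetical helpers, and emit is passed in as an argument here, whereas CouchDB makes emit available to the map function as a global.

```javascript
// Hypothetical sample documents; the field name "votes" is illustrative only.
const docs = [
  { _id: "a", votes: 10 },
  { _id: "b", votes: 25 },
  { _id: "c", votes: 7 },
];

// Map stage: the map function is called once per document and may emit
// any number of key/value rows.
function runMap(mapFn, documents) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows;
}

// Reduce stage: the reduce function combines the emitted values into one result.
function runReduce(reduceFn, rows) {
  return reduceFn(rows.map((row) => row.key), rows.map((row) => row.value));
}

const rows = runMap((doc, emit) => emit(null, doc.votes), docs);
const total = runReduce(
  (keys, values) => values.reduce((sum, value) => sum + value, 0),
  rows
);

console.log(rows.length, total);
```

The map stage produces one row per document, and the reduce stage collapses those rows into a single total.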
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
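To see why a key range request is an indexed lookup rather than a scan, the idea can be sketched with a sorted array standing in for the B-Tree. This is an illustration under stated assumptions, not CouchDB's storage engine: buildIndex, lowerBound and range are hypothetical helpers, only the Aberavon figure comes from the text, and the other constituencies are invented.

```javascript
// Build a view index: run the map function over every document,
// then keep the emitted rows sorted by key (numeric keys only in this sketch).
function buildIndex(documents, mapFn) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => a.key - b.key);
}

// Binary search for the position of the first row whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range lookup, inclusive at both ends: jump to the start key, then read
// forwards until the end key is passed. Rows outside the range are never read.
function range(index, startKey, endKey) {
  const result = [];
  for (let i = lowerBound(index, startKey); i < index.length && index[i].key <= endKey; i++) {
    result.push(index[i]);
  }
  return result;
}

const sample = [
  { _id: "Aberavon", con: 3062 },   // figure taken from the text
  { _id: "Exampleton", con: 1500 }, // hypothetical
  { _id: "Sampleford", con: 900 },  // hypothetical
  { _id: "Testbury", con: 2000 },   // hypothetical
];

const index = buildIndex(sample, (doc, emit) => emit(doc.con, doc._id));
const hits = range(index, 1000, 2000).map((row) => row.id);

console.log(hits);
```

The cost of sorting is paid when the index is built or updated, which is the trade-off described above: queries must be defined in advance, but each read then touches only the matching part of the index.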
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
51 Bulk upload of JSON documents
To demonstrate basic design document usage I have created a databasecalled lsquouniversitiesrsquo to investigate how some of the British Council ActivityMapping data may be served to the web via CouchDB
The lsquouniversitiesrsquo data set represents a sub-set of the institution recordsin the Activity Mapping database It consists of those universities withstudents participating in the Erasmus programme [21]
I wrote a Net application using C code and SQL stored procedures toexport the data out of the project database at work The data was exportedinto a set JSON files I then transferred these files to my PC at home whichruns Ubuntu Linux
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
    echo "$filepath"
    # get the file name from the file path
    filename=$(echo "$filepath" | sed -e "s|$folder/||g")
    echo "$filename"
    # strip the file extension to get the document name
    docname=$(echo "$filename" | sed -e "s|$fileextension||g")
    echo "$docname"
    url=$host/$database/$docname
    echo "$url"
    # put the document into CouchDB
    echo curl -X PUT "$url" -d @"$filepath"
    curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
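The distinction between PUT and POST, and the way the script derives a document id from a file name, can be summarised in a short sketch. The helper names putTarget and postTarget are hypothetical, and the functions only describe the request that would be sent; they do not contact a server.

```javascript
// Describe the PUT request for one exported file: the document id is the
// file name without its folder and .json extension (as in the bash script).
function putTarget(host, database, filepath) {
  const filename = filepath.split("/").pop();
  const docname = filename.replace(/\.json$/, "");
  return { method: "PUT", url: `${host}/${database}/${docname}`, bodyFile: filepath };
}

// Describe the POST alternative: the request goes to the database itself
// and CouchDB assigns the document id.
function postTarget(host, database, filepath) {
  return { method: "POST", url: `${host}/${database}`, bodyFile: filepath };
}

const host = "http://127.0.0.1:5984";
const file = "universities/Aberystwyth%20University.json";
const put = putTarget(host, "universities", file);
const post = postTarget(host, "universities", file);
console.log(put.method, put.url);
console.log(post.method, post.url);
```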
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
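Because a ‘show’ function is an ordinary JavaScript function, it can be tried out locally before it is placed in a design document. The sketch below assumes an invented sample document and a hand-made req object; the variant s1WithTitle and its title parameter are hypothetical, added only to illustrate reading a value from the query string via req.query.

```javascript
// The s1 'show' function from the design document.
const s1 = function (doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Hypothetical variant: use ?title=... from the query string if present.
const s1WithTitle = function (doc, req) {
  const title = (req.query && req.query.title) || doc._id;
  return "<h1>" + title + "</h1>";
};

// Invented sample values, not taken from the real data set.
const doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };

const html = s1(doc, { query: {} });
const titled = s1WithTitle(doc, { query: { title: "Aston" } });
console.log(html);
console.log(titled);
```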
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
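The reading of the URL given above can be expressed as a small parsing function. This is a sketch only: parseShowUrl is a hypothetical helper, it assumes the standard URL class, and it expects exactly the path layout described in the list.

```javascript
// Split a 'show' URL into the parts described above.
function parseShowUrl(showUrl) {
  const url = new URL(showUrl);
  const [database, design, designName, show, showName, ...idParts] =
    url.pathname.split("/").filter(Boolean);
  return {
    host: url.origin,
    database,
    designDocument: `${design}/${designName}`,
    showFunction: `${show}/${showName}`,
    documentId: decodeURIComponent(idParts.join("/")),
  };
}

const parts = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
console.log(parts);
```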
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
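The escaping burden arises because the function has to be stored as a JSON string. One way to see this, and to avoid escaping by hand, is to write the function as normal JavaScript and let JSON.stringify produce the escaped form. The sketch below is an illustration of that idea and not the approach used in this section; the shortened map1 body is invented for the example.

```javascript
// A shortened, hypothetical 'show' function containing double quotes.
const map1 = function (doc, req) {
  return '<div id="map_canvas" title="' + doc._id + '"></div>';
};

// Store the function source as a string, as a design document requires.
const designDoc = { _id: "_design/d1", shows: { map1: map1.toString() } };

// JSON.stringify escapes every double quote inside the function source.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);

// Round trip: parsing the JSON gives back the original source text,
// which can be turned into a working function again.
const restoredSource = JSON.parse(json).shows.map1;
const restored = new Function("return " + restoredSource)();
console.log(restored({ _id: "Aston University" }));
```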
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": {},
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
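The interplay of start, getRow and send can be explored with a small local harness. In CouchDB these three functions are built-ins available inside a ‘list’ function; in the sketch below they are passed in as a third argument, which is purely an assumption made so that the code runs outside CouchDB. The runList helper, the relative link target and the sample rows are likewise illustrative.

```javascript
// Minimal stand-in for the CouchDB list runtime: collects headers and output.
function runList(listFn, rows) {
  const queue = rows.slice();
  const chunks = [];
  let headers = null;
  const couch = {
    start: (response) => { headers = response.headers; },
    send: (chunk) => { chunks.push(chunk); },
    getRow: () => (queue.length ? queue.shift() : null),
  };
  listFn({ total_rows: rows.length }, {}, couch);
  return { headers, body: chunks.join("") };
}

// A version of l1 that receives the built-ins explicitly.
const l1 = function (head, req, couch) {
  let row;
  couch.start({ headers: { "Content-Type": "text/html" } });
  while ((row = couch.getRow())) {
    couch.send(
      '<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + "</a></p>"
    );
  }
};

// Rows as a view like v1 would return them (sample values).
const page = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "Bangor University", value: "Bangor University" },
]);
console.log(page.headers);
console.log(page.body);
```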
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
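The correspondence between the folder structure and the design document can be sketched as a simple merge of file paths into a nested object. This is my own simplified illustration of the idea, not CouchApp’s actual implementation: buildDesignDoc is a hypothetical function, and the file contents are given inline rather than read from disk.

```javascript
// Merge a set of "path -> source" entries into one design document object.
// Each folder becomes a nested key; the .js extension is dropped.
function buildDesignDoc(name, files) {
  const ddoc = { _id: `_design/${name}` };
  for (const [path, source] of Object.entries(files)) {
    const parts = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      if (!node[part]) node[part] = {};
      node = node[part];
    }
    node[parts[parts.length - 1]] = source;
  }
  return ddoc;
}

// Illustrative file contents for the functions introduced in Section 5.
const files = {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
};

const ddoc = buildDesignDoc("bc", files);
console.log(JSON.stringify(ddoc, null, 2));
```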
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
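The iteration performed by the template can be written out as ordinary JavaScript loops, which may make the nesting easier to follow. This sketch does not use Mustache at all: renderContent is a hypothetical function, the sample institution is invented, and for readability the output is plain HTML rather than the entity-escaped form used inside the feed.

```javascript
// Plain-JavaScript equivalent of the nested template sections:
// programmes -> (optional) countries -> one flag image per country.
function renderContent(institution) {
  let out = "";
  if (institution.has_programmes) {
    for (const programme of institution.programmes) {
      out += "<p>" + programme.name;
      if (programme.has_countries) {
        for (const c of programme.countries) {
          out += '<img src="http://mick.couchone.com/countries/' + c.country +
            '/flag" alt="' + c.country + '" title="' + c.country + '" />';
        }
      }
      out += "</p>";
    }
  }
  return out;
}

// Invented sample in the shape the List function passes to the template.
const sample = {
  has_programmes: true,
  programmes: [
    { name: "Programme X", has_countries: true, countries: [{ country: "id" }, { country: "mz" }] },
    { name: "Programme Y", has_countries: false, countries: [] },
  ],
};

const content = renderContent(sample);
console.log(content);
```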
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at a more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% } body { height: 100%;
      margin: 0px; padding: 0px } #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\"
      src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
      var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
      var myOptions = { zoom: 14, center: latlng,
      mapTypeId: google.maps.MapTypeId.ROADMAP };
      var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
      var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
      var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
      google.maps.event.addListener(marker, \"click\", function() {
      infowindow.open(map, marker); });
      marker.setMap(map); } </script></head>
      <body onload=\"initialize()\">
      <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
      </body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code, http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
// At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // the point element carries the location of the entry
  entry.point = data.point;
  return entry;
}
// At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
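The ‘packaging’ step in the listing above can be tried on a single document outside CouchDB. The sketch below is a reduced version under stated assumptions: toStashEntry is a hypothetical name, only a few of the fields are copied, and the sample documents are invented. It keeps the same convention as the listing, where the has_programmes and has_countries flags tell the template whether a section should be rendered.

```javascript
// Reduced version of the mapping performed inside List.withRows:
// copy selected fields and add the flags that the template sections test.
function toStashEntry(institution) {
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: institution.programmes ? true : false,
    programmes: institution.programmes
      ? institution.programmes.map(function (programme) {
          return {
            name: programme.name,
            has_countries: programme.countries ? true : false,
            countries: programme.countries
              ? programme.countries.map(function (country) {
                  return { country: country };
                })
              : [], // nothing if no countries
          };
        })
      : [], // nothing if no programmes
  };
}

// Invented sample documents.
const withProgrammes = toStashEntry({
  _id: "University X",
  latitude: 56.3,
  longitude: -2.8,
  programmes: [{ name: "Programme X", countries: ["id", "mz"] }, { name: "Programme Y" }],
});
const withoutProgrammes = toStashEntry({ _id: "University Y", latitude: 51.5, longitude: -0.1 });

console.log(JSON.stringify(withProgrammes, null, 2));
console.log(JSON.stringify(withoutProgrammes));
```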
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to the countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e 's|countries/png/||g')
        echo "$filename"
        # strip the extension to get the document name, e.g. ie.png -> ie
        docname=$(echo "$filename" | sed -e 's|\.png||g')
        echo "$docname"
        url=http://username:password@mick.couchone.com/countries/$docname/flag
        echo "$url"
        # put the attachment into CouchDB
        # this command creates the 'ie' record and puts the png as an attachment
        # under countries/ie/flag:
        # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
        echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
        curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
    done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
Acknowledgments
With thanks to my supervisor, Nigel Martin, for timely, patient and good-humoured advice - you really helped me to stay on track.
Thanks also to my colleagues and ex-colleagues at the British Council, in particular Terry Pyle, Kshipra Singhvi, Phil Street, Spero Blassoples, Michael Sadler, Roger Moran and Masoud Ahanchian.
Thank you, Su Peneycad, for your encouragement and wonderful skill at spotting typos.
I had the great fortune to be doing this project during the summer of 2010, when the (first) ‘NoSql Summer’ was in full swing. Thanks to the organisers of the London meetups, Neil Robbins and Makoto Inoue, for providing the framework for such interesting discussions.
Finally, thank you to the active and friendly CouchDB community - J. Chris Anderson, Volker Mische, Jason Smith and many others - for your help on the couchdb-users mailing list, and to Damien Katz for starting the whole thing.
Contents
1 Introduction
    1.1 Motivation
    1.2 Does ‘one size’ still ‘fit all’?
    1.3 Organisation
2 Background
    2.1 The Relational Model
    2.2 Relational Database Management Systems
    2.3 Normalisation
    2.4 SQL
    2.5 Transactional Guarantees
    2.6 The ‘NoSql’ Movement
    2.7 ‘NoSql’ databases at large scale
    2.8 Brewer’s ‘CAP’ Theorem
    2.9 Reducing the Impedance Mismatch
    2.10 Benefits of ‘NoSql’
    2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
    2.12 The Activity Mapping Project
    2.13 Raising Awareness of British Council Impact in the UK
    2.14 Using CouchDB to Serve Mapping Data
3 Introduction to CouchDB
    3.1 ‘Of the Web’
    3.2 Some History
    3.3 Document-Oriented
    3.4 Erlang
    3.5 How Ubuntu uses CouchDB
    3.6 Desktop Couch, Python and Quickly
4 Using CouchDB
    4.1 Motivation
    4.2 Hosted CouchDB Service Providers
    4.3 Installing CouchDB
    4.4 Inserting a Document into a CouchDB Database
    4.5 Deleting a Document
    4.6 Updating a Document
    4.7 Adding Attachments
    4.8 Replication
    4.9 Querying a CouchDB Database using Map-Reduce
5 Serving HTML from CouchDB
    5.1 Bulk upload of JSON documents
    5.2 A CouchDB Design Document
    5.3 A more complex ‘show’ function
    5.4 Views and Lists
    5.5 Lessons Learned
6 Serving GeoRSS using CouchApp
    6.1 Introduction to CouchApp
    6.2 Sofa - a Blogging Application
    6.3 Developing with CouchApp
    6.4 Deployment
    6.5 Using templates
7 Critical Assessment and Conclusion
    7.1 Why might a developer choose CouchDB?
    7.2 CouchDB’s challenges
    7.3 Conclusion
A Design Document d1
B GeoRSS on Sofa
C georss.js
D georss.html
E Bash Script to Upload Country Flag Files
Section 1
Introduction
The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB’s suitability as a web development platform.
1.1 Motivation
The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of ‘key-value store’ products which have emerged recently to deal with the challenges of very large data volumes in online environments.
The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as ‘NoSql’, pronounced ‘No-See-Quel’) database products.
These ‘NoSql’ database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios, non-relational systems are starting to take their place.
1.2 Does ‘one size’ still ‘fit all’?
One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:
“The last 25 years of commercial DBMS development can be summed up in a single phrase: ‘One size fits all’. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.
[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser.”
Why does this matter? Does ‘one size fit all’? Up until recent years the answer has always been ‘yes’: practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.
1.3 Organisation
The sections are organised in the following way:
• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project.
• in Section 3, I introduce the open-source, document-oriented database, CouchDB.
• in Section 4, I provide an overview of CouchDB’s application programming interface.
• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code.
• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps.
• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.
Section 2
Background
In recent years a new generation of data stores has emerged which approach data persistence in a different way to the established relational model.
In this section I briefly discuss this relational model and describe some of the recent challenges which have led to the development of the newer alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd’s relational model reads as follows:
“The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains.”
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows, and attributes as columns.
The main advantage of the relational database is its precise and mathematically proven organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or ‘NoSql’ databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is ‘relational’ or not is more a matter of how applications use the database - a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
2.2 Relational Database Management Systems
The term ‘Relational Database Management System’ refers to the class of databases which follow Codd’s relational model. The most popular commercial (DB2, Oracle, Sybase/Sql Server) and open source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term ‘database’ has become associated with relational databases to such an extent that non-relational databases are often referred to as ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one, and only one, place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, ‘first normal form’, requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let’s call it ‘Staff’) containing staff records:

    Name    Nationality
    Bob     Australia
    Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

    Name    Nationality
    Bob     Australia
    Jane    Sweden
    Alice   United Kingdom, South Africa

This second table breaks the rules of normalisation and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a ‘Nationality’ table, with a ‘StaffNationality’ table linked with Primary and Foreign Keys.
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:

    {
      "name": "Alice",
      "nationality": [
        "United Kingdom",
        "South Africa"
      ]
    }

This record (or ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language, SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality

For this to work, the table needs to store atomic values. In our example above, this query would result in ‘United Kingdom, South Africa’ being considered a distinct nationality.
In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.
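As a rough preview of how the same count could be expressed in the map-reduce style, here is a minimal JavaScript sketch. It is an illustration only and not CouchDB's actual view engine: the staff array, the countByNationality driver and the emit callback passed in as a parameter are all assumptions made so that the example runs on its own (inside CouchDB, emit is supplied to map functions as a global).

```javascript
// Hypothetical staff documents, modelled on the Staff example in Section 2.3.
const staff = [
  { name: "Bob", nationality: ["Australia"] },
  { name: "Jane", nationality: ["Sweden"] },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] },
];

// Map step: emit one (key, value) pair per nationality held by a staff member.
// Assumption: emit is passed in explicitly so the sketch is self-contained.
function map(doc, emit) {
  (doc.nationality || []).forEach((n) => emit(n, 1));
}

// Reduce step: sum the values emitted for a single key.
function reduce(values) {
  return values.reduce((a, b) => a + b, 0);
}

// Minimal in-memory driver standing in for the database's view engine:
// run map over every document, group the emitted values by key, then reduce.
function countByNationality(docs) {
  const groups = {};
  docs.forEach((doc) =>
    map(doc, (key, value) => {
      (groups[key] = groups[key] || []).push(value);
    })
  );
  const result = {};
  Object.keys(groups).forEach((key) => {
    result[key] = reduce(groups[key]);
  });
  return result;
}

console.log(countByNationality(staff));
```

Because the map step emits one row per array element, Alice is counted once under each of her nationalities rather than under a single combined value.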
2.5 Transactional Guarantees
A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time, there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though, in reality, the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.
2.6 The ‘NoSql’ Movement
‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL etc.). (During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].)
‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
2.7 ‘NoSql’ databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
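The idea of replicas that are briefly out of step and then converge can be sketched in a few lines of JavaScript. This is a toy model under stated assumptions, not how any particular product replicates: the makeReplica, write and sync helpers are hypothetical names, and conflicts are settled by a simple ‘latest write wins’ rule chosen purely for brevity (CouchDB, for instance, tracks document revisions instead).

```javascript
// Each replica holds a value and the logical time of its last write.
// Assumption: a single integer clock is enough for this illustration.
function makeReplica(name) {
  return { name: name, value: null, updatedAt: 0 };
}

// Apply a local write to one replica only; the others are not told yet.
function write(replica, value, time) {
  replica.value = value;
  replica.updatedAt = time;
}

// Pairwise sync: both replicas adopt whichever write is newer.
function sync(a, b) {
  const newer = a.updatedAt >= b.updatedAt ? a : b;
  const value = newer.value;
  const time = newer.updatedAt;
  a.value = b.value = value;
  a.updatedAt = b.updatedAt = time;
}

const london = makeReplica("London");
const singapore = makeReplica("Singapore");

write(london, "Alice: on holiday", 1);
console.log(singapore.value); // stale: replication has not run yet

sync(london, singapore);
console.log(singapore.value); // now the same as the London replica
```

Between the write and the sync, a reader in Singapore sees the old state; after the sync both replicas agree, which is the ‘eventual’ part of eventual consistency.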
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
“Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”
2.8 Brewer’s ‘CAP’ Theorem
The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of foregoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain. (An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.)
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.
A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely, and radically simplifies application development [36].
2.10 Benefits of ‘NoSql’
As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law - disk space has become so economical as to be almost free, and memory has grown so large it is now measured in Gigabytes, so that many databases can fit entirely in memory.
• ‘NoSql’ databases provide benefits to consumers, in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale; managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, ‘NoSql’ databases provide benefits to developers, through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
212 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project which isa project I have been developing for the British Council [2]
The British Council is the United Kingdomrsquos international organisationfor cultural relations It has offices in over 100 countries and territoriesworld-wide In 2009-2010 the organisation had a turnover of pound705 million[20]
213 Raising Awareness of British Council Impactin the UK
The main aim of the Activity Mapping project is to raise awareness of thework the British Council does for communities throughout the UK The web-site for the project is httpactivitymapbritishcouncilorg This siteconsists of a set of many hundreds of maps displaying the range of educa-tional arts youth and science projects supported by the British Council inthe UK
SECTION 2 BACKGROUND 13
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.1.4 Using CouchDB to Serve Mapping Data
Currently the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]
Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
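The difference between a NULL placeholder and a field that is simply absent can be illustrated with a short JavaScript sketch. The helper formatAddress is hypothetical (it is not part of CouchDB), and the two objects are abbreviated copies of the documents above; the sketch only assumes that a client treats a missing field as 'not present'.

```javascript
// Two documents from the same 'addressbook' database with different fields.
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" }
};
const britishCouncilZambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" }
};

// Hypothetical helper: build the printable address lines, skipping any field
// that is absent from the document (there is no NULL placeholder to test for).
function formatAddress(doc) {
  const a = doc.address || {};
  const fields = [a.streetAddress, a.city, a.state, a.postalCode, a.country];
  return fields
    .filter(function (value) { return value !== undefined; })
    .join("\n")
    .split("\n");
}

console.log(formatAddress(johnSmith));
console.log(formatAddress(britishCouncilZambia));
```

The same function handles both documents, even though one has no postalCode and the other has no country.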
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
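The choice between POST and PUT can be summarised in a short JavaScript sketch. buildCreateRequest is a hypothetical helper introduced here for illustration; it only describes the HTTP request that would be made and does not send anything.

```javascript
// Hypothetical helper: describe the HTTP request needed to store a document.
// Without an id, POST to the database and let CouchDB assign a random id;
// with an id, PUT to the URL where the document should live.
function buildCreateRequest(baseUrl, database, doc, id) {
  const dbUrl = baseUrl + "/" + encodeURIComponent(database);
  return {
    method: id === undefined ? "POST" : "PUT",
    url: id === undefined ? dbUrl : dbUrl + "/" + encodeURIComponent(id),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
}

const request = buildCreateRequest(
  "http://mick.couchone.com", "addressbook",
  { company: "British Council Zambia" }, "british-council-zambia");
console.log(request.method + " " + request.url);
```

The resulting object could be handed to whichever HTTP library the client application uses.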
4.5 Deleting a Document
When deleting a document we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark '?').
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-da7bcd810c608d6fbcb9ce92e9ade343" \
         -d @john-smith-v2.json
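The revision check can be illustrated with a small in-memory simulation in JavaScript. This is a sketch of the idea only, not of CouchDB's implementation: the store object, the put function and the '-placeholder' revision suffix are all invented for the example (CouchDB generates the real suffix itself).

```javascript
// In-memory stand-in for a database, used only to illustrate the revision check.
const store = {};

// Accept a write only if the caller supplies the current revision of the document.
function put(id, doc, rev) {
  const current = store[id];
  if (current && current._rev !== rev) {
    return { error: "conflict", reason: "Document update conflict." };
  }
  // The number before the dash counts how many times the document has been written.
  const generation = current ? parseInt(current._rev, 10) + 1 : 1;
  const newRev = generation + "-placeholder";
  store[id] = Object.assign({}, doc, { _id: id, _rev: newRev });
  return { ok: true, id: id, rev: newRev };
}

const first = put("john-smith", { age: 25 });             // new document
const second = put("john-smith", { age: 26 }, first.rev); // accepted: rev is current
const stale = put("john-smith", { age: 27 }, first.rev);  // rejected: rev is out of date
console.log(second.rev, stale.error);
```

The third call fails because it still refers to the first revision, which is the situation the conflict error is designed to catch.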
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ ³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
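When the replication request is issued from a program rather than from the shell, the quoting problem largely disappears, because the request body can be generated from an object. The following JavaScript sketch assumes a hypothetical helper, replicationBody; it builds the same body as the commands above without sending it.

```javascript
// Build the body for a POST to /_replicate from a plain object, so that the
// quoting is handled by JSON.stringify rather than by hand in the shell.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook");
console.log(body);
```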
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered on to CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation of the intermediate values is required.
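The two stages can be sketched in a few lines of plain JavaScript. This is a minimal illustration of the paradigm, not CouchDB's implementation: the mapReduce function, the emit callback passed to the map function, and the sample documents are all assumptions made for the example.

```javascript
// Minimal stand-in for the two stages of map-reduce.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  docs.forEach(function (doc) {
    // 'emit' collects the intermediate key/value pairs for this document.
    map(doc, function emit(key, value) {
      if (!groups.has(key)) { groups.set(key, []); }
      groups.get(key).push(value);
    });
  });
  // The reduce stage combines the intermediate values emitted for each key.
  const result = {};
  groups.forEach(function (values, key) {
    result[key] = reduce(key, values);
  });
  return result;
}

// Hypothetical sample documents.
const people = [
  { name: "Ann", city: "London" },
  { name: "Bob", city: "Lusaka" },
  { name: "Cat", city: "London" }
];

// Count the documents per city: map emits (city, 1), reduce adds up the ones.
const perCity = mapReduce(
  people,
  function (doc, emit) { emit(doc.city, 1); },
  function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  });
console.log(perCity);
```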
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }
The sum of Conservative votes (8,782,198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
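To see how the 'Conservative Votes Total' view arrives at a single row, the map and reduce functions can be run locally in plain JavaScript. In this sketch emit and sum are simple stand-ins for the functions CouchDB provides to view code, the reduce call is simplified (CouchDB passes an array of keys rather than null), and only the Aberavon figure comes from the data above; the other two documents are invented.

```javascript
// Sample documents: Aberavon is from the text; the other two are invented
// purely for illustration.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example North", con: 1500 },
  { _id: "Example South", con: 20000 }
];

// Stand-ins for the functions CouchDB makes available to views.
const rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The 'Conservative Votes Total' view from the design document.
const map = function (doc) { emit(null, doc.con); };
const reduce = function (keys, values) { return sum(values); };

// Map stage: one emitted row per document. Reduce stage: one combined value.
docs.forEach(function (doc) { map(doc); });
const total = reduce(null, rows.map(function (row) { return row.value; }));
console.log(JSON.stringify({ rows: [{ key: null, value: total }] }));
```

For the three sample documents the script prints {"rows":[{"key":null,"value":24562}]}, which has the same shape as the response shown above.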
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
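The idea of a sorted index that is updated incrementally and then read by key range can be sketched in JavaScript. A sorted array with binary search is used here as a simplified stand-in for CouchDB's B-Tree; the functions lowerBound, addRow and range are hypothetical, and only the Aberavon row comes from the data above.

```javascript
// The index: rows kept sorted by key (key = Conservative votes, value = id).
const index = [];

// Find the first position whose key is >= the given key (binary search).
function lowerBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) { lo = mid + 1; } else { hi = mid; }
  }
  return lo;
}

// Adding a document inserts one row at the right position; nothing is rebuilt.
function addRow(rows, key, value) {
  rows.splice(lowerBound(rows, key), 0, { key: key, value: value });
}

// Emulate a key range: jump to the start position, then read forward until
// the end key is passed. No row outside the range is examined.
function range(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey);
       i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

addRow(index, 3062, "Aberavon");        // from the text
addRow(index, 1500, "Example North");   // invented
addRow(index, 20000, "Example South");  // invented
addRow(index, 1999, "Example East");    // invented

console.log(range(index, 1000, 2000).map(function (row) { return row.value; }));
```

Only the two invented constituencies with keys 1500 and 1999 fall inside the requested range.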
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s|$folder/||g")
        echo "$filename"
        # strip the file extension to get the document name
        docname=$(echo "$filename" | sed -e "s|$fileextension||g")
        echo "$docname"
        url=$host/$database/$docname
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
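The two sed steps in the script, which turn a file path into a document URL, can be expressed as a small JavaScript function. documentUrl is a hypothetical helper; the sketch assumes that every path begins with the folder name followed by a slash and ends with the file extension.

```javascript
// Hypothetical helper mirroring the script: strip the folder prefix and the
// file extension from the path, then build the URL to PUT the document to.
function documentUrl(host, database, folder, extension, filepath) {
  const filename = filepath.slice(folder.length + 1);   // drop "folder/"
  const docname = filename.slice(0, -extension.length); // drop ".json"
  return host + "/" + database + "/" + docname;
}

console.log(documentUrl(
  "http://127.0.0.1:5984", "universities", "universities", ".json",
  "universities/Aston%20University.json"));
```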
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.
    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) {
                return '<h1>' + doc._id + '</h1>' +
                    '<p>latitude: ' + doc.latitude + '</p>' +
                    '<p>longitude: ' + doc.longitude + '</p>';
            }"
        }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
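Because a 'show' function is ordinary JavaScript, it can be tried out locally before being placed in a design document. The sketch below runs s1 against a made-up document; the document id, the coordinates and the empty request object are assumptions for illustration only.

```javascript
// The 's1' show function from the design document, as a plain JavaScript function.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
};

// Hypothetical document and empty request object; the coordinates are made up.
const sampleDoc = { _id: "Example University", latitude: 51.5, longitude: -0.1 };
console.log(s1(sampleDoc, {}));
```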
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
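The breakdown of the URL listed above can be reproduced programmatically. parseShowUrl is a hypothetical helper that assumes the URL always has exactly the shape database/_design/name/_show/function/document; it relies on the standard URL class available in browsers and in Node.js.

```javascript
// Hypothetical helper: split a 'show' URL into the parts listed above.
function parseShowUrl(url) {
  const parsed = new URL(url);
  // e.g. ["universities", "_design", "d1", "_show", "s1", "Aston%20University"]
  const parts = parsed.pathname.split("/").filter(Boolean);
  return {
    host: parsed.origin,
    database: parts[0],
    designDocument: parts[1] + "/" + parts[2],
    showFunction: parts[4],
    documentId: decodeURIComponent(parts[5])
  };
}

console.log(parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"));
```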
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
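One way to reduce the escaping burden, sketched here under my own assumptions rather than as a technique used in this project, is to write the 'show' function as ordinary JavaScript and let JSON.stringify produce the escaped string for the design document. The function name s2 and the class attribute are hypothetical.

```javascript
// Write the show function as ordinary JavaScript, double quotes included...
const show = function (doc, req) {
  return '<h1 class="title">' + doc._id + '</h1>';
};

// ...then let JSON.stringify escape the source text for the design document.
const designDoc = {
  _id: "_design/d1",
  shows: { s2: show.toString() }
};
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

The printed JSON contains the function source with each double quote escaped, ready to be saved to a file and uploaded with cURL.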
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
    {
        "_id": "_design/d1",
        "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
        "shows": { ... },
        "lists": { ... },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
    {
        "_id": "_design/d1",
        "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
        "shows": { ... },
        "lists": {
            "l1": "function(head, req) {
                var row;
                start({ 'headers': { 'Content-Type': 'text/html' } });
                while (row = getRow()) {
                    send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
                }
            }"
        },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
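The interplay of start, getRow and send can be simulated outside CouchDB with simple stand-ins. In the JavaScript sketch below the three functions, the response object and the sample rows are all assumptions made for illustration, and the link target is simplified to "#"; only the overall shape of l1 follows the listing above.

```javascript
// Sample rows, in key order, as the view v1 would return them.
const viewRows = [
  { id: "Aberystwyth University", key: "Aberystwyth University", value: "Aberystwyth University" },
  { id: "Aston University", key: "Aston University", value: "Aston University" }
];

// Stand-ins for the functions CouchDB provides to a list function.
let position = 0;
const response = { headers: null, body: "" };
function start(options) { response.headers = options.headers; }
function getRow() { return position < viewRows.length ? viewRows[position++] : null; }
function send(chunk) { response.body += chunk; }

// A list function in the style of l1 (link target simplified for the sketch).
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="#">' + row.value + '</a></p>');
  }
};

l1({}, {});
console.log(response.headers["Content-Type"]);
console.log(response.body);
```

The headers are set once, and one paragraph of HTML is appended to the response for each row of the view.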
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach which helped with more complex design documents was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
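As a small illustration of what such a feed carries, the sketch below formats a point in the GeoRSS-Simple encoding, where latitude and longitude are written in that order and separated by a space. The function name georssPoint is hypothetical, the coordinates are only an approximate example, and the georss prefix is assumed to be bound to the namespace http://www.georss.org/georss on the enclosing feed element.

```javascript
// Formats a GeoRSS-Simple point: latitude first, then longitude,
// separated by a single space.
function georssPoint(latitude, longitude) {
  var lat = Number(latitude);
  var lon = Number(longitude);
  if (!isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError("latitude out of range: " + latitude);
  }
  if (!isFinite(lon) || lon < -180 || lon > 180) {
    throw new RangeError("longitude out of range: " + longitude);
  }
  return "<georss:point>" + lat + " " + lon + "</georss:point>";
}

console.log(georssPoint(56.34, -2.79));
// <georss:point>56.34 -2.79</georss:point>
```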
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa, amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
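The correspondence between the folder structure and the design document can be sketched as follows. This is only an illustration of the idea, not CouchApp's actual implementation (which is written in Python): the function buildDesignDoc is hypothetical, and the file contents are passed in as an in-memory object rather than read from disk.

```javascript
// Hypothetical illustration of the folder-to-document mapping: each file
// path becomes a nested key, and the file content becomes a string value.
function buildDesignDoc(name, files) {
  var ddoc = { _id: "_design/" + name };
  Object.keys(files).forEach(function (path) {
    var parts = path.replace(/\.js$/, "").split("/");
    var node = ddoc;
    parts.slice(0, -1).forEach(function (part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

var bcDoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }"
});
console.log(JSON.stringify(bcDoc, null, 2));
```

Under these assumptions, views/v1/map.js ends up at bcDoc.views.v1.map, which is the same nesting seen in the hand-written design document of Section 5.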
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
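One plausible reading of this remark can be sketched with a simplified, in-memory model of a view query. The queryView function is hypothetical and ignores grouping and re-reduce; it only shows that a map-only view yields one row per emit call, whereas a view with a reduce function collapses the rows into a single reduced value, which is undefined when the reduce function has an empty body.

```javascript
// Hypothetical in-memory stand-in for querying a CouchDB view. Inside
// CouchDB, emit is a global; here it is passed to the map function explicitly.
function queryView(docs, mapFn, reduceFn) {
  var rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ key: key, value: value });
    });
  });
  if (!reduceFn) {
    return rows; // map-only view: one row per emit call
  }
  var keys = rows.map(function (row) { return row.key; });
  var values = rows.map(function (row) { return row.value; });
  // With a reduce function present, the rows collapse into one reduced value.
  return [{ key: null, value: reduceFn(keys, values, false) }];
}

function v1Map(doc, emit) { emit(doc._id, doc._id); }
function emptyReduce(keys, values, rereduce) {}

var docs = [{ _id: "University of Aberdeen" }, { _id: "University of St Andrews" }];
var mapOnly = queryView(docs, v1Map);
var withEmptyReduce = queryView(docs, v1Map, emptyReduce);
console.log(mapOnly.length);           // 2
console.log(withEmptyReduce[0].value); // undefined
```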
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
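The HTTP side of this step can be sketched as below. The function buildPushRequest is hypothetical and only describes the request rather than sending it; the server address http://localhost:5984 is an assumed local CouchDB instance. What the sketch relies on is that CouchDB stores a design document when it receives a PUT to the database URL followed by the document's _id.

```javascript
// Builds (but does not send) the HTTP request that stores a design document.
function buildPushRequest(dbUrl, designDoc) {
  return {
    method: "PUT",
    url: dbUrl.replace(/\/$/, "") + "/" + designDoc._id,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(designDoc)
  };
}

var request = buildPushRequest("http://localhost:5984/universities", {
  _id: "_design/bc",
  views: { v1: { map: "function(doc) { emit(doc._id, doc._id); }" } }
});
console.log(request.method + " " + request.url);
// PUT http://localhost:5984/universities/_design/bc
```

When the design document already exists on the server, the body would also need to carry the current _rev value, as with any other document update.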
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
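The same nested iteration can be written in plain JavaScript, which may help readers who are new to Mustache sections. This is a hedged equivalent of the loop logic described above, not the code used in the project: the functions renderContent and escapeXml and the sample institution are hypothetical.

```javascript
// Escapes markup so that HTML can be embedded as text inside an XML element.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical plain-JavaScript equivalent of the template's nested sections:
// one paragraph per programme, one flag image per country of that programme.
function renderContent(institution, flagBaseUrl) {
  var programmes = institution.programmes || [];
  var html = programmes.map(function (programme) {
    var flags = (programme.countries || []).map(function (country) {
      return '<img src="' + flagBaseUrl + "/" + country + '/flag" alt="' +
        country + '" title="' + country + '" />';
    });
    return "<p>" + programme.name + flags.join("") + "</p>";
  });
  return escapeXml(html.join(""));
}

var content = renderContent({
  _id: "Example University",
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz"] },
    { name: "Erasmus" }
  ]
}, "http://mick.couchone.com/countries");
console.log(content);
```

A programme without a countries list simply produces a paragraph with no images, which corresponds to the has_countries check in the template.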
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
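Because replication is itself driven over HTTP, a request to start it can be described in a few lines. The sketch below only builds the request and does not send it; the function buildReplicationRequest is hypothetical, and http://localhost:5984 is an assumed local CouchDB server. CouchDB starts a replication when a JSON body naming a source and a target is POSTed to the _replicate resource, and design documents travel with the data because they are ordinary documents.

```javascript
// Describes (without sending) a replication request. CouchDB triggers
// replication when this JSON body is POSTed to the _replicate resource.
function buildReplicationRequest(serverUrl, source, target) {
  return {
    method: "POST",
    url: serverUrl.replace(/\/$/, "") + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Pull the remote universities database into a local database of the same name.
var replication = buildReplicationRequest(
  "http://localhost:5984",
  "http://mick.couchone.com/universities",
  "universities"
);
console.log(replication.method + " " + replication.url);
// POST http://localhost:5984/_replicate
```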
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
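The mapping from document operations to HTTP verbs can be summarised in a short sketch. The function documentRequests is hypothetical and only describes the requests; the server address and the revision string "1-abc" are placeholders. The verbs themselves follow CouchDB's document API: PUT to create or update, GET to read and DELETE (with the current revision) to remove.

```javascript
// Describes the HTTP requests for the basic operations on one document.
function documentRequests(dbUrl, docId, doc, rev) {
  var url = dbUrl.replace(/\/$/, "") + "/" + encodeURIComponent(docId);
  return {
    create: { method: "PUT", url: url, body: JSON.stringify(doc) },
    read: { method: "GET", url: url },
    // An update must name the revision it replaces.
    update: {
      method: "PUT",
      url: url,
      body: JSON.stringify(Object.assign({}, doc, { _rev: rev }))
    },
    remove: { method: "DELETE", url: url + "?rev=" + encodeURIComponent(rev) }
  };
}

var requests = documentRequests(
  "http://localhost:5984/universities",
  "University of Aberdeen",
  { region: "Scotland" },
  "1-abc" // placeholder revision
);
console.log(requests.read.method + " " + requests.read.url);
// GET http://localhost:5984/universities/University%20of%20Aberdeen
```

Any tool that can issue these four verbs, cURL included, can therefore act as a CouchDB client.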
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
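To illustrate the shift in thinking, the sketch below expresses a query that would read SELECT region, COUNT(*) FROM institutions GROUP BY region in SQL as a map function and a reduce function. The groupedQuery helper is a hypothetical in-memory stand-in for querying a view with group=true; in CouchDB the two functions would live in a design document and emit would be a global. The region field is taken from the institution documents shown in Appendix D.

```javascript
// Map step: one row per institution, keyed by region.
function mapRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce step: add up the values emitted for one key.
function reduceCount(keys, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical in-memory stand-in for querying the view with group=true.
function groupedQuery(docs, mapFn, reduceFn) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: reduceFn(null, groups[key]) };
  });
}

var counts = groupedQuery([
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "Cardiff University", region: "Wales" }
], mapRegion, reduceCount);
console.log(JSON.stringify(counts));
// [{"key":"Scotland","value":2},{"key":"Wales","value":1}]
```

The important difference from SQL is that this query has to be defined in advance as a view, rather than typed ad hoc against the data.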
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
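The escaping burden can be seen, and avoided, in a few lines. In this sketch the show function is written as ordinary JavaScript and JSON.stringify adds the backslashes when the design document is serialised. The class attribute in the markup is invented purely so that the string contains double quotes; this is an illustration of the general idea, not a description of how CouchApp itself is implemented.

```javascript
// A show function written as ordinary JavaScript, double quotes and all.
var s1 = function (doc, req) {
  return '<p class="latitude">latitude: ' + doc.latitude + '</p>';
};

// Serialising the design document lets JSON.stringify add the backslashes.
var designDoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
var json = JSON.stringify(designDoc);

console.log(json.indexOf('\\"latitude\\"') !== -1);       // true
console.log(JSON.parse(json).shows.s1 === s1.toString()); // true
```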
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
        '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
        <html><head><meta name=\"viewport\"
        content=\"initial-scale=1.0, user-scalable=no\" />
        <style type=\"text/css\"> html { height: 100% }
        body { height: 100%; margin: 0px; padding: 0px }
        #map_canvas { height: 100% } </style>
        <script type=\"text/javascript\"
        src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
        <script type=\"text/javascript\"> function initialize() {
        var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
        var myOptions = { zoom: 14, center: latlng,
        mapTypeId: google.maps.MapTypeId.ROADMAP };
        var map = new google.maps.Map(
        document.getElementById(\"map_canvas\"), myOptions);
        var marker = new google.maps.Marker({ position: latlng,
        title: \"' + doc._id + '\" });
        var infowindow = new google.maps.InfoWindow({
        content: \"' + doc._id + '\" });
        google.maps.event.addListener(marker, \"click\", function() {
        infowindow.open(map, marker); });
        marker.setMap(map); } </script></head>
        <body onload=\"initialize()\">
        <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
        </body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
};
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
        {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
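The naming rule that the script applies (folder and extension removed, then /flag appended to the document URL) can be expressed compactly as follows. The function flagUrl is a hypothetical JavaScript restatement of the script's two sed steps, and http://localhost:5984 is an assumed local server used only for the example.

```javascript
// Mirrors the two sed steps of the script: strip the folder, then the
// .png extension, and build the attachment URL from what remains.
function flagUrl(baseUrl, filepath) {
  var filename = filepath.split("/").pop();
  var docname = filename.replace(/\.png$/, "");
  return baseUrl.replace(/\/$/, "") + "/" + docname + "/flag";
}

console.log(flagUrl("http://localhost:5984/countries", "countries/png/de.png"));
// http://localhost:5984/countries/de/flag
```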
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  # strip the .png extension to get the document name
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
Contents
1 Introduction
    1.1 Motivation
    1.2 Does ‘one size’ still ‘fit all’?
    1.3 Organisation
2 Background
    2.1 The Relational Model
    2.2 Relational Database Management Systems
    2.3 Normalisation
    2.4 SQL
    2.5 Transactional Guarantees
    2.6 The ‘NoSql’ Movement
    2.7 ‘NoSql’ databases at large scale
    2.8 Brewer’s ‘CAP’ Theorem
    2.9 Reducing the Impedance Mismatch
    2.10 Benefits of ‘NoSql’
    2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
    2.12 The Activity Mapping Project
    2.13 Raising Awareness of British Council Impact in the UK
    2.14 Using CouchDB to Serve Mapping Data
3 Introduction to CouchDB
    3.1 ‘Of the Web’
    3.2 Some History
    3.3 Document-Oriented
    3.4 Erlang
    3.5 How Ubuntu uses CouchDB
    3.6 Desktop Couch, Python and Quickly
4 Using CouchDB
    4.1 Motivation
    4.2 Hosted CouchDB Service Providers
    4.3 Installing CouchDB
    4.4 Inserting a Document into a CouchDB Database
    4.5 Deleting a Document
    4.6 Updating a Document
    4.7 Adding Attachments
    4.8 Replication
    4.9 Querying a CouchDB Database using Map-Reduce
5 Serving HTML from CouchDB
    5.1 Bulk upload of JSON documents
    5.2 A CouchDB Design Document
    5.3 A more complex ‘show’ function
    5.4 Views and Lists
    5.5 Lessons Learned
6 Serving GeoRSS using CouchApp
    6.1 Introduction to CouchApp
    6.2 Sofa - a Blogging Application
    6.3 Developing with CouchApp
    6.4 Deployment
    6.5 Using templates
7 Critical Assessment and Conclusion
    7.1 Why might a developer choose CouchDB?
    7.2 CouchDB’s challenges
    7.3 Conclusion
A Design Document d1
B GeoRSS on Sofa
C georss.js
D georss.html
E Bash Script to Upload Country Flag Files
Section 1
Introduction
The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB’s suitability as a web development platform.
1.1 Motivation
The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of ‘key-value store’ products which have emerged recently to deal with the challenges of very large data volumes in online environments.
The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as ‘NoSql’¹) database products.
These ‘NoSql’ database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios non-relational systems are starting to take their place.
¹ Pronounced ‘No-See-Quel’.
1.2 Does ‘one size’ still ‘fit all’?
One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:
“The last 25 years of commercial DBMS development can be summed up in a single phrase: ‘One size fits all’. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.
[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser.”
Why does this matter? Does ‘one size fit all’? Up until recent years the answer has always been ‘yes’: practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.
1.3 Organisation
The sections are organised in the following way:
• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project
• in Section 3, I introduce the open-source, document-oriented database CouchDB
• in Section 4, I provide an overview of CouchDB’s application programming interface
• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code
• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps
• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project
Section 2
Background
In recent years a new generation of data stores has emerged which approaches data persistence in a different way to the established relational model.
In this section I briefly discuss this relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd’s relational model reads as follows:
“The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains.”
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
The main advantage of the relational database is its precise and mathematically proven organisation of data. In this context it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or ‘NoSql’ databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is ‘relational’ or not is more a matter of how applications use the database - a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
2.2 Relational Database Management Systems
The term ‘Relational Database Management System’ refers to the class of databases which follow Codd’s relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful, and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term ‘database’ has become associated with relational databases to such an extent that non-relational databases are often referred to as ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one, and only one, place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, ‘first normal form’, requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let’s call it ‘Staff’) containing staff records:
    Name     Nationality
    Bob      Australia
    Jane     Sweden
All is well until the system encounters a staff member with multiple nationalities:
    Name     Nationality
    Bob      Australia
    Jane     Sweden
    Alice    United Kingdom, South Africa
This second table breaks the rules of normalisation and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a ‘Nationality’ table and a ‘StaffNationality’ table, linked with Primary and Foreign Keys.
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:
    {
        "name": "Alice",
        "nationality": [
            "United Kingdom",
            "South Africa"
        ]
    }
This record (or ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
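
To make the ‘schema-less’ point concrete, the following is a minimal sketch in plain JavaScript. It is not CouchDB’s API: the in-memory array `staffDb` and the helpers `saveDocument` and `nationalitiesOf` are hypothetical names introduced only for illustration. It shows the three staff records from the tables above stored side by side, even though Alice’s `nationality` is a list rather than a single string.

```javascript
// Hypothetical in-memory stand-in for a document database (not CouchDB's API).
const staffDb = [];

// Store any JSON-serialisable object as-is; no schema is checked.
function saveDocument(db, doc) {
  db.push(JSON.parse(JSON.stringify(doc))); // deep copy via JSON round-trip
  return db.length;
}

saveDocument(staffDb, { name: "Bob", nationality: "Australia" });
saveDocument(staffDb, { name: "Jane", nationality: "Sweden" });
saveDocument(staffDb, {
  name: "Alice",
  nationality: ["United Kingdom", "South Africa"],
});

// The application, not the database, copes with the two shapes on read:
// a single string is treated as a one-element list.
function nationalitiesOf(doc) {
  return Array.isArray(doc.nationality) ? doc.nationality : [doc.nationality];
}

console.log(staffDb.map((d) => `${d.name}: ${nationalitiesOf(d).join(", ")}`));
```

The design choice worth noting is that the flexibility is paid for in the application code: `nationalitiesOf` has to handle both shapes, a task that a relational schema would otherwise have enforced.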
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:
    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;
For this to work, the table needs to store atomic values. In our example above, this query would result in ‘United Kingdom, South Africa’ being considered a distinct nationality.
In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.
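
As a preview, the same count can be sketched in the map-reduce style using plain JavaScript. This is an illustration under stated assumptions rather than CouchDB itself: `mapStaff`, `reduceCount` and `countByNationality` are hypothetical names, and `emit` is passed in as a parameter here, whereas CouchDB supplies it to map functions as a global. The map step emits one row per nationality, so a list-valued field no longer produces a spurious combined nationality.

```javascript
// Sample documents, matching the Staff example above.
const staff = [
  { name: "Bob", nationality: "Australia" },
  { name: "Jane", nationality: "Sweden" },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] },
];

// Map step: emit a (nationality, 1) pair for every nationality in a document.
function mapStaff(doc, emit) {
  const list = Array.isArray(doc.nationality)
    ? doc.nationality
    : [doc.nationality];
  list.forEach((nationality) => emit(nationality, 1));
}

// Reduce step: sum the values emitted for one key.
function reduceCount(values) {
  return values.reduce((total, value) => total + value, 0);
}

// Tiny driver that groups emitted rows by key and reduces each group.
function countByNationality(docs) {
  const grouped = new Map();
  docs.forEach((doc) =>
    mapStaff(doc, (key, value) => {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    })
  );
  const result = {};
  [...grouped.keys()].sort().forEach((key) => {
    result[key] = reduceCount(grouped.get(key));
  });
  return result;
}

const counts = countByNationality(staff);
console.log(counts);
```

Each of the four nationalities is counted once, including both of Alice’s.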
2.5 Transactional Guarantees
A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
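
The all-or-nothing behaviour of the money transfer can be illustrated with a small JavaScript sketch. It is only a toy model of atomicity: the `transfer` function and the `accounts` object are hypothetical, and a real DBMS achieves the same effect with locking and logging rather than by copying balances in memory.

```javascript
// Toy model of an atomic transfer: either both balances change or neither does.
function transfer(accounts, from, to, amount) {
  const snapshot = { ...accounts }; // remember the consistent starting state
  try {
    if (!(from in accounts) || !(to in accounts)) {
      throw new Error("unknown account");
    }
    accounts[from] -= amount; // the debit happens first...
    if (amount > snapshot[from]) {
      throw new Error("insufficient funds"); // ...and may turn out to be invalid
    }
    accounts[to] += amount; // the credit happens second
    return true;
  } catch (err) {
    Object.assign(accounts, snapshot); // roll back to the consistent state
    return false;
  }
}

const accounts = { alice: 100, bob: 50 };
const firstOk = transfer(accounts, "alice", "bob", 30); // succeeds
const secondOk = transfer(accounts, "alice", "bob", 500); // fails, rolled back
console.log(firstOk, secondOk, accounts);
```

After the failed second transfer the balances are exactly what they were after the first one, so an observer never sees the half-finished debit.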
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.
2.6 The ‘NoSql’ Movement
‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL etc.)¹.
‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].
2.7 ‘NoSql’ databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example: if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
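
The Alice and Bob scenario can be sketched in a few lines of JavaScript. This is a deliberately simplified model and not how Facebook or CouchDB actually replicate data: the replica objects, the single version counter and the ‘higher version wins’ rule in `synchronise` are all assumptions made for illustration, and concurrent conflicting writes are ignored.

```javascript
// Hypothetical replica model: one status value plus a version counter.
function makeReplica(name) {
  return { name, status: "", version: 0 };
}

// A local write bumps the version so that peers can tell it is newer.
function writeStatus(replica, status) {
  replica.status = status;
  replica.version += 1;
}

// Synchronise two replicas: the copy with the higher version wins.
// Conflicting concurrent writes on both sides are not handled in this sketch.
function synchronise(a, b) {
  const [newer, older] = a.version >= b.version ? [a, b] : [b, a];
  older.status = newer.status;
  older.version = newer.version;
}

const london = makeReplica("London");
const singapore = makeReplica("Singapore");

writeStatus(london, "Off to the airport"); // Alice updates her status in London
const staleRead = singapore.status; // Bob reads before replication: stale value

synchronise(london, singapore); // replication catches up a little later
const freshRead = singapore.status; // Bob now sees Alice's update

console.log({ staleRead, freshRead });
```

Both reads are served immediately from the local replica; the price is that the first one returns an out-of-date value until synchronisation has run.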
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
“Occasionally, an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”
2.8 Brewer’s ‘CAP’ Theorem
The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.
However, ORM tools are complex, and thus difficult to maintain².
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.
A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely, and radically simplifies application development [36].
² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.
2.10 Benefits of ‘NoSql’
As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law - disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.
• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale; managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format, rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers, such as myself, who are used to developing solutions on current standard platforms³.
³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
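
Because the interface is plain HTTP, talking to CouchDB amounts to building an ordinary web request. The sketch below only constructs such a request and does not send it, so it runs without a server. The helper name `buildPutRequest`, the database name `staff` and the document id `alice` are hypothetical; the base URL assumes a local CouchDB instance listening on its default port, 5984.

```javascript
// Build (but do not send) the HTTP request that would store a document.
// CouchDB addresses a document as /<database>/<document id>.
function buildPutRequest(baseUrl, dbName, docId, doc) {
  return {
    method: "PUT",
    url: `${baseUrl}/${encodeURIComponent(dbName)}/${encodeURIComponent(docId)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc), // the document travels as a JSON string
  };
}

// Hypothetical database "staff" and document id "alice" on a local server.
const request = buildPutRequest("http://localhost:5984", "staff", "alice", {
  name: "Alice",
  nationality: ["United Kingdom", "South Africa"],
});

console.log(request.method, request.url);
console.log(request.body);
```

Any HTTP client, in any language, can send a request of this shape; no database driver or connectivity library is involved.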
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable, schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
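
The difference between a missing field and a NULL can be seen when application code reads the two documents. The following JavaScript sketch uses cut-down copies of the examples above; the helper names `hasPostalCode` and `formatAddress` are hypothetical and not part of CouchDB.

```javascript
// Cut-down copies of the two 'addressbook' documents shown above.
const person = {
  firstName: "John",
  lastName: "Smith",
  address: {
    streetAddress: "21 2nd Street",
    city: "New York",
    state: "NY",
    postalCode: "10021",
  },
};

const office = {
  company: "British Council Zambia",
  address: {
    streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
    city: "Lusaka",
    country: "Zambia",
  },
};

// A field is either present in the document or simply not there.
function hasPostalCode(doc) {
  return (
    doc.address !== undefined &&
    Object.prototype.hasOwnProperty.call(doc.address, "postalCode")
  );
}

// Join whichever address parts exist; absent parts are skipped, not printed as NULL.
function formatAddress(doc) {
  const a = doc.address || {};
  return [a.streetAddress, a.city, a.state, a.postalCode, a.country]
    .filter((part) => part !== undefined)
    .join(", ");
}

console.log(hasPostalCode(person), hasPostalCode(office));
console.log(formatAddress(person));
console.log(formatAddress(office));
```

The code that reads the documents has to tolerate both shapes, which is the application-side cost of a flexible schema.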
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places, and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free-of-charge.
Please note that, after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
SECTION 4 USING COUCHDB 22
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format, together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
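A client program would normally assemble this URL rather than type it by hand. The following is a minimal JavaScript sketch of that step, assuming a runtime that provides the standard URL class (a browser or Node.js). The helper name deleteUrl is hypothetical, the credentials are left out, and no request is actually sent.

```javascript
// Hypothetical helper: build the URL for deleting one revision of a document.
// Nothing is sent over the network; this only shows where the rev value goes.
function deleteUrl(server, database, docId, rev) {
  // encodeURIComponent protects ids that contain spaces or slashes
  const url = new URL(`${server}/${database}/${encodeURIComponent(docId)}`);
  // the revision travels in the query string, after the question mark
  url.searchParams.set("rev", rev);
  return url.toString();
}

const target = deleteUrl(
  "http://mick.couchone.com",
  "addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(target);
```

The resulting string could then be passed to any HTTP library together with the DELETE verb.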
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json (the rev value is the one shown for John Smith’s document in Section 4.4):
curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
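The revision check can be pictured with a toy in-memory model. This is only a sketch of the idea, not CouchDB's implementation: real revision ids carry a hash of the document, whereas the hypothetical putDoc function below simply appends a fixed suffix to a counter.

```javascript
// Toy in-memory model of the revision check (an illustration only;
// real revision ids contain a content hash, here a fixed suffix is used).
function createStore() {
  return new Map(); // document id -> { rev, body }
}

function nextRev(currentRev) {
  // revisions look like "N-suffix"; N counts how often the document changed
  const generation = currentRev ? parseInt(currentRev.split("-")[0], 10) : 0;
  return `${generation + 1}-demo`;
}

function putDoc(store, id, body, rev) {
  const existing = store.get(id);
  // an update must quote the revision it is based on
  if (existing && existing.rev !== rev) {
    return { error: "conflict", reason: "Document update conflict." };
  }
  const newRev = nextRev(existing ? existing.rev : null);
  store.set(id, { rev: newRev, body: body });
  return { ok: true, id: id, rev: newRev };
}

const store = createStore();
const created = putDoc(store, "john-smith", { age: 25 });              // first write
const updated = putDoc(store, "john-smith", { age: 26 }, created.rev); // based on current rev
const stale = putDoc(store, "john-smith", { age: 27 }, created.rev);   // based on an old rev
console.log(created, updated, stale);
```

The third call is rejected because it quotes the first revision after the document has already moved on to the second, which is the situation the ‘Document update conflict’ error guards against.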
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'
Note that, if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
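The quoting differences between shells can be sidestepped by generating the request body in code. The sketch below is a suggestion rather than part of the project: the helper name replicationBody is hypothetical, and it only builds the JSON text that would be posted to _replicate.

```javascript
// Build the body for a POST to /_replicate without hand-written quoting.
// Hypothetical helper; the URLs are the ones used in the commands above.
function replicationBody(source, target) {
  // JSON.stringify takes care of quoting and escaping
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```

Because the serialiser produces the quotes, the same program text works unchanged on Windows, Mac and Linux.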
4.9 Querying a CouchDB Database using Map-Reduce
So far, we have seen how data is entered on to CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation of the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election, and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a SUM aggregate in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
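The behaviour of the two views can be imitated outside CouchDB with a few lines of JavaScript. This is a simplified sketch under stated assumptions: the runView helper is hypothetical, emit is passed to the map function as an argument (CouchDB provides it as a global), the rows are not sorted, and every constituency other than Aberavon is invented for the example.

```javascript
// Local imitation of the two views in _design/con. The runView helper and
// every constituency except Aberavon are made up for illustration.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example North", con: 1500 },  // hypothetical
  { _id: "Example South", con: 12000 }, // hypothetical
];

// CouchDB supplies sum() to reduce functions; here we define our own
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

function runView(documents, map, reduce) {
  const rows = [];
  // emit is handed to the map function as its second argument
  const emit = (key, value) => rows.push({ key: key, value: value });
  documents.forEach((doc) => map(doc, emit));
  if (!reduce) return { rows: rows };
  // simplified: CouchDB passes [key, id] pairs as keys, here only the keys
  const keys = rows.map((r) => r.key);
  const values = rows.map((r) => r.value);
  return { rows: [{ key: null, value: reduce(keys, values) }] };
}

// 'Conservative Votes': map only, one row per document
const votes = runView(docs, (doc, emit) => emit(doc.con, doc._id));

// 'Conservative Votes Total': map followed by a summing reduce
const total = runView(
  docs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => sum(values)
);
console.log(JSON.stringify(total));
```

The map-only view yields one row per constituency, while the view with a reduce function collapses all rows into a single total, in the same shape as the response shown above.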
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
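A range read of this kind can be sketched as a filter over rows ordered by key. The example below is an illustration only: it uses a linear filter for brevity where CouchDB would walk its B-Tree index, and all vote counts except Aberavon's are hypothetical.

```javascript
// Imitation of a startkey/endkey range read over a view's rows.
// All vote counts except Aberavon's are hypothetical.
const rows = [
  { key: 3062, value: "Aberavon" },
  { key: 1500, value: "Example North" },
  { key: 950, value: "Example East" },
  { key: 2000, value: "Example West" },
];

// CouchDB keeps view rows ordered by key, so sort once up front
const index = rows.slice().sort((a, b) => a.key - b.key);

function range(sortedRows, startkey, endkey) {
  // both ends are inclusive, matching CouchDB's default behaviour
  return sortedRows.filter((r) => r.key >= startkey && r.key <= endkey);
}

const matches = range(index, 1000, 2000);
console.log(matches.map((r) => r.value));
```

Only the rows whose key lies in the requested interval are returned, which is what makes the key chosen in the map function play the role of the column in a WHERE clause.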
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code, rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
SECTION 5 SERVING HTML FROM COUCHDB 32
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# uncomment the next line to upload to a local instance instead
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s/$folder\///g")
  echo "$filename"
  # strip the extension to get the document name
  docname=$(echo "$filename" | sed -e "s/$fileextension//g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
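The derivation of the document URL from the file path, which the bash script performs with two sed calls, can also be expressed as follows. This is a sketch assuming Node.js and its standard path module; the helper name documentUrl is hypothetical.

```javascript
const path = require("path");

// Hypothetical helper mirroring the two sed steps in the bash script:
// strip the folder, strip the extension, then append to the database URL.
function documentUrl(host, database, filepath, extension) {
  // basename drops the leading folder and, if given, the trailing extension
  const docname = path.basename(filepath, extension);
  return `${host}/${database}/${docname}`;
}

const url = documentUrl(
  "http://username:password@mick.couchone.com",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);
console.log(url);
```

The file name already contains the escaped form %20, so it can be appended to the URL without further encoding.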
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
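Because a ‘show’ function is plain JavaScript, it can be tried out before it is pasted into a design document. The following sketch calls s1 directly; the sample document, its coordinates and the shape of the request object are placeholders chosen for the example, not data from the project.

```javascript
// The s1 'show' function from the listing, written as an ordinary function
// so it can be called outside CouchDB.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical document and request; the coordinates are placeholders
const sampleDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const sampleReq = { query: {} };

const html = s1(sampleDoc, sampleReq);
console.log(html);
```

Testing the function in this way catches quoting mistakes early, before the code has to be embedded in a JSON string.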
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section, and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({\"headers\": {\"Content-Type\": \"text/html\"}});
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
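The interplay of start, send and getRow can be tried locally with stand-in implementations. The sketch below is an assumption-laden illustration: the runList harness is hypothetical, the three functions are passed in explicitly (CouchDB provides them as globals), and the list function is a cut-down variant of l1 that omits the hyperlink.

```javascript
// Stand-ins for the start, send and getRow functions that CouchDB provides
// to a 'list' function. The harness and the sample rows are hypothetical.
function runList(listFn, viewRows) {
  const output = { headers: null, chunks: [] };
  let position = 0;
  const api = {
    start: (response) => { output.headers = response.headers; },
    send: (chunk) => { output.chunks.push(chunk); },
    // returns the next row, or null when the view is exhausted
    getRow: () => (position < viewRows.length ? viewRows[position++] : null),
  };
  listFn(api);
  return output;
}

// A cut-down version of l1 that receives the three functions explicitly
function l1({ start, send, getRow }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p>" + row.value + "</p>");
  }
}

const result = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "Aberystwyth University", value: "Aberystwyth University" },
]);
console.log(result.chunks.join(""));
```

The loop ends when getRow returns a falsy value, so the list function never needs to know in advance how many rows the view produced.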
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
SECTION 6 SERVING GEORSS USING COUCHAPP 40
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
With this in place, running couchapp push from the bc folder uploads the design document. So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
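Mustache templates contain no logic of their own, so flags such as has_programmes have to be computed before the template is rendered. The sketch below shows one way this packaging step might look. It is not the code from Appendix C: the function name packageInstitution and the assumed document shape (a programmes array whose items have a name and a list of country codes) are hypothetical.

```javascript
// Hypothetical packaging step: Mustache templates contain no logic, so a flag
// such as has_programmes is computed before rendering. The document shape
// (a programmes array with name and countries) is assumed, not taken from
// the project data.
function packageInstitution(doc) {
  const programmes = (doc.programmes || []).map((p) => ({
    name: p.name,
    has_countries: (p.countries || []).length > 0,
    countries: (p.countries || []).map((c) => ({ country: c })),
  }));
  return {
    id: doc._id,
    latitude: doc.latitude,
    longitude: doc.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes,
  };
}

// Sample input: one institution with a programme, one without (both invented)
const view = {
  institutions: [
    packageInstitution({
      _id: "Aston University",
      latitude: 52.48,
      longitude: -1.89,
      programmes: [{ name: "Erasmus", countries: ["zm"] }],
    }),
    packageInstitution({ _id: "Example College", latitude: 0, longitude: 0 }),
  ],
};
console.log(JSON.stringify(view, null, 2));
```

An object of this shape could be handed to a Mustache renderer together with the template, with each section in the template matching a key in the object.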
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
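To make the point about HTTP verbs and JSON concrete, the following sketch builds (but does not send) the requests that would create a database, store a document, read it back and delete it. The helper couchRequest is hypothetical, the address http://127.0.0.1:5984 assumes a local CouchDB installation on its default port, and the revision value is a made-up placeholder.

```javascript
// Hypothetical helper: describes a CouchDB request without sending it.
// BASE assumes a local CouchDB instance on the default port.
const BASE = "http://127.0.0.1:5984";

function couchRequest(method, path, body) {
  return {
    method: method,
    url: BASE + path,
    headers: body ? { "Content-Type": "application/json" } : {},
    body: body ? JSON.stringify(body) : null
  };
}

// Document ids appear in the URL, so they must be URL-encoded.
const docPath = "/universities/" + encodeURIComponent("University of Aberdeen");

const requests = [
  couchRequest("PUT", "/universities"),                   // create the database
  couchRequest("PUT", docPath, { region: "Scotland" }),   // create a document
  couchRequest("GET", docPath),                           // read it back
  couchRequest("DELETE", docPath + "?rev=1-placeholder")  // delete a given revision (placeholder rev)
];

requests.forEach(function (r) {
  console.log(r.method, r.url, r.body || "");
});
```

Any HTTP client, including cURL, could issue these requests; nothing CouchDB-specific is needed on the client side.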
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
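As a rough illustration of the shift in thinking, the sketch below expresses "count institutions per region" as a map function and a reduce function. It runs against an in-memory array rather than a CouchDB server: runView is a hypothetical harness that only mimics grouping by key, emit is passed in explicitly (CouchDB provides it as a global), and the sample documents are invented for the example.

```javascript
// Map function: called once per document, emits (key, value) pairs.
function mapByRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce function: combines the values emitted for one key.
// (CouchDB also passes a third 'rereduce' argument, ignored in this sketch.)
function sumValues(keys, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical in-memory harness standing in for CouchDB's view engine.
function runView(docs, map, reduce) {
  const groups = new Map();
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      if (!groups.has(key)) {
        groups.set(key, []);
      }
      groups.get(key).push(value);
    });
  });
  const result = {};
  groups.forEach(function (values, key) {
    result[key] = reduce(null, values);
  });
  return result;
}

// Invented sample documents.
const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Example College", region: "London" }
];

const counts = runView(docs, mapByRegion, sumValues);
console.log(counts);
```

The same information need in SQL would be a single GROUP BY statement; here it has to be decided in advance and stored as a view, which is the trade-off discussed above.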
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.
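The escaping burden can be seen in a small sketch. It assumes nothing about CouchApp itself: it simply writes a show function in the style of s1 as ordinary JavaScript and lets JSON.stringify produce the escaped string that a design document requires. The class attribute and the sample document values are hypothetical, added only so that there is a double quote to escape.

```javascript
// A show function in the style of s1, written as ordinary JavaScript.
// The class attribute is hypothetical; it is there to introduce double quotes.
function s1(doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>" +
         "<p>latitude: " + doc.latitude + "</p>";
}

// A design document stores the function's source as a JSON string, so every
// double quote in the source must be backslash-escaped. JSON.stringify does
// mechanically what is tedious to do by hand.
const designDoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
const json = JSON.stringify(designDoc);

console.log(json);

// Calling the function directly with a made-up document.
console.log(s1({ _id: "University of Aberdeen", latitude: 57.16 }, {}));
```

Tools such as CouchApp perform this packaging step automatically, which is why the HTML can be authored in separate files in the normal way.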
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->

// this is further down in templates/edit.html

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georssjs
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
};
Appendix D
georsshtml
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
  {{#has_programmes}}
    {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
        {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
        {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
    {{/programmes}}
  {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django. The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Appendix A: Design Document d1
- Appendix B: GeoRSS on Sofa
- Appendix C: georss.js
- Appendix D: georss.html
- Appendix E: Bash Script to Upload Country Flag Files
Section 1
Introduction
The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB’s suitability as a web development platform.
1.1 Motivation
The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of ‘key-value store’ products which have emerged recently to deal with the challenges of very large data volumes in online environments.
The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as ‘NoSql’¹) database products.
These ‘NoSql’ database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios non-relational systems are starting to take their place.
1.2 Does ‘one size’ still ‘fit all’?
One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:
¹ Pronounced ‘No-See-Quel’.
“The last 25 years of commercial DBMS development can be summed up in a single phrase: ‘One size fits all’. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.
[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser.”
Why does this matter? Does ‘one size fit all’? Up until recent years, the answer has always been ‘yes’: practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.
1.3 Organisation
The sections are organised in the following way:
• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project.
• in Section 3, I introduce the open-source, document-oriented database CouchDB.
• in Section 4, I provide an overview of CouchDB’s application programming interface.
• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code.
• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps.
• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.
Section 2
Background
In recent years, a new generation of data stores has emerged which approach data persistence in a different way to the established relational model.
In this section, I briefly discuss this relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd’s relational model reads as follows:
“The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains.”
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
The main advantage of the relational database is its precise and mathematically proven organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or ‘NoSql’ databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is ‘relational’ or not is more a matter of how applications use the database - a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
2.2 Relational Database Management Systems
The term ‘Relational Database Management System’ refers to the class of databases which follow Codd’s relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term ‘database’ has become associated with relational databases to such an extent that non-relational databases are often referred to as ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, ‘first normal form’, requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let’s call it ‘Staff’) containing staff records:
Name    Nationality
Bob     Australia
Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name    Nationality
Bob     Australia
Jane    Sweden
Alice   United Kingdom, South Africa
This second table breaks the rules of normalisation and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a ‘Nationality’ table, with a ‘StaffNationality’ table linked with Primary and Foreign Keys.
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:
{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}
This record (or ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example the following SQL statement couldbe used to determine how many staff members of each nationality are em-ployed
SELECT Nationality COUNT() FROM Staff GROUP BY Nationality
For this to work the table needs to store atomic values In our exampleabove this query would result in lsquoUnited Kingdom South Africarsquo beingconsidered a distinct nationality
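The effect can be reproduced with a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. The Staff table follows the example above; the simplified StaffNationality layout (names stored directly rather than through keys) is an assumption made for brevity, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Non-atomic version: Alice's two nationalities share a single cell.
conn.execute("CREATE TABLE Staff (Name TEXT, Nationality TEXT)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [
        ("Bob", "Australia"),
        ("Jane", "Sweden"),
        ("Alice", "United Kingdom South Africa"),
    ],
)
flat_counts = dict(
    conn.execute("SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality")
)

# Normalised version (simplified): one row per staff member and nationality.
conn.execute("CREATE TABLE StaffNationality (Name TEXT, Nationality TEXT)")
conn.executemany(
    "INSERT INTO StaffNationality VALUES (?, ?)",
    [
        ("Bob", "Australia"),
        ("Jane", "Sweden"),
        ("Alice", "United Kingdom"),
        ("Alice", "South Africa"),
    ],
)
normalised_counts = dict(
    conn.execute(
        "SELECT Nationality, COUNT(*) FROM StaffNationality GROUP BY Nationality"
    )
)

print(flat_counts)
print(normalised_counts)
```

In the first result the combined string appears as a nationality in its own right; in the second, each of Alice's nationalities is counted separately.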
In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.
2.5 Transactional Guarantees
A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
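The money-transfer example can be sketched as follows, assuming Python's built-in sqlite3 module. The Account table, its CHECK constraint and the transfer helper are hypothetical names introduced only for illustration; the credit is deliberately applied first so that a failing debit shows the credit being rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Id TEXT PRIMARY KEY, "
    "Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would overdraw the source account, so the credit
        # applied just before it has been rolled back too.
        return False


def balances(conn):
    """Return the current balance of every account as a dictionary."""
    return dict(conn.execute("SELECT Id, Balance FROM Account"))
```

A transfer that would overdraw account A leaves both balances exactly as they were, which is the ‘one version of the truth’ behaviour described above.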
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.
2.6 The ‘NoSql’ Movement
‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).¹
‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].
2.7 ‘NoSql’ databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example: if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
“Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”
2.8 Brewer’s ‘CAP’ Theorem
The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools has emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.
A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely, and radically simplifies application development [36].
2.10 Benefits of ‘NoSql’
As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law - disk space has become so economical as to be almost free; memory has grown so large it is now measured in Gigabytes, so that many databases can fit entirely in memory.
• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale; managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps, displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats, from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format, rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“... Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
    {
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
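To illustrate how an application might cope with two documents of different shape in the same database, here is a minimal sketch using Python's built-in json module. The postal_code helper is a hypothetical name introduced for this example, and the John Smith document is abbreviated.

```python
import json

# Abbreviated version of the John Smith document shown earlier.
john = json.loads("""
{
  "firstName": "John",
  "lastName": "Smith",
  "address": {"city": "New York", "state": "NY", "postalCode": "10021"}
}
""")

# A raw string keeps the backslash-n sequences intact for the JSON parser.
zambia = json.loads(r"""
{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
""")


def postal_code(doc):
    """Return the postal code, or None when the document simply has none."""
    return doc.get("address", {}).get("postalCode")


# The escaped new line characters become real line breaks once parsed.
street_lines = zambia["address"]["streetAddress"].splitlines()

print(postal_code(john), postal_code(zambia), street_lines)
```

The missing postal code is handled by the application when it reads the document, rather than by a NULL stored in the database.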
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places, and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax.¹ [51]
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(‘username’ and ‘password’ are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
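Since any HTTP library can play the role of cURL, the same POST request can be sketched with Python's built-in urllib. This is a sketch under stated assumptions: BASE_URL points at a local placeholder server, build_insert_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder address of a CouchDB server; an assumption for this sketch.
BASE_URL = "http://127.0.0.1:5984"


def build_insert_request(database, document):
    """Build (but do not send) the POST request that stores a document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{database}",
        data=body,                                     # like cURL's -d
        headers={"Content-Type": "application/json"},  # like cURL's -H
        method="POST",                                 # like cURL's -X POST
    )


request = build_insert_request(
    "addressbook", {"firstName": "John", "lastName": "Smith"}
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it to the server; that step is left out so that the sketch runs without a CouchDB instance.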
All data in CouchDB is stored in JSON format together with a unique key and revision version number - so that in CouchDB, the resulting document would have _id and _rev values as follows:
    {
      "_id": "760cd53c55a93497067f90d6242fc25e",
      "_rev": "1-91bce055fc8db86480400321079f0834",
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
      "_id": "british-council-zambia",
      "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ‘?’).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
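When such URLs are assembled in code rather than typed by hand, the query string is best built by a library so that special characters are escaped correctly. The following is a minimal sketch using Python's urllib.parse; document_url is a hypothetical helper and the local server address is a placeholder.

```python
from urllib.parse import quote, urlencode


def document_url(base_url, database, doc_id, rev=None):
    """Return the URL of a document, optionally pinned to a revision."""
    url = f"{base_url}/{quote(database, safe='')}/{quote(doc_id, safe='')}"
    if rev is not None:
        # The revision travels in the query string, after the question mark.
        url += "?" + urlencode({"rev": rev})
    return url


url = document_url(
    "http://127.0.0.1:5984",
    "addressbook",
    "british-council-zambia",
    rev="1-da7bcd810c608d6fbcb9ce92e9ade343",
)
print(url)
```

Quoting the document id matters as soon as an id contains characters such as a space or a slash, which would otherwise change the meaning of the URL.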
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
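The revision check behind updates and deletes can be illustrated with a small in-memory model. This is a sketch of the idea only, not CouchDB's actual implementation: TinyStore and ConflictError are hypothetical names, and revisions are plain integers here rather than CouchDB's number-and-hash strings.

```python
class ConflictError(Exception):
    """Raised when a write does not name the document's current revision."""


class TinyStore:
    """In-memory stand-in for a document store with revision checking."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, body)

    def create(self, doc_id, body):
        if doc_id in self._docs:
            raise ConflictError(doc_id)
        self._docs[doc_id] = (1, dict(body))
        return 1

    def update(self, doc_id, rev, body):
        current_rev, _ = self._docs[doc_id]
        if rev != current_rev:  # the caller has seen only an older version
            raise ConflictError(doc_id)
        self._docs[doc_id] = (current_rev + 1, dict(body))
        return current_rev + 1

    def delete(self, doc_id, rev):
        current_rev, _ = self._docs[doc_id]
        if rev != current_rev:
            raise ConflictError(doc_id)
        del self._docs[doc_id]


store = TinyStore()
rev1 = store.create("john-smith", {"age": 25})
rev2 = store.update("john-smith", rev1, {"age": 26})

try:
    store.update("john-smith", rev1, {"age": 27})  # rev1 is now stale
    stale_rejected = False
except ConflictError:
    stale_rejected = True
```

A client whose write is rejected in this way would normally fetch the latest version of the document, re-apply its change and try again.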
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
³ Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
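One way to sidestep the shell-quoting differences altogether is to let a JSON library produce the request body. The following is a minimal sketch using Python's built-in json module; replication_body is a hypothetical helper, and the URLs are the placeholders used above.

```python
import json


def replication_body(source, target):
    """Serialise the JSON body posted to the _replicate endpoint."""
    return json.dumps({"source": source, "target": target})


body = replication_body(
    "http://username:password@mick.couchone.com/addressbook",
    "http://127.0.0.1:5984/addressbook",
)
print(body)
```

The resulting string can be written to a file and passed to cURL with -d @filename, in the same way as the document uploads earlier in this section.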
4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.

In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

  {
    "_id": "Aberavon",
    "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
    "mp": "Hywel Francis",
    "electorate": 51079,
    "con": 3062,
    "lab": 18077,
    "lib": 4138,
    "pc": 3546,
    "oth": 1278
  }
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data.

The first view, 'Conservative Votes', simply ranges over the document collection in a map function.

The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate query using SUM and GROUP BY in SQL).

[http://localhost:5984/election-2005/_design/con]

  {
    "_id": "_design/con",
    "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
    "language": "javascript",
    "views": {
      "Conservative Votes": {
        "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
      },
      "Conservative Votes Total": {
        "map": "function(doc) {\n  emit(null, doc.con);\n}",
        "reduce": "function(keys, values) {\n  return sum(values);\n}"
      }
    }
  }

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

  {"rows":[{"key":null,"value":8782198}]}
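To make the interplay of the two views concrete, the following sketch runs the same map and reduce logic over two documents in plain JavaScript. It is a simplification under stated assumptions: runView is a hypothetical helper, emit is passed in as an argument instead of being a global, the reduce function receives bare keys, and the second document is made-up sample data.

```javascript
// A tiny stand-in for CouchDB's view engine, to show how map and reduce cooperate.
// runView is a hypothetical helper; only the first document comes from the text,
// the second is made-up sample data.
const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077 },
  { _id: "Sample Constituency", con: 1500, lab: 900 },
];

// CouchDB provides sum() to reduce functions; it is defined here to run standalone.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

function runView(documents, map, reduce) {
  const rows = [];
  // CouchDB supplies emit() as a global; here it is passed as a second argument.
  documents.forEach((doc) => map(doc, (key, value) => rows.push({ key, value })));
  if (!reduce) return rows;
  // All rows are reduced to a single value, as in the 'Conservative Votes Total' view.
  return [{ key: null, value: reduce(rows.map((r) => r.key), rows.map((r) => r.value)) }];
}

const conservativeVotes = runView(docs, (doc, emit) => emit(doc.con, doc._id));
const conservativeTotal = runView(
  docs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => sum(values)
);
console.log(conservativeVotes);
console.log(conservativeTotal);
```

The map-only view yields one row per document, while the view with a reduce function collapses those rows into a single total.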
It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
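The range request works because the rows of a view are held in key order, so a lower and an upper bound select a contiguous slice. A minimal sketch of that selection follows; the range helper is hypothetical, both bounds are treated as inclusive, and all rows except Aberavon are made-up sample data.

```javascript
// Rows as a view would hold them: sorted by key (the Conservative vote count).
// Only "Aberavon" comes from the text; the other rows are made-up sample data.
const rows = [
  { key: 3062, value: "Aberavon" },
  { key: 1200, value: "Sample A" },
  { key: 1999, value: "Sample B" },
  { key: 800, value: "Sample C" },
].sort((a, b) => a.key - b.key);

// Hypothetical helper: keeps rows whose key lies between the two bounds, inclusive.
function range(sortedRows, startkey, endkey) {
  return sortedRows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const matches = range(rows, 1000, 2000);
console.log(matches.map((row) => row.value));
```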
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.

As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time: every read is an indexed lookup.
Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB, namely CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'. A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
  #!/bin/bash

  # host=http://127.0.0.1:5984
  host=http://username:password@mick.couchone.com
  database=universities
  folder=universities
  fileextension=.json
  FILES=$folder/*

  # create the database
  curl -X PUT "$host/$database"

  for filepath in $FILES
  do
    echo "$filepath"
    # get the file name from the file path
    filename=$(echo "$filepath" | sed -e "s|^$folder/||")
    echo "$filename"
    # strip the file extension to get the document name
    docname=$(echo "$filename" | sed -e "s|$fileextension\$||")
    echo "$docname"
    url=$host/$database/$docname
    echo "$url"
    # put the document into CouchDB
    echo curl -X PUT "$url" -d "@$filepath"
    curl -X PUT "$url" -d "@$filepath"
  done
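The core of the script is the derivation of a document URL from a file path. As a cross-check, the same derivation can be sketched in JavaScript using Node's path module; the documentUrl helper is hypothetical and performs no upload, it only computes the URL that the script would PUT to.

```javascript
const path = require("path");

// Hypothetical helper mirroring the script's two sed steps: strip the folder,
// strip the extension, then append the result to the database URL.
function documentUrl(host, database, filepath, extension = ".json") {
  const docname = path.posix.basename(filepath, extension);
  return `${host}/${database}/${docname}`;
}

const url = documentUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json"
);
console.log(url);
```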
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command used would be as follows:

  curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
    -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.

  {
    "_id": "_design/d1",
    "shows": {
      "s1": "function(doc, req) {
        return '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>';
      }"
    }
  }

Note that shows is a reserved word in a CouchDB design document.

The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
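Since a 'show' function is ordinary JavaScript, it can be tried outside CouchDB by calling it with a sample document. The sketch below does this for s1 as reconstructed from the listing; the coordinates are made-up values and an empty object stands in for the request.

```javascript
// The s1 'show' function as reconstructed from the listing, assigned to a
// variable so that it can be called directly.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Made-up coordinates; the stored document for Aston University may differ.
const sampleDoc = { _id: "Aston University", latitude: 52.5, longitude: -1.9 };

// req is not used by s1, so an empty object stands in for the request.
const html = s1(sampleDoc, {});
console.log(html);
```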
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

  curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the 'show' function that returns an HTML string

• Aston%20University is the id of the document to be rendered
5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University

As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
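The escaping burden arises because the function source is itself stored as a JSON string. One way to see what has to be escaped is to let JSON.stringify do the work, as in the sketch below. The cut-down design document here is hypothetical and holds only a fragment of the map HTML.

```javascript
// A fragment of the kind of HTML the map1 'show' function returns.
const htmlFragment = '<div id="map_canvas" style="width: 100%; height: 100%"></div>';

// The show function is stored in the design document as a string of source code.
const showSource = "function(doc, req) { return '" + htmlFragment + "'; }";

// Hypothetical, cut-down design document holding just this one function.
const designDoc = { _id: "_design/d1", shows: { map1: showSource } };

// JSON.stringify adds the backslash before every double quote automatically.
const json = JSON.stringify(designDoc);
console.log(json);
```

Generating the JSON this way avoids escaping by hand, although the HTML still has to be written inside a JavaScript string.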
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.

Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.

Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.

  {
    "_id": "_design/d1",
    "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
    "shows": { ... },
    "lists": { ... },
    "views": {
      "v1": {
        "map": "function(doc) { emit(doc._id, doc._id); }"
      }
    }
  }

The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.

• v1 is a very simple View, simply emitting the document ids

• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a 'list' function as follows:

  {
    "_id": "_design/d1",
    "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
    "shows": { ... },
    "lists": {
      "l1": "function(head, req) {
        var row;
        start({ \"headers\": { \"Content-Type\": \"text/html\" } });
        while (row = getRow()) {
          send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
        }
      }"
    },
    "views": {
      "v1": {
        "map": "function(doc) { emit(doc._id, doc._id); }"
      }
    }
  }

Note the built-in functions start and send:

• start is executed once at the beginning of the function

• send is executed once for each row in the dataset

At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple 'view' rendered using a 'list' function
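The way a 'list' function pulls rows from a 'view' can be imitated with a small harness. In the sketch below, runList is a hypothetical stand-in for CouchDB's list runtime: it provides start, getRow and send, which in CouchDB are globals but are passed as parameters here. The link target is shortened and the rows are sample data.

```javascript
// Hypothetical harness standing in for CouchDB's list runtime: it supplies
// start(), getRow() and send() and collects everything that is sent.
function runList(listFn, viewRows) {
  const queue = viewRows.slice();
  const chunks = [];
  let response = null;
  const start = (options) => { response = options; };
  const getRow = () => queue.shift(); // undefined once the rows run out
  const send = (chunk) => { chunks.push(chunk); };
  listFn({ total_rows: viewRows.length }, {}, start, getRow, send);
  return { response, body: chunks.join("") };
}

// The l1 logic with a shortened link target; in CouchDB the three helpers are
// globals rather than parameters.
function l1(head, req, start, getRow, send) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p><a href=/show/" + row.value + ">" + row.value + "</a></p>");
  }
}

const result = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" },
]);
console.log(result.response.headers["Content-Type"]);
console.log(result.body);
```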
5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org

The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

  couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

  michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create.

• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.

• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp

The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a 'Show' function s1, we navigate to the bc folder

  michael@dell:~/couchapp$ cd bc

and run the following command:

  michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js is a file generated by CouchApp

Similarly, couchapp generate list l1 will create a 'List' function, l1.

In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.

So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:

  {
    "env": {
      "default": {
        "db": "http://username:password@mick.couchone.com/universities"
      }
    }
  }

So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
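The merging step described at the start of this subsection can be pictured as turning file paths into nested keys of one JSON document. The sketch below illustrates only that idea and is not CouchApp's actual code: buildDesignDoc is a hypothetical helper, and the files are held in memory rather than read from disk.

```javascript
// Hypothetical in-memory picture of the bc folder: keys are relative paths,
// values are file contents. CouchApp reads these from disk; this sketch does not.
const files = {
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
};

// Turns each path into nested keys, dropping the .js extension from the last segment.
function buildDesignDoc(id, fileMap) {
  const ddoc = { _id: "_design/" + id };
  for (const [filePath, source] of Object.entries(fileMap)) {
    const segments = filePath.replace(/\.js$/, "").split("/");
    let node = ddoc;
    segments.slice(0, -1).forEach((segment) => {
      node[segment] = node[segment] || {};
      node = node[segment];
    });
    node[segments[segments.length - 1]] = source;
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", files);
console.log(JSON.stringify(ddoc, null, 2));
```

The result has the same shape as the hand-written design documents of Section 5, with shows, lists and views sections holding function source as strings.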
6.5 Using templates

In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):

  couchapp generate view all

The contents of map.js are as follows:

  function(doc) {
    emit(doc._id, doc);
  }

This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

  couchapp generate list georss

This creates a file called georss.js.

I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.

The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
        {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
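The nested iteration performed by the template can also be written out as plain loops, which may make the logic easier to follow. The sketch below does not use Mustache at all: contentFor is a hypothetical helper, the data shape is inferred from the template's section names, and the programme and country values are made-up samples.

```javascript
// Made-up sample data in the shape the template appears to expect;
// the programme and country values are illustrative only.
const institutions = [
  {
    id: "Aston University",
    has_programmes: true,
    programmes: [
      { name: "Erasmus", has_countries: true, countries: [{ country: "zm" }] },
      { name: "Sample Programme", has_countries: false, countries: [] },
    ],
  },
];

// Hypothetical helper: produces the inner HTML of one <content> element by hand.
function contentFor(institution) {
  if (!institution.has_programmes) return "";
  return institution.programmes
    .map((programme) => {
      // Flags are emitted only when the programme has a list of countries.
      const flags = programme.has_countries
        ? programme.countries
            .map((c) => '<img src="http://mick.couchone.com/countries/' + c.country + '/flag"/>')
            .join("")
        : "";
      return "<p>" + programme.name + flags + "</p>";
    })
    .join("");
}

console.log(contentFor(institutions[0]));
```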
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?

The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?

One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB's challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.

CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion

The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project's focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A

Design Document d1

This is the CouchDB design document described in Section 5.

The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
  {
    "_id": "_design/d1",
    "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
    "shows": {
      "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
      "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
    }
  }
Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:

  exports.header = function(data) {
    // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
    var f = <feed xmlns="http://www.w3.org/2005/Atom"
                  xmlns:georss="http://www.georss.org/georss"/>;
    f.title = data.title;
    f.id = data.feed_id;
    f.link.@href = data.feed_link;
    f.link.@rel = "self";
    f.generator = "CouchApp on CouchDB";
    f.updated = rfc3339(data.updated);
    return f.toXMLString().replace(/\<\/feed\>/,'');
  };

  exports.entry = function(data) {
    var entry = <entry/>;
    entry.id = data.entry_id;
    entry.title = data.title;
    entry.content = data.content;
    entry.content.@type = (data.content_type || 'html');
    entry.updated = rfc3339(data.updated);
    entry.author = <author><name>{data.author}</name></author>;
    entry.link.@href = data.alternate;
    entry.link.@rel = "alternate";
    entry.point = data.point;
    return entry;
  }
At the end of lists/index.js:
      alternate : path.absolute(path.show('post', row.id)),
      // point : row.value.loc[1] + " " + row.value.loc[0]
      point : row.value.latitude + " " + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
  // close the loop after all rows are rendered
  return "</feed>";
});
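The point value built above is simply the latitude and longitude joined by a single space, which is the form GeoRSS-Simple expects. The sketch below illustrates that convention in isolation. The helper names georssPoint and entryXml are hypothetical and are not part of Sofa or CouchApp, and the georss: element prefix follows the GeoRSS-Simple convention rather than the listing above; no XML escaping is attempted.

```javascript
// Hypothetical helper: GeoRSS-Simple writes a point as "latitude longitude".
function georssPoint(latitude, longitude) {
  return latitude + " " + longitude;
}

// Hypothetical helper: a bare-bones Atom entry carrying a GeoRSS point.
// Assumes the title contains no characters that need XML escaping.
function entryXml(title, latitude, longitude) {
  return "<entry><title>" + title + "</title>" +
    "<georss:point>" + georssPoint(latitude, longitude) + "</georss:point>" +
    "</entry>";
}

console.log(entryXml("University of St Andrews", 56.3412139443169, -2.79301175308608));
```

Note the order: latitude first, then longitude, which is the reverse of the [longitude, latitude] order used by some other geographic formats.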
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login.
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
  {{#has_programmes}}
    {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
        {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
        {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
    {{/programmes}}
  {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
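To show how the section tags drive iteration, here is a minimal sketch of a renderer that handles only the two features used above: {{name}} variables and {{#name}}...{{/name}} sections over arrays or truthy values. It is an assumption-laden toy written for illustration, not the mustache.js implementation: the function name render is hypothetical, and unlike real Mustache it performs no HTML escaping and does not support a section nested inside another section of the same name.

```javascript
// Toy subset of Mustache, for illustration only (not mustache.js).
function render(template, view) {
  // Expand sections first: {{#name}} inner {{/name}}
  const sectionRe = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const expanded = template.replace(sectionRe, function (match, name, inner) {
    const value = view[name];
    if (Array.isArray(value)) {
      // An array renders the inner block once per item, with the item as context.
      return value.map(function (item) { return render(inner, item); }).join("");
    }
    // A truthy non-array value renders the block once; a falsy value omits it.
    return value ? render(inner, view) : "";
  });
  // Then substitute plain variables; unknown names become empty strings.
  return expanded.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] === undefined ? "" : String(view[name]);
  });
}

const programme = {
  name: "Chevening Programme",
  has_countries: true,
  countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }]
};

console.log(render(
  "{{name}}:{{#has_countries}}{{#countries}} [{{country}}]{{/countries}}{{/has_countries}}",
  programme
));
```

The has_countries flag plays the same role here as in the listing: it lets the template skip the surrounding markup entirely when a programme has no countries.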
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type:image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org/
[2] British Council. http://www.britishcouncil.org/
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com/
[4] GeoRSS. http://www.georss.org/
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org/
[7] KML. http://code.google.com/apis/kml/documentation/
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org/
[10] The CouchDB Project. http://couchdb.apache.org/
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web/
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat/
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign/
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
Section 1
Introduction
The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB's suitability as a web development platform.
1.1 Motivation
The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of 'key-value store' products which have emerged recently to deal with the challenges of very large data volumes in online environments.
The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as 'NoSql'¹) database products.
¹ Pronounced 'No-See-Quel'.
These 'NoSql' database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios non-relational systems are starting to take their place.
1.2 Does 'one size' still 'fit all'?
One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:
"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.
[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."
Why does this matter? Does 'one size fit all'? Up until recent years the answer has always been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.
1.3 Organisation
The sections are organised in the following way:
• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project.
• in Section 3, I introduce the open-source document-oriented database CouchDB.
• in Section 4, I provide an overview of CouchDB's application programming interface.
• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code.
• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps.
• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.
Section 2
Background
In recent years a new generation of data stores has emerged which approach data persistence in a different way to the established relational model.
In this section I briefly discuss this relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd's relational model reads as follows:
"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
The main advantage of the relational database is its precise and mathematically proven organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is 'relational' or not is more a matter of how applications use the database - a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
2.2 Relational Database Management Systems
The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let's call it 'Staff') containing staff records:

Name    Nationality
Bob     Australia
Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name    Nationality
Bob     Australia
Jane    Sweden
Alice   United Kingdom, South Africa

This second table breaks the rules of normalisation and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a 'Nationality' table and a 'StaffNationality' table, linked with Primary and Foreign Keys.
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice's record could be stored as follows:
{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}
This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
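To make the 'schema-less' idea concrete, the sketch below holds the three staff records as plain JavaScript objects, the in-memory equivalent of JSON documents. The names staffDocs and nationalitiesOf are hypothetical and chosen only for this illustration; nothing here is CouchDB API. The point it tries to show is that the store accepts both shapes of record, and that it then falls to the application to cope with the difference.

```javascript
// Illustrative only: three "documents" whose nationality field differs in shape.
const staffDocs = [
  { name: "Bob", nationality: "Australia" },
  { name: "Jane", nationality: "Sweden" },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] }
];

// Hypothetical helper: normalise either shape to an array at read time.
function nationalitiesOf(doc) {
  return Array.isArray(doc.nationality) ? doc.nationality : [doc.nationality];
}

staffDocs.forEach(function (doc) {
  console.log(doc.name + ": " + nationalitiesOf(doc).join(", "));
});
```

No schema change was needed to add Alice's record, but the helper had to be written: flexibility in the store is paid for with extra care in application code.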
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based declarative query language, SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:
SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;
For this to work, the table needs to store atomic values. In our example above, this query would result in 'United Kingdom, South Africa' being considered a distinct nationality.
In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
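As a preview of the map-reduce style, the sketch below counts staff per nationality in plain JavaScript, the rough equivalent of the GROUP BY query above. It is a self-contained simulation under my own assumptions: the names mapDoc, reduceCounts and countByNationality, and the in-memory grouping step, are hypothetical stand-ins and not CouchDB's view engine.

```javascript
const docs = [
  { name: "Bob", nationality: "Australia" },
  { name: "Jane", nationality: "Sweden" },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] }
];

// Map step: emit one (key, value) pair per nationality held by a document.
function mapDoc(doc, emit) {
  const list = Array.isArray(doc.nationality) ? doc.nationality : [doc.nationality];
  list.forEach(function (nationality) {
    emit(nationality, 1);
  });
}

// Reduce step: sum the values emitted for a single key.
function reduceCounts(values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Hypothetical driver standing in for the database: group emitted pairs by key,
// then reduce each group.
function countByNationality(allDocs) {
  const groups = {};
  allDocs.forEach(function (doc) {
    mapDoc(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  const result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduceCounts(groups[key]);
  });
  return result;
}

console.log(countByNationality(docs));
```

Because the map step emits once per nationality, Alice is counted under both 'United Kingdom' and 'South Africa' rather than under a single combined value.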
2.5 Transactional Guarantees
A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
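The all-or-nothing behaviour of the transfer example can be sketched in a few lines. This is a toy model only, written under my own assumptions: the function name transfer and the copy-then-swap approach are illustrative and bear no relation to how a real RDBMS implements locking, logging or roll-back.

```javascript
// Toy model of an atomic transfer; not how a real RDBMS implements transactions.
function transfer(accounts, from, to, amount) {
  // Work on a copy, so a failure part-way leaves the original untouched ("roll-back").
  const next = Object.assign({}, accounts);
  next[from] -= amount;
  if (next[from] < 0) {
    throw new Error("insufficient funds in " + from);
  }
  next[to] += amount;
  return next; // the "commit": both changes become visible together
}

let accounts = { alice: 100, bob: 20 };
accounts = transfer(accounts, "alice", "bob", 30);
console.log(accounts); // { alice: 70, bob: 50 }

try {
  accounts = transfer(accounts, "alice", "bob", 500);
} catch (e) {
  console.log(e.message + "; balances unchanged:", accounts);
}
```

An observer holding the accounts variable only ever sees the state before the transfer or the state after it, never the debit without the credit.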
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.
2.6 The 'NoSql' Movement
'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL etc.)¹.
'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].
2.7 'NoSql' databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example, if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.
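The Alice-and-Bob scenario can be modelled as two replicas that are briefly out of step and then converge. The sketch below is a deliberately simplified toy under my own assumptions: the names makeReplica, write and sync are hypothetical, a single version counter stands in for real revision tracking, and concurrent writes to both replicas (which would produce a conflict) are ignored. It does not describe how CouchDB or any particular system replicates.

```javascript
// Toy model of eventual consistency between two replicas.
function makeReplica(name) {
  return { name: name, version: 0, status: null };
}

// A write is accepted locally and bumps the replica's version.
function write(replica, status) {
  replica.version += 1;
  replica.status = status;
}

// Synchronisation copies the newer state onto the older replica.
function sync(a, b) {
  const newer = a.version >= b.version ? a : b;
  const older = newer === a ? b : a;
  older.version = newer.version;
  older.status = newer.status;
}

const london = makeReplica("London");
const singapore = makeReplica("Singapore");

write(london, "Alice's new status");
const staleRead = singapore.status; // Bob reads before replication: still the old value
sync(london, singapore);
const freshRead = singapore.status; // after replication the replicas agree

console.log(staleRead, "->", freshRead);
```

Both reads were served immediately from the local replica; the price was that the first one returned out-of-date data.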
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
"Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."
2.8 Brewer's 'CAP' Theorem
The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that 'eventual' consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain².
² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.
Another tactic used by application developers is to use SQL Views - effectively de-normalising data, so that the data model is closer to the application's object model.
A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
2.10 Benefits of 'NoSql'
As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law - disk space has become so economical as to be almost free; memory has grown so large it is now measured in Gigabytes, so that many databases can fit entirely in memory.
• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost commodity-grade hardware.
• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
212 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project which isa project I have been developing for the British Council [2]
The British Council is the United Kingdomrsquos international organisationfor cultural relations It has offices in over 100 countries and territoriesworld-wide In 2009-2010 the organisation had a turnover of pound705 million[20]
213 Raising Awareness of British Council Impactin the UK
The main aim of the Activity Mapping project is to raise awareness of thework the British Council does for communities throughout the UK The web-site for the project is httpactivitymapbritishcouncilorg This siteconsists of a set of many hundreds of maps displaying the range of educa-tional arts youth and science projects supported by the British Council inthe UK
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net in Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that something like CouchDB would be a good fit for the back-end data store - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
This is a report of my investigation into writing an application using CouchDB to fulfil these requirements. I am interested in exploring how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree-based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, a co-creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in a single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available at [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence from the document itself; this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
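The contrast with NULL columns can be illustrated in JavaScript, since a JSON document maps directly onto a JavaScript object. The following is a minimal sketch rather than CouchDB code: formatAddress is a hypothetical helper introduced only for this illustration, and the two objects are abbreviated copies of the address book documents.

```javascript
// Abbreviated copies of the two address book documents.
const addressbook = [
  {
    firstName: "John",
    lastName: "Smith",
    address: {
      streetAddress: "21 2nd Street",
      city: "New York",
      state: "NY",
      postalCode: "10021"
    }
  },
  {
    company: "British Council Zambia",
    address: {
      streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
      city: "Lusaka",
      country: "Zambia"
    }
  }
];

// Hypothetical helper: print whichever address fields a document has.
// A missing field is simply skipped; no NULL placeholder is needed.
function formatAddress(doc) {
  const a = doc.address || {};
  return [a.streetAddress, a.city, a.state, a.postalCode, a.country]
    .filter((part) => part !== undefined)
    .join("\n");
}

// True where the document carries a postal code, false where it is absent.
const hasPostalCode = addressbook.map((doc) => "postalCode" in doc.address);

console.log(hasPostalCode);
console.log(formatAddress(addressbook[1]));
```

Both documents pass through the same function even though they share only part of their structure, which is the flexibility that the schema-less approach is meant to offer.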
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available at [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A broader aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010) the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command line we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
cURL syntax is explained in [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
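Because the request is plain HTTP, the same upload could be issued from any language with an HTTP library. As a rough sketch of what the cURL flags correspond to, the request can be described as a JavaScript object. buildCreateRequest is a hypothetical helper, the local address is an assumption, and actually sending the request is left out so that the example runs without a server.

```javascript
// Hypothetical helper: describe the request that the cURL command sends.
function buildCreateRequest(baseUrl, database, doc) {
  return {
    method: "POST",                                     // curl -X POST
    url: baseUrl.replace(/\/+$/, "") + "/" + encodeURIComponent(database),
    headers: { "Content-Type": "application/json" },    // curl -H
    body: JSON.stringify(doc)                           // curl -d @file
  };
}

const request = buildCreateRequest(
  "http://127.0.0.1:5984/",          // assumed address of a local CouchDB
  "addressbook",
  { firstName: "John", lastName: "Smith", age: 25 }
);

console.log(request.method, request.url);
console.log(request.body);
```

The resulting object could be handed to whichever HTTP client is available; the point is only that the method, the header and the JSON body are all that CouchDB needs.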
All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator at http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ‘?’).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document the revision value must be supplied, to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
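The revision check that governs both deletes and updates can be modelled with a small in-memory store. This is an illustrative sketch, not CouchDB’s implementation: the TinyStore class and its placeholder revision strings are assumptions made for the example, and a real CouchDB delete leaves a deletion marker behind rather than erasing the document outright.

```javascript
// In-memory model of the revision check (illustrative, not CouchDB's code).
class TinyStore {
  constructor() {
    this.docs = new Map(); // document id -> stored document
  }

  conflict() {
    return { status: 409, error: "conflict", reason: "Document update conflict." };
  }

  // Create a document, or update it when `rev` matches the stored revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    const currentRev = current ? current._rev : undefined;
    if (rev !== currentRev) {
      return this.conflict();
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    // Real revisions look like "1-91bce0..."; this suffix is a placeholder.
    const newRev = generation + "-placeholder";
    this.docs.set(id, Object.assign({}, body, { _id: id, _rev: newRev }));
    return { status: 201, ok: true, id: id, rev: newRev };
  }

  // Delete a document, but only when `rev` is the latest revision.
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) {
      return { status: 404, error: "not_found" };
    }
    if (current._rev !== rev) {
      return this.conflict();
    }
    this.docs.delete(id);
    return { status: 200, ok: true, id: id };
  }
}

const store = new TinyStore();
const first = store.put("john-smith", { age: 25 });
const second = store.put("john-smith", { age: 26 }, first.rev);
const stale = store.put("john-smith", { age: 27 }, first.rev); // old revision

console.log(first.rev, second.rev, stale.status, stale.reason);
```

The third call is rejected because it quotes the first revision after a second one has been written, which is the situation the ‘Document update conflict’ error is designed to catch.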
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ ³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
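The quoting differences between shells are easier to manage if the request body is generated rather than typed by hand. The sketch below builds the replication body with JSON.stringify and then shows one way of producing the backslash-escaped form. quoteForWindows is a hypothetical helper, and it assumes the JSON contains no backslashes or other characters that the Windows command line treats specially.

```javascript
// Build the JSON body for POST /_replicate; JSON.stringify handles the quoting.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// Hypothetical helper: wrap a JSON string in double quotes for a Windows
// command line by backslash-escaping the double quotes inside it.
function quoteForWindows(json) {
  return '"' + json.replace(/"/g, '\\"') + '"';
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);

console.log(body);
console.log(quoteForWindows(body));
```

The first line printed is what would follow -d inside single quotes on Linux or Mac; the second is the form needed after -d on Windows.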
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation over the intermediate values is required.
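The two stages can be simulated in a few lines of plain JavaScript. This is a sketch of the idea only: runMap and runReduce are hypothetical stand-ins for what the database does internally, emit is passed in as a parameter here whereas CouchDB provides it as a global, and two of the three sample documents are invented for the illustration.

```javascript
// Sample documents. Only the Aberavon figure comes from the election data
// used in this section; the other two entries are invented.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example East", con: 1500 },
  { _id: "Example West", con: 12000 }
];

// Map stage: call mapFn once per document and collect every emitted row.
function runMap(mapFn, documents) {
  const rows = [];
  for (const doc of documents) {
    const emit = (key, value) => rows.push({ id: doc._id, key: key, value: value });
    mapFn(doc, emit);
  }
  return rows;
}

// Reduce stage: combine the intermediate values into a single result.
function runReduce(reduceFn, rows) {
  const keys = rows.map((row) => [row.key, row.id]);
  const values = rows.map((row) => row.value);
  return reduceFn(keys, values);
}

// Emit one value per document, then add the values together.
const mapFn = (doc, emit) => emit(null, doc.con);
const reduceFn = (keys, values) => values.reduce((sum, v) => sum + v, 0);

const rows = runMap(mapFn, docs);
const total = runReduce(reduceFn, rows);

console.log(rows.length, total);
```

The map stage produces one row per document, and the reduce stage collapses those rows into a single total, in the same way that an aggregate function does in SQL.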
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
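View URLs of this kind are tedious to assemble by hand, because the view name must be URL-encoded and the keys are JSON values. The following sketch shows one way of building them. viewUrl is a hypothetical helper written for this illustration; startkey and endkey are the lower-case parameter names that CouchDB’s view API documents.

```javascript
// Hypothetical helper: build the URL for a CouchDB view query.
function viewUrl(base, database, designDoc, viewName, params) {
  const path = [database, "_design", designDoc, "_view", viewName]
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  // View keys are JSON values, so each one is JSON-encoded, then URL-encoded.
  const query = Object.keys(params || {})
    .map((name) => name + "=" + encodeURIComponent(JSON.stringify(params[name])))
    .join("&");
  return base + "/" + path + (query ? "?" + query : "");
}

const rangeUrl = viewUrl("http://localhost:5984", "election-2005", "con",
  "Conservative Votes", { startkey: 1000, endkey: 2000 });

const singleUrl = viewUrl("http://localhost:5984", "election-2005", "con",
  "Conservative Votes", { key: "Aberavon" });

console.log(rangeUrl);
console.log(singleUrl);
```

Numeric keys pass through unchanged, while a string key acquires its JSON double quotes, which then appear in the URL as %22.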
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time: every read is an indexed lookup.
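The idea of an index that is maintained incrementally and then read by key range can be sketched with a sorted array. This is a deliberately simplified stand-in: CouchDB uses a B-Tree on disk, whereas the hypothetical SortedIndex class below keeps rows in memory and uses binary search, and apart from Aberavon the sample rows are invented.

```javascript
// Simplified stand-in for a view index: rows kept sorted by key.
class SortedIndex {
  constructor() {
    this.rows = [];
  }

  // Binary search for the first position whose key is >= `key`.
  lowerBound(key) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.rows[mid].key < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Incremental update: place one new row without rebuilding the index.
  insert(key, id) {
    this.rows.splice(this.lowerBound(key), 0, { key: key, id: id });
  }

  // Range read: jump to startkey, then walk forward until endkey is passed.
  range(startkey, endkey) {
    const result = [];
    for (let i = this.lowerBound(startkey);
         i < this.rows.length && this.rows[i].key <= endkey; i++) {
      result.push(this.rows[i]);
    }
    return result;
  }
}

const index = new SortedIndex();
index.insert(3062, "Aberavon");
index.insert(1500, "Example East");
index.insert(12000, "Example West");
index.insert(1800, "Example North");
index.insert(900, "Example South");

const matches = index.range(1000, 2000).map((row) => row.id);
console.log(matches);
```

The range read never inspects rows outside the requested interval, which is the property the text describes as avoiding a ‘table scan’.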
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the bash script below.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect could be achieved using a batch file.)
    #!/bin/bash
    #host="http://127.0.0.1:5984"
    host="http://username:password@mick.couchone.com"
    database="universities"
    folder="universities"
    fileextension=".json"
    FILES="$folder/*"

    # create the database
    curl -X PUT "$host/$database"

    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s/$folder\///g")
        echo "$filename"
        docname=$(echo "$filename" | sed -e "s/$fileextension//g")
        echo "$docname"
        url="$host/$database/$docname"
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
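The step in which the script derives the document URL from a file path can also be expressed in JavaScript. This is a sketch under the assumption that Node.js is available; docUrlForFile is a hypothetical helper that uses path.basename in place of the two sed substitutions, and the host shown is the local instance rather than the hosted one.

```javascript
const path = require("path");

// Hypothetical helper mirroring the script's two sed steps: strip the folder
// and the file extension from the path, then append the result to the URL.
function docUrlForFile(host, database, filepath, extension) {
  const docname = path.basename(filepath, extension);
  return host + "/" + database + "/" + docname;
}

const url = docUrlForFile(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);

console.log(url);
```

Because the file names already contain %20 in place of spaces, the derived document name can be used in the URL without further encoding.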
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.
    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) {
                return '<h1>' + doc._id + '</h1>' +
                       '<p>latitude: ' + doc.latitude + '</p>' +
                       '<p>longitude: ' + doc.longitude + '</p>';
            }"
        }
    }
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
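Since a ‘show’ function is ordinary JavaScript, it can be tried out locally before being pasted into a design document. The sketch below runs s1 against a sample document. The s2 variant, the sample coordinates and the heading query parameter are all illustrative assumptions; req.query is the property through which CouchDB exposes the query string to a show function.

```javascript
// The s1 show function from the design document, as a plain function.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
};

// Hypothetical variant that reads an optional heading from the query string,
// e.g. .../_show/s2/Aston%20University?heading=Partner
const s2 = function (doc, req) {
  const heading = (req.query && req.query.heading) || doc._id;
  return '<h1>' + heading + '</h1>';
};

// Illustrative document; the coordinates are approximate, not project data.
const sampleDoc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };

const html1 = s1(sampleDoc, { query: {} });
const html2 = s2(sampleDoc, { query: { heading: "Partner" } });

console.log(html1);
console.log(html2);
```

Testing the function in this way only checks the string it returns; how the browser renders that string is a separate matter.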
Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
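The same breakdown can be performed programmatically, which is a convenient way to check that a URL has the expected shape. The following sketch relies on the standard URL class available in modern JavaScript runtimes; parseShowUrl is a hypothetical helper written for this illustration.

```javascript
// Hypothetical helper: split a 'show' URL into its component parts.
function parseShowUrl(urlString) {
  const url = new URL(urlString);
  const parts = url.pathname.split("/").filter(Boolean).map(decodeURIComponent);
  // Expected shape: database / _design / name / _show / function / document id
  if (parts.length !== 6 || parts[1] !== "_design" || parts[3] !== "_show") {
    throw new Error("Not a show URL: " + urlString);
  }
  return {
    host: url.origin,
    database: parts[0],
    designDocument: "_design/" + parts[2],
    showFunction: parts[4],
    documentId: parts[5]
  };
}

const parsed = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);

console.log(parsed);
```

Note that decoding turns Aston%20University back into the document id as it is stored, with a space.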
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
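One way to reduce the escaping burden is to write the functions as ordinary JavaScript and let a small script generate the design document. The sketch below is an assumption about how this could be done, not the approach taken in this project: buildDesignDoc is a hypothetical helper, and the map1 function shown is a short placeholder rather than the Appendix A listing.

```javascript
// Write the show function as ordinary JavaScript, with unescaped quotes.
const map1 = function (doc, req) {
  return '<div id="map" data-lat="' + doc.latitude +
         '" data-lng="' + doc.longitude + '"></div>';
};

// Hypothetical helper: turn functions into the strings a design document stores.
function buildDesignDoc(id, shows) {
  const designDoc = { _id: id, shows: {} };
  for (const name of Object.keys(shows)) {
    designDoc.shows[name] = shows[name].toString();
  }
  return designDoc;
}

// JSON.stringify escapes every double quote and newline in the source text.
const json = JSON.stringify(buildDesignDoc("_design/d1", { map1: map1 }), null, 2);

console.log(json);
```

The printed JSON could be saved to a file and uploaded with cURL in the usual way, so that the escaping is done by the machine rather than by hand.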
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:
    {
        "_id": "_design/d1",
        "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
        "shows": {...},
        "lists": {...},
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to render the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
    {
        "_id": "_design/d1",
        "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
        "shows": {...},
        "lists": {
            "l1": "function(head, req) {
                var row;
                start({\"headers\": {\"Content-Type\": \"text/html\"}});
                while (row = getRow()) {
                    send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
                }
            }"
        },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
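Note the built-in functions start and send, which CouchDB makes available to a ‘list’ function (together with getRow, which fetches the next row of the view):

• start is called once at the beginning of the function, to set the response headers.

• send is called once for each row in the dataset.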
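The calling pattern of start, getRow and send can be sketched outside CouchDB with simple stand-ins. In the sketch below, makeListEnv is a hypothetical test harness, the functions are passed in through an env object rather than being globals as they are in CouchDB, and the link path is illustrative only.

```javascript
// Stand-ins for the functions CouchDB provides to a list function.
function makeListEnv(rows) {
  var queue = rows.slice();
  var chunks = [];
  var headers = null;
  return {
    start: function (resp) { headers = resp.headers; },
    getRow: function () { return queue.length ? queue.shift() : null; },
    send: function (chunk) { chunks.push(chunk); },
    result: function () { return { headers: headers, body: chunks.join('') }; }
  };
}

// A list function in the style of l1, using the stand-ins from env.
function l1(env) {
  var row;
  env.start({ headers: { 'Content-Type': 'text/html' } }); // once, before any output
  while ((row = env.getRow())) {                            // null ends the loop
    env.send('<p><a href="/universities/_design/d1/_show/s1/' +
             encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
}

var env = makeListEnv([
  { key: 'University of Aberdeen', value: 'University of Aberdeen' },
  { key: 'University of St Andrews', value: 'University of St Andrews' }
]);
l1(env);
console.log(env.result().body);
```

Encoding the document id with encodeURIComponent is a design choice of this sketch: ids such as ‘University of Aberdeen’ contain spaces, which would otherwise produce a broken link.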
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console ‘Futon’ to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
SECTION 6 SERVING GEORSS USING COUCHAPP 40
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:

    couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

    michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create.

• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.

• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp

The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder

    michael@dell:~/couchapp$ cd bc

and run the following command:

    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js is a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
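The merging step can be pictured as turning file paths into nested keys of a single JSON document. The sketch below is a simplified illustration of that idea and not CouchApp’s actual implementation (which is written in Python and also handles attachments and other details); the files object and the buildDesignDoc helper are assumptions, and the file contents are abbreviated.

```javascript
// Hypothetical in-memory stand-in for some of the files in the bc folder.
var files = {
  '_id': '_design/bc',
  'views/all/map.js': 'function(doc) { emit(doc._id, doc); }',
  'lists/georss.js': 'function(head, req) { /* ... */ }',
  'templates/georss.html': '<feed>...</feed>'
};

// Merge the files into one design document: each folder becomes a nested
// object, and each file becomes a string property named after the file
// (without its extension).
function buildDesignDoc(fileMap) {
  var ddoc = {};
  Object.keys(fileMap).forEach(function (path) {
    var parts = path.split('/');
    var node = ddoc;
    for (var i = 0; i < parts.length - 1; i++) {
      node[parts[i]] = node[parts[i]] || {};
      node = node[parts[i]];
    }
    var leaf = parts[parts.length - 1].replace(/\.[^.]+$/, '');
    node[leaf] = fileMap[path];
  });
  return ddoc;
}

var ddoc = buildDesignDoc(files);
console.log(JSON.stringify(ddoc, null, 2));
```

Under this mapping, the template stored in templates/georss.html becomes available to server-side code as the templates.georss property of the design document.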
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:

    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }

The design document is then uploaded by running couchapp push from the bc folder. So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework rather than working with design documents directly.
6.5 Using templates

In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):

    couchapp generate view all

The contents of map.js are as follows:

    function(doc) {
      emit(doc._id, doc);
    }

This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:

    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
          {{/countries}}
          {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes, and if the programme has a list of countries, the flags of those countries are emitted.
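The looping behaviour described above can be illustrated with a deliberately tiny renderer. This sketch is not mustache.js: it is a toy written for this illustration that supports only {{#name}}...{{/name}} sections and {{name}} variables, performs no HTML escaping, and (unlike Mustache) cannot see values from an enclosing section. The programme data is made up for the example.

```javascript
// Toy renderer: sections and variables only. NOT mustache.js.
function render(template, view) {
  // Expand sections first; \1 pairs each opening tag with its closing tag.
  var out = template.replace(/\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    function (match, name, inner) {
      var value = view[name];
      if (Array.isArray(value)) {
        // A list: render the inner block once per item.
        return value.map(function (item) { return render(inner, item); }).join('');
      }
      // A flag: render the inner block only if the value is truthy.
      return value ? render(inner, view) : '';
    });
  // Then substitute simple variables.
  return out.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] !== undefined ? String(view[name]) : '';
  });
}

var template =
  '{{#programmes}}<p>{{name}}:' +
  '{{#has_countries}}{{#countries}} [{{country}}]{{/countries}}{{/has_countries}}' +
  '</p>{{/programmes}}';

// Illustrative data in the shape that the template expects.
var view = {
  programmes: [
    { name: 'Chevening Programme', has_countries: true,
      countries: [{ country: 'id' }, { country: 'mz' }] },
    { name: 'Erasmus', has_countries: false, countries: [] }
  ]
};

var html = render(template, view);
console.log(html);
```

The has_countries flag plays the same role as in the template above: it lets the template skip the flag images altogether for a programme with no countries.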
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/ as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
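As a small illustration of this point, the sketch below maps three basic document operations onto the HTTP requests that CouchDB expects. The couchRequest helper and the localhost server address are assumptions made for the example; the sketch only describes the requests and does not send them.

```javascript
// Hypothetical helper: describe the HTTP request for a document operation.
function couchRequest(action, baseUrl, db, doc) {
  var url = baseUrl + '/' + encodeURIComponent(db) + '/' + encodeURIComponent(doc._id);
  switch (action) {
    case 'read':
      return { method: 'GET', url: url };
    case 'save':
      // PUT creates the document, or updates it when doc._rev is current.
      return {
        method: 'PUT',
        url: url,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(doc)
      };
    case 'delete':
      // Deleting requires the current revision of the document.
      return { method: 'DELETE', url: url + '?rev=' + encodeURIComponent(doc._rev) };
    default:
      throw new Error('unknown action: ' + action);
  }
}

var doc = { _id: 'University of Aberdeen', region: 'Scotland' };
console.log(couchRequest('save', 'http://localhost:5984', 'universities', doc));
```

Because each operation is an ordinary HTTP request, the same description could be carried out equally well by cURL, a browser or any HTTP client library.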
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
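To illustrate the difference in thinking, consider counting institutions per region. In SQL this might be written as SELECT region, COUNT(*) FROM institutions GROUP BY region. The sketch below expresses the same information need as a map function and a reduce function, and runs them with simple stand-ins for CouchDB’s machinery. The sample documents, the table name in the SQL and the queryGrouped helper are assumptions for illustration; real CouchDB also passes the emitted keys to reduce and may call it again with rereduce set to true.

```javascript
// Hypothetical sample documents.
var docs = [
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'University of Aberdeen', region: 'Scotland' },
  { _id: 'Birkbeck, University of London', region: 'London' }
];

// The query, expressed as a CouchDB-style map and reduce pair.
var map = function (doc) { emit(doc.region, 1); };
var reduce = function (keys, values, rereduce) { return sum(values); };

// Stand-ins for the emit() and sum() functions that CouchDB provides.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Simplified stand-in for querying the view with grouping by key.
// For brevity, null is passed in place of the keys argument.
function queryGrouped(mapFn, reduceFn, documents) {
  emitted = [];
  documents.forEach(function (doc) { mapFn(doc); });
  var groups = {};
  emitted.forEach(function (row) {
    (groups[row.key] = groups[row.key] || []).push(row.value);
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: reduceFn(null, groups[key], false) };
  });
}

var counts = queryGrouped(map, reduce, docs);
console.log(counts);
```

The important difference from SQL is that the map and reduce functions are stored in a design document in advance, rather than being composed at the moment the question is asked.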
7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A

Design Document d1

This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
            '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html>
            <html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
            <style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style>
            <script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
            <script type=\"text/javascript\"> function initialize() {
              var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
              var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP };
              var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
              var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
              var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
              google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); });
              marker.setMap(map); } </script></head>
            <body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>' }"
      }
    }
Appendix B

GeoRSS on Sofa

In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:

    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      //
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:

            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
      }
      // close the loop after all rows are rendered
      return "</feed>";
    });
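The point value built in the code above is a GeoRSS-Simple point: the latitude followed by the longitude, separated by a space. A minimal sketch of that formatting step, with basic range checks, might look as follows. The georssPoint helper is hypothetical and does not appear in Sofa.

```javascript
// Hypothetical helper: format a GeoRSS-Simple point ("latitude longitude").
function georssPoint(latitude, longitude) {
  var lat = Number(latitude);
  var lon = Number(longitude);
  if (!isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError('latitude out of range: ' + latitude);
  }
  if (!isFinite(lon) || lon < -180 || lon > 180) {
    throw new RangeError('longitude out of range: ' + longitude);
  }
  // Latitude comes first; swapping the two places the marker elsewhere.
  return lat + ' ' + lon;
}

console.log(georssPoint(56.34, -2.79));
```

Validating the values before they reach the feed is a design choice of this sketch: the amended edit form accepts latitude and longitude as free text, so a mistyped value would otherwise be published as it stands.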
I also made the following rudimentary changes to templates/edit.html:

    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- ... -->

    // this is further down in templates/edit.html

    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
          },
Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": "15161",
      "postcode": "KY16 9AJ",
      "latitude": "56.3412139443169",
      "longitude": "-2.79301175308608",
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
            {{/countries}}
            {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*

    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries

    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
Bibliography

[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.
[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."
Why does this matter? Does 'one size fit all'? Up until recent years the answer has always been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.
1.3 Organisation
The sections are organised in the following way:
• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project.
• in Section 3, I introduce the open-source, document-oriented database CouchDB.
• in Section 4, I provide an overview of CouchDB's application programming interface.
• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code.
• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps.
• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.
Section 2
Background
In recent years a new generation of data stores has emerged which approaches data persistence in a different way to the established relational model.
In this section I briefly discuss this relational model, and describe some of the recent challenges which have led to the development of the newer, alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd's relational model reads as follows:
"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
The main advantage of the relational database is its precise, mathematically grounded organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is 'relational' or not is more a matter of how applications use the database - a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
2.2 Relational Database Management Systems
The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful, and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let's call it 'Staff') containing staff records:

    Name     Nationality
    Bob      Australia
    Jane     Sweden

All is well until the system encounters a staff member with multiple nationalities:

    Name     Nationality
    Bob      Australia
    Jane     Sweden
    Alice    United Kingdom, South Africa

This second table breaks the rules of normalisation (it is not in first normal form), and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a 'Nationality' table, with a 'StaffNationality' table linked with Primary and Foreign Keys.
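As a minimal sketch of that three-table arrangement, the following uses Python's built-in sqlite3 module with an in-memory database. The column names (StaffId, NationalityId) are assumptions made for illustration; the report does not specify a schema.

```python
import sqlite3

# In-memory database; all column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Staff (StaffId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Nationality (NationalityId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
    CREATE TABLE StaffNationality (
        StaffId INTEGER NOT NULL REFERENCES Staff(StaffId),
        NationalityId INTEGER NOT NULL REFERENCES Nationality(NationalityId),
        PRIMARY KEY (StaffId, NationalityId)
    );
""")
conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [(1, "Bob"), (2, "Jane"), (3, "Alice")])
conn.executemany("INSERT INTO Nationality VALUES (?, ?)",
                 [(1, "Australia"), (2, "Sweden"), (3, "United Kingdom"), (4, "South Africa")])
# Alice (StaffId 3) gets one link row per nationality; every cell stays atomic.
conn.executemany("INSERT INTO StaffNationality VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 3), (3, 4)])

alice_nationalities = [row[0] for row in conn.execute("""
    SELECT n.Name FROM Staff s
    JOIN StaffNationality sn ON sn.StaffId = s.StaffId
    JOIN Nationality n ON n.NationalityId = sn.NationalityId
    WHERE s.Name = 'Alice' ORDER BY n.Name
""")]
print(alice_nationalities)
```

Reading Alice's nationalities back now needs a join across all three tables, which is the price paid for keeping each value in exactly one place.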
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

    {
      "name": "Alice",
      "nationality": [
        "United Kingdom",
        "South Africa"
      ]
    }

This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records, without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
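To illustrate what 'schema-less' means in practice, here is a small Python sketch in which a plain list stands in for the database. The helper function `nationalities` is hypothetical; it shows that when documents of different shapes sit side by side, the application has to cope with both.

```python
import json

# A stand-in for a schema-less store: a plain list of parsed JSON documents.
documents = [
    json.loads('{"name": "Bob", "nationality": "Australia"}'),
    json.loads('{"name": "Jane", "nationality": "Sweden"}'),
    json.loads('{"name": "Alice", "nationality": ["United Kingdom", "South Africa"]}'),
]

def nationalities(doc):
    """Return a document's nationalities as a list, whatever shape was stored."""
    value = doc.get("nationality", [])
    return value if isinstance(value, list) else [value]

for doc in documents:
    print(doc["name"], nationalities(doc))
```

No schema change was needed to store Alice's record, but the reading code now carries the responsibility that the schema used to carry.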
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would result in 'United Kingdom, South Africa' being considered a distinct nationality.
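The effect can be reproduced with a short sketch using Python's sqlite3 module. The row for 'Carol' is a hypothetical addition, included only so that one of the counts is greater than one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (Name TEXT, Nationality TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", [
    ("Bob", "Australia"),
    ("Jane", "Sweden"),
    ("Carol", "Sweden"),                        # hypothetical extra row
    ("Alice", "United Kingdom, South Africa"),  # non-atomic value
])

counts = conn.execute(
    "SELECT Nationality, COUNT(*) FROM Staff "
    "GROUP BY Nationality ORDER BY Nationality"
).fetchall()
print(counts)
```

The combined string is grouped as if it were a single nationality, so Alice is counted under neither 'United Kingdom' nor 'South Africa'.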
In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
2.5 Transactional Guarantees
A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
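The money-transfer example can be sketched with Python's sqlite3 module. The Account table, its CHECK constraint and the `transfer` helper are assumptions made for illustration. The credit is deliberately applied before the debit, so that a failed debit shows the earlier credit being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Name TEXT PRIMARY KEY, "
    "Balance INTEGER CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount between accounts; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Name = ?",
                         (amount, target))
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Name = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False  # the debit would overdraw; the credit has been undone

def balances(conn):
    return dict(conn.execute("SELECT Name, Balance FROM Account"))

ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "A", "B", 500)  # violates the CHECK constraint
print(ok, failed, balances(conn))
```

After the failed transfer, neither balance has changed: the user never sees a state in which the money exists in both accounts or in neither.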
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.
2.6 The 'NoSql' Movement
'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL etc.).¹
'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].
2.7 'NoSql' databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that in many application domains it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example: if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
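The idea of replicas that are briefly out of step but eventually converge can be shown with a deliberately simple toy model in Python. The `Replica` class and its version counter are assumptions made for illustration; this is not how any particular product implements replication, and it ignores concurrent conflicting writes.

```python
class Replica:
    """Toy replica holding one value plus a version counter."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.value = None

    def write(self, value):
        self.version += 1
        self.value = value

    def sync_from(self, other):
        """Adopt the other replica's state if it is newer than our own."""
        if other.version > self.version:
            self.version, self.value = other.version, other.value

london, singapore = Replica("London"), Replica("Singapore")

london.write("Alice: at the airport")
stale_read = singapore.value     # Bob still sees the old (empty) status
singapore.sync_from(london)      # replication catches up a little later
fresh_read = singapore.value     # both replicas now agree
print(stale_read, "->", fresh_read)
```

Between the write and the sync, a read in Singapore is served immediately but returns stale data; that window is the cost accepted in exchange for availability.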
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
"Occasionally, an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."
2.8 Brewer's 'CAP' Theorem
The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that 'eventual' consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.
However, ORM tools are complex, and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application's object model.
A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely, and radically simplifies application development [36].
2.10 Benefits of 'NoSql'
As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law - disk space has become so economical as to be almost free; memory has grown so large it is now measured in Gigabytes, so that many databases can fit entirely in memory.
• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, 'NoSql' databases provide benefits to developers, through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions: there is little capability to query the system in an arbitrary way.
(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)
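To give a feel for what a pre-computed map-reduce query involves, here is a toy sketch in plain Python that counts staff by nationality. CouchDB's own map and reduce functions are written in JavaScript and run inside the server; the function names below (`map_doc`, `reduce_values`, `build_view`) are assumptions used only to illustrate the idea.

```python
from collections import defaultdict

staff = [
    {"name": "Bob", "nationality": ["Australia"]},
    {"name": "Jane", "nationality": ["Sweden"]},
    {"name": "Alice", "nationality": ["United Kingdom", "South Africa"]},
]

def map_doc(doc):
    """Emit one (key, value) pair per nationality found in a document."""
    for nationality in doc.get("nationality", []):
        yield nationality, 1

def reduce_values(values):
    """Combine all the values emitted for one key."""
    return sum(values)

def build_view(docs):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}

view = build_view(staff)
print(view)
```

The result answers one question only. A different question (say, staff by office) would need a different map function to be written and its view built in advance, which is the limitation described above.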
Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps, displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated html and xml files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
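As a rough sketch of the kind of transformation involved, the following Python turns one JSON document into an RSS item carrying a GeoRSS-Simple point. The document's field names (title, description, latitude, longitude) and its values are hypothetical; the real Activity Mapping records are not shown in this section. Within CouchDB itself this step would be written in JavaScript rather than Python.

```python
import json
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)

def to_georss_item(doc):
    """Build an RSS <item> with a GeoRSS-Simple point from one project document."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = doc["title"]
    ET.SubElement(item, "description").text = doc["description"]
    point = ET.SubElement(item, "{%s}point" % GEORSS_NS)
    # GeoRSS-Simple writes a point as "latitude longitude", separated by a space.
    point.text = "%s %s" % (doc["latitude"], doc["longitude"])
    return item

# Hypothetical document; field names and values are assumptions.
doc = json.loads(
    '{"title": "Example school link", "description": "Hypothetical project record", '
    '"latitude": 51.5219, "longitude": -0.1302}'
)
xml_text = ET.tostring(to_georss_item(doc), encoding="unicode")
print(xml_text)
```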
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
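For instance, Python's standard library alone is enough to prepare a request that stores a document. The sketch below only builds the request and does not send it; the server address (CouchDB's default port 5984 on localhost), the database name 'staff' and the document id 'alice' are assumptions for illustration.

```python
import json
import urllib.request

def build_put_request(base_url, database, doc_id, document):
    """Prepare (but do not send) an HTTP PUT that would store a JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url="%s/%s/%s" % (base_url, database, doc_id),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

request = build_put_request(
    "http://localhost:5984", "staff", "alice",
    {"name": "Alice", "nationality": ["United Kingdom", "South Africa"]},
)
print(request.get_method(), request.full_url)
# With a CouchDB server running, urllib.request.urlopen(request) would send it.
```

No driver or mapping layer is involved: the document is serialised to JSON and addressed by an ordinary URL.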
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }

¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

    {
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
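A short Python sketch shows both points: a missing postal code is simply a missing key rather than a NULL, and the \n escape inside the JSON string becomes a real line break once parsed. The documents are trimmed versions of the two examples above, and the `postal_code` helper is hypothetical.

```python
import json

addressbook = [
    json.loads('{"firstName": "John", '
               '"address": {"city": "New York", "postalCode": "10021"}}'),
    json.loads('{"company": "British Council Zambia", "address": '
               '{"streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571", '
               '"city": "Lusaka"}}'),
]

def postal_code(doc):
    """Return the postal code, or None when the document has no such field."""
    return doc.get("address", {}).get("postalCode")

codes = [postal_code(doc) for doc in addressbook]
street_lines = addressbook[1]["address"]["streetAddress"].split("\n")
print(codes)
print(street_lines)
```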
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax [51]. (If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html)
To create a database called 'addressbook' on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password. Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
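The same request can be described in any language with an HTTP library. As a minimal sketch, assuming a JavaScript runtime that provides the standard URL class, the hypothetical helper buildPostRequest below assembles the parts that the cURL flags correspond to. It only builds a description of the request and does not send anything; the helper name and the shortened document are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of CouchDB): describes the HTTP request
// that the cURL command would send, without sending it.
function buildPostRequest(server, database, doc) {
  return {
    method: "POST",                                   // -X POST
    url: new URL(database, server).toString(),        // the target database
    headers: { "Content-Type": "application/json" },  // -H
    body: JSON.stringify(doc)                         // -d @file
  };
}

// Shortened stand-in for the contents of john-smith.json.
const request = buildPostRequest("http://mick.couchone.com/", "addressbook", {
  firstName: "John",
  lastName: "Smith"
});
console.log(request.method, request.url);
```

The resulting object could be handed to whichever HTTP client is available; credentials are deliberately left out of the sketch.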
All data in CouchDB is stored in JSON format, together with a unique key and revision version number, so that in CouchDB the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json. The rev value is the current revision of John Smith's document, as shown in the listing in Section 4.4:
curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
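The revision check can be illustrated without a server. The following is a minimal sketch, assuming a simplified in-memory store: it is not CouchDB's implementation, and the revision strings ("1-sketch", "2-sketch") are invented for the example, whereas real revisions carry a hash suffix. It shows only the rule that an update must quote the latest revision.

```javascript
// In-memory stand-in for a database; NOT CouchDB's real implementation.
// Revisions are simplified to "<generation>-sketch".
function createStore() {
  const docs = new Map();
  return {
    put(id, body, rev) {
      const current = docs.get(id);
      // An existing document may only be replaced by a client that
      // supplies the revision it last saw.
      if (current && current._rev !== rev) {
        return { error: "conflict", reason: "Document update conflict." };
      }
      const generation = current ? parseInt(current._rev, 10) + 1 : 1;
      const newRev = generation + "-sketch";
      docs.set(id, Object.assign({}, body, { _id: id, _rev: newRev }));
      return { ok: true, id: id, rev: newRev };
    }
  };
}

const store = createStore();
const first = store.put("john-smith", { age: 25 });             // creates the document
const second = store.put("john-smith", { age: 26 }, first.rev); // latest revision: accepted
const stale = store.put("john-smith", { age: 27 }, first.rev);  // old revision: rejected
console.log(first.rev, second.rev, stale.error);
```

A client that receives the conflict response would normally fetch the document again, obtain the latest revision, and retry.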
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/. (Many thanks to Mark James for creating these files and making them available for free use.)
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
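The quoting difficulty arises only because the JSON body is typed by hand in a shell. As a minimal sketch, assuming the request is prepared by a program instead, the hypothetical helper replicationBody below lets JSON.stringify produce the body posted to _replicate; the helper name is an assumption, while the source and target fields are those used in the commands above.

```javascript
// Build the JSON body for a POST to /_replicate.
// JSON.stringify handles the quoting that is awkward on the command line.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```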
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation of the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate such as SUM with GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
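The two stages can be imitated outside CouchDB. The following is a minimal sketch, assuming a small in-memory document collection: runView, emit and sum are local stand-ins written for this example (in CouchDB, emit and sum are provided by the query server and are not passed as arguments), and the two 'Example' documents are invented, so the total differs from the real figure above.

```javascript
// Simulation of a view outside CouchDB. Only the Aberavon figure comes
// from the text; the other two documents are invented.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example North", con: 1500 },
  { _id: "Example South", con: 12000 }
];

function runView(docs, map, reduce) {
  const rows = [];
  const emit = (key, value) => rows.push({ key: key, value: value });
  docs.forEach((doc) => map(doc, emit));                // 'map' stage
  if (!reduce) return rows;
  const keys = rows.map((row) => row.key);
  const values = rows.map((row) => row.value);
  return [{ key: null, value: reduce(keys, values) }];  // 'reduce' stage
}

// Local stand-in for the sum helper used in the reduce function.
const sum = (values) => values.reduce((a, b) => a + b, 0);

const total = runView(
  docs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => sum(values)
);
console.log(JSON.stringify({ rows: total }));
```

Calling runView without a reduce function returns one row per document, which corresponds to the first of the two views.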
It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000 (note that CouchDB documents these query parameters in lower case, as startkey and endkey):
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
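Because the rows of a view are ordered by key, a key range selects a contiguous run of rows. The following is a minimal sketch of that idea, assuming an in-memory list of rows shaped like the output of the 'Conservative Votes' view; queryRange is a hypothetical function, and every row except Aberavon is invented.

```javascript
// Stand-in for the rows of the 'Conservative Votes' view: key = votes,
// value = constituency id. Only the Aberavon row comes from the text.
const rows = [
  { key: 3062, value: "Aberavon" },
  { key: 1500, value: "Example North" },
  { key: 12000, value: "Example South" },
  { key: 1999, value: "Example West" }
];

// A view keeps its rows sorted by key, so a key range is a contiguous slice.
function queryRange(rows, startkey, endkey) {
  return rows
    .slice()
    .sort((a, b) => a.key - b.key)
    .filter((row) => row.key >= startkey && row.key <= endkey);
}

const matches = queryRange(rows, 1000, 2000);
console.log(matches.map((row) => row.value));
```

CouchDB does not sort at query time as this sketch does; it reads the range from an index that is already sorted.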
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s|$folder/||g")
  echo "$filename"
  docname=$(echo "$filename" | sed -e "s|$fileextension||g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d "@$filepath"
  curl -X PUT "$url" -d "@$filepath"
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
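The way the script derives a document URL from a file path can be checked in isolation. The following is a minimal sketch, assuming the file names already contain %20 in place of spaces, as in the example path used in this section; documentUrl is a hypothetical function that mirrors the two sed substitutions and is not part of the script itself.

```javascript
// Mirrors the two sed substitutions in the bash script: strip the folder
// prefix and the file extension, then build the document URL.
function documentUrl(host, database, folder, extension, filepath) {
  let docname = filepath;
  if (docname.startsWith(folder + "/")) {
    docname = docname.slice(folder.length + 1);
  }
  if (docname.endsWith(extension)) {
    docname = docname.slice(0, -extension.length);
  }
  return host + "/" + database + "/" + docname;
}

const url = documentUrl(
  "http://mick.couchone.com",
  "universities",
  "universities",
  ".json",
  "universities/Aberystwyth%20University.json"
);
console.log(url);
```

Unlike the sed expressions, this version removes the folder name only at the start of the path and the extension only at the end, which avoids surprises if either string occurs elsewhere in a file name.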
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
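Since a 'show' function is ordinary JavaScript, its body can be tried out before it is placed in a design document. The following is a minimal sketch, assuming an invented sample document: the coordinates are placeholders rather than real data, and the request object is reduced to an empty query.

```javascript
// The body of s1 as an ordinary JavaScript function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Invented sample document and request; the coordinates are placeholders.
const sampleDoc = { _id: "Aston University", latitude: 52.5, longitude: -1.9 };
const sampleReq = { query: {} };

const html = s1(sampleDoc, sampleReq);
console.log(html);
```

Testing the function in this way catches string concatenation mistakes before the code has to be escaped and embedded in JSON.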
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
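The parts listed above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named showUrl; it uses the standard encodeURIComponent function to turn the space in the document id into %20.

```javascript
// Assemble a 'show' URL from its parts. encodeURIComponent turns the
// space in the document id into %20.
function showUrl(server, database, designDoc, showName, docId) {
  return [
    server,
    database,
    "_design",
    designDoc,
    "_show",
    showName,
    encodeURIComponent(docId)
  ].join("/");
}

const astonUrl = showUrl(
  "http://mick.couchone.com",
  "universities",
  "d1",
  "s1",
  "Aston University"
);
console.log(astonUrl);
```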
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
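The order of calls inside a 'list' function can be imitated with stubs. The following is a minimal sketch, assuming hypothetical stand-ins for start, getRow and send that are passed in as one object: runList is invented for this example and imitates only the calling pattern, not CouchDB's query server, where these functions are available globally.

```javascript
// Stubbed versions of the functions CouchDB provides to a 'list' function.
// runList is hypothetical; it imitates the order of calls only.
function runList(listFn, rows) {
  const output = { headers: null, chunks: [] };
  const queue = rows.slice();
  const api = {
    start: (response) => { output.headers = response.headers; },
    getRow: () => queue.shift() || null,   // null ends the while loop
    send: (chunk) => { output.chunks.push(chunk); }
  };
  listFn(api);
  return output;
}

const result = runList(function (api) {
  let row;
  api.start({ headers: { "Content-Type": "text/html" } });  // once
  while ((row = api.getRow())) {
    api.send("<p>" + row.value + "</p>");                    // once per row
  }
}, [
  { key: "Aston University", value: "Aston University" },
  { key: "Aberystwyth University", value: "Aberystwyth University" }
]);

console.log(result.chunks.join(""));
```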
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function, s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
With this in place, running couchapp push from the bc folder uploads the design document.
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
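The template relies on the data being 'packaged' into nested lists with accompanying has_ flags. The following is a minimal sketch of such a packaging step, assuming a hypothetical document shape: the actual code is in Appendix C and may differ, and the sample programme and country values, the placeholder coordinates and the function name toTemplateData are all invented for illustration. Only the field names used in the template (institutions, id, programmes, name, countries, country, latitude, longitude and the has_ flags) are taken from the text.

```javascript
// Hypothetical shape of the rows returned by the 'all' view; the real
// documents may be structured differently.
const rows = [
  {
    value: {
      _id: "Aston University",
      latitude: 52.5,       // placeholder coordinates
      longitude: -1.9,
      programmes: [
        { name: "Erasmus", countries: ["France", "Spain"] },
        { name: "Example Programme", countries: [] }
      ]
    }
  }
];

// Build the object handed to the template. The has_* flags let the
// template skip a section when the corresponding list is empty.
function toTemplateData(rows) {
  return {
    institutions: rows.map((row) => ({
      id: row.value._id,
      latitude: row.value.latitude,
      longitude: row.value.longitude,
      has_programmes: row.value.programmes.length > 0,
      programmes: row.value.programmes.map((programme) => ({
        name: programme.name,
        has_countries: programme.countries.length > 0,
        countries: programme.countries.map((country) => ({ country: country }))
      }))
    }))
  };
}

const data = toTemplateData(rows);
console.log(JSON.stringify(data, null, 2));
```

Keeping this step separate from the template means the template itself contains no logic beyond looping over sections.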
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment andConclusion
This MSc project investigated the suitability of CouchDB as a platform forbusiness application development Why should an applications developerin a corporate environment choose to develop an application on CouchDBas opposed to simply using tried-and-trusted products such as PHP andMySql or C and SQL Server
The first comment to make is that it is not necessarily an lsquoeither-orrsquoscenario In this project CouchDB is used as a web publishing layer in sup-port of an existing application written using C and SQL Server I believethat this is a trend which is likely to continue so-called lsquoNoSqlrsquo may readilybe used alongside relational databases with each deployed according to itsstrengths In the British Council Activity Project relational integrity is im-portant when putting the data together (using the internal ASPNetSQLServer application) For the publicly-available web layer CouchDB is moresuitable since all we really need to do is render structured documents usingXML HTML and JavaScript
71 Why might a developer choose CouchDB
One distinctive feature that CouchDB provides is that it is both a web serverand a database As such server-side code is simply stored as another kindof document CouchDB makes use of communication over HTTP to easilyreplicate both data and code between remote servers and local machinesThis gives a developer a key advantage - the ability to have data and codereplicated in many places so that the user has high-speed access to it evenwithout a good internet connection Distributed data and application codeare important for reliable computing
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
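To make the point about HTTP verbs concrete, here is a minimal sketch of how the basic document operations might map onto requests. The function name `documentRequest` is hypothetical; the database and document names are borrowed from the appendices, and nothing is actually sent over the network.

```javascript
// Map basic document operations onto the HTTP verbs CouchDB uses.
// documentRequest is a hypothetical helper written for this illustration.
function documentRequest(operation, db, docId, doc) {
  const url = "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId);
  switch (operation) {
    case "read":
      return { method: "GET", url: url };
    case "save": // create or update; an update must carry the current _rev
      return { method: "PUT", url: url, body: JSON.stringify(doc) };
    case "delete": // a deletion names the revision being removed
      return { method: "DELETE", url: url + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown operation: " + operation);
  }
}

const save = documentRequest("save", "universities", "University of St Andrews", {
  region: "Scotland",
});
console.log(save.method, save.url);
```

Any HTTP client, including cURL, could issue the same requests, which is the practical consequence of CouchDB being standards-based.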
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
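The sketch below illustrates what 'pre-compiled' means in practice: a query has to exist as a view inside a design document before it can be run. The design document name `example` and the view name `by_region` are hypothetical; `_count` is one of CouchDB's built-in reduce functions, and `emit` is supplied by CouchDB's view server (it is never called in this snippet).

```javascript
// A view is defined ahead of time inside a design document; the map
// function is stored as a string. Names here are hypothetical.
const designDoc = {
  _id: "_design/example",
  views: {
    by_region: {
      map: function (doc) {
        if (doc.region) {
          emit(doc.region, 1);
        }
      }.toString(),
      reduce: "_count", // built-in reduce: number of rows per key
    },
  },
};

// The view is then queried by URL rather than by an ad-hoc statement.
const queryUrl = "/universities/_design/example/_view/by_region?group=true";
console.log(JSON.stringify(designDoc, null, 2));
console.log(queryUrl);
```

A developer who wants a different grouping has to add another view to the design document, which is where the contrast with ad-hoc SQL is felt most.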
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
            '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
            <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
            <style type=\"text/css\"> html { height: 100% } body { height: 100%;
            margin: 0px; padding: 0px } #map_canvas { height: 100% } </style>
            <script type=\"text/javascript\"
            src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
            <script type=\"text/javascript\"> function initialize() {
            var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
            var myOptions = { zoom: 14, center: latlng,
            mapTypeId: google.maps.MapTypeId.ROADMAP };
            var map = new google.maps.Map(
            document.getElementById(\"map_canvas\"), myOptions);
            var marker = new google.maps.Marker({ position: latlng,
            title: \"' + doc._id + '\" });
            var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
            google.maps.event.addListener(marker, \"click\", function() {
            infowindow.open(map, marker); }); marker.setMap(map); } </script></head>
            <body onload=\"initialize()\"> <div id=\"map_canvas\"
            style=\"width: 100%; height: 100%\"></div></body></html>' }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:

    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/, '');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:

            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());

        // close the loop after all rows are rendered
        return "</feed>";
      });
I also made the following rudimentary changes to templates/edit.html:

    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->

This is further down in templates/edit.html:

    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{ docid }},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View 'all', which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
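The iteration behaviour can be sketched with a few lines of JavaScript. The `render` function below is a deliberately tiny stand-in written for this illustration, not the mustache.js library: it handles only `{{name}}` variables and `{{#section}}...{{/section}}` blocks, and it does not look up names in parent contexts as real Mustache does.

```javascript
// Tiny stand-in for a Mustache-style renderer (NOT mustache.js itself).
function render(template, view) {
  // Expand sections first: arrays repeat the inner block once per item,
  // other truthy values render it once, falsy values drop it.
  const withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    function (match, name, inner) {
      const value = view[name];
      if (Array.isArray(value)) {
        return value.map(function (item) { return render(inner, item); }).join("");
      }
      return value ? render(inner, view) : "";
    }
  );
  // Then substitute the remaining simple variables.
  return withSections.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] === undefined ? "" : String(view[name]);
  });
}

const programme = {
  name: "Chevening Programme",
  countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }],
};
const output = render("{{name}}:{{#countries}} {{country}}{{/countries}}", programme);
console.log(output);
```

The inner block is rendered three times here, once per entry in `countries`, which is the same mechanism the template uses to emit one flag image per country.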
Here is an example of a typical document which is transformed by the template code:

    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": [
            "id", "mz", "tj"
          ]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": [
            "bd", "za"
          ]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag"
              alt="{{country}}" title="{{country}}"&gt;
            {{/countries}}
            {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash

    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*

    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries

    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
      echo "$filename"
      # strip the .png extension to get the document name
      docname=$(echo "$filename" | sed -e 's/\.png$//')
      echo "$docname"
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo "$url"
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
      curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
    done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1, the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
Section 2
Background
In recent years a new generation of data stores has emerged which approaches data persistence in a different way to the established relational model.
In this section I briefly discuss this relational model, and describe some of the recent challenges which have led to the development of the newer, alternative database systems.
2.1 The Relational Model
The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].
One description [54] of Codd's relational model reads as follows:
"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."
A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
The main advantage of the relational database is its precise and mathematically proven organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is 'relational' or not is more a matter of how applications use the database - a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
2.2 Relational Database Management Systems
The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/Sql Server) and open source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful, and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let's call it 'Staff') containing staff records:

    Name     Nationality
    Bob      Australia
    Jane     Sweden

All is well until the system encounters a staff member with multiple nationalities:

    Name     Nationality
    Bob      Australia
    Jane     Sweden
    Alice    United Kingdom, South Africa

This second table breaks the rules of normalisation, and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a 'Nationality' table, with a 'StaffNationality' table linked with Primary and Foreign Keys.
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

    {
      "name": "Alice",
      "nationality": [
        "United Kingdom",
        "South Africa"
      ]
    }

This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would result in 'United Kingdom, South Africa' being considered a distinct nationality.
In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
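As a preview, the count-by-nationality query can be sketched in map-reduce style. This is an in-memory illustration only: the `runView` harness is a hypothetical stand-in for the database's view engine, `emit` is passed in as a parameter rather than supplied by CouchDB, and storing every nationality as an array is an assumption made for the example.

```javascript
// Staff records as documents; Alice's nationalities are held in an array.
const staff = [
  { name: "Bob", nationality: ["Australia"] },
  { name: "Jane", nationality: ["Sweden"] },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] },
];

// Map step: emit one (key, value) row per nationality of each document.
function map(doc, emit) {
  doc.nationality.forEach(function (n) {
    emit(n, 1);
  });
}

// Reduce step: sum the values emitted for one key.
function reduce(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Tiny in-memory harness standing in for the database's view engine.
function runView(docs) {
  const rows = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      (rows[key] = rows[key] || []).push(value);
    });
  });
  const result = {};
  Object.keys(rows).sort().forEach(function (key) {
    result[key] = reduce(rows[key]);
  });
  return result;
}

const counts = runView(staff);
console.log(counts);
```

Because the map step emits one row per array element, Alice is counted once under each of her nationalities rather than under a single combined value.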
2.5 Transactional Guarantees
A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
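The all-or-nothing character of such a transfer can be sketched as follows. This is a toy in-memory illustration of atomicity only; the account names and balances are hypothetical, and a real database would rely on locking and logging rather than copying a snapshot.

```javascript
// All-or-nothing transfer between two in-memory accounts.
function transfer(accounts, from, to, amount) {
  const snapshot = Object.assign({}, accounts); // state to roll back to
  try {
    if (!(from in accounts) || !(to in accounts)) {
      throw new Error("unknown account");
    }
    accounts[from] -= amount;
    if (accounts[from] < 0) {
      throw new Error("insufficient funds");
    }
    accounts[to] += amount;
    return true;
  } catch (err) {
    // Roll back: restore every balance to its value before the transfer.
    Object.keys(snapshot).forEach(function (k) {
      accounts[k] = snapshot[k];
    });
    return false;
  }
}

const accounts = { alice: 100, bob: 50 };
const ok = transfer(accounts, "alice", "bob", 30);      // succeeds
const failed = transfer(accounts, "alice", "bob", 500); // rolled back
console.log(ok, failed, accounts);
```

After both calls the balances reflect only the successful transfer, and the total across the two accounts is unchanged.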
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.
2.6 The 'NoSql' Movement
'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.) (see footnote 1).
'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
Footnote 1: During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].
2.7 'NoSql' databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example: if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
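The idea of replicas that are briefly out of step but eventually converge can be sketched with a toy model. The single integer version number and the 'higher version wins' rule are simplifying assumptions made for this illustration; real systems, CouchDB included, track revision histories and have to handle conflicting updates.

```javascript
// Each replica holds a status and the version number of its last update.
function makeReplica(name) {
  return { name: name, version: 0, status: "" };
}

// A write is applied at one replica only; the others are briefly stale.
function write(replica, status) {
  replica.version += 1;
  replica.status = status;
}

// Synchronise two replicas: the higher version wins on both sides.
function sync(a, b) {
  const newer = a.version >= b.version ? a : b;
  const version = newer.version;
  const status = newer.status;
  a.version = b.version = version;
  a.status = b.status = status;
}

const london = makeReplica("london");
const singapore = makeReplica("singapore");

write(london, "hello from London");
const staleRead = singapore.status; // Bob still sees the old (empty) status
sync(london, singapore);
console.log(JSON.stringify(staleRead), singapore.status);
```

Between the write and the sync, a reader in Singapore sees stale data; after the sync, both replicas agree.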
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
"Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."
2.8 Brewer’s ‘CAP’ Theorem
The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years a set of Object-Relational Mapping (ORM) tools has emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.
A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
2.10 Benefits of ‘NoSql’
As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law - disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.
• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale; managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer in effect uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, a co-creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
{
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
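To make the point about absent fields concrete, the following JavaScript sketch formats an address from either kind of document, using only the fields that are actually present. The helper name formatAddress is hypothetical and the documents are cut-down copies of the two examples above; nothing here is CouchDB-specific.

```javascript
// Cut-down copies of the two example documents from the text.
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" },
};
const zambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" },
};

// Hypothetical helper: build the printed lines of an address, skipping
// any field the document does not contain (there is no NULL placeholder).
function formatAddress(doc) {
  const a = doc.address || {};
  const lines = [];
  if ("streetAddress" in a) lines.push(...a.streetAddress.split("\n"));
  if ("city" in a) lines.push(a.city);
  if ("postalCode" in a) lines.push(a.postalCode);
  if ("country" in a) lines.push(a.country);
  return lines;
}

console.log(formatAddress(johnSmith));
console.log(formatAddress(zambia));
```

The Zambia document never mentions postalCode, so the application simply tests for the key rather than checking for a NULL value.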
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places, and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format, together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to as follows:
curl -H "Content-Type: application/json" \
     -X PUT \
     http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
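The difference between POST (CouchDB chooses the id) and PUT (the client chooses the id) can be summarised in a small JavaScript sketch. The helper saveRequest is hypothetical and only describes the HTTP request that would be sent; it does not contact a server, and the local address used is an assumption for illustration.

```javascript
// Hypothetical helper: describe the HTTP request for saving a document.
// With no id, POST to the database and let CouchDB assign a random id;
// with an id, PUT to /database/id so the document lives at that URL.
function saveRequest(baseUrl, database, doc, id) {
  const dbUrl = baseUrl + "/" + encodeURIComponent(database);
  return {
    method: id === undefined ? "POST" : "PUT",
    url: id === undefined ? dbUrl : dbUrl + "/" + encodeURIComponent(id),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

const anonymous = saveRequest("http://127.0.0.1:5984", "addressbook", { firstName: "John" });
const named = saveRequest(
  "http://127.0.0.1:5984",
  "addressbook",
  { company: "British Council Zambia" },
  "british-council-zambia"
);

console.log(anonymous.method, anonymous.url);
console.log(named.method, named.url);
```

The returned object could then be handed to whichever HTTP library the application uses, which is the sense in which any language with an HTTP library can talk to CouchDB.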
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json:
curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
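The revision check can be imitated with a toy in-memory store, which may help to show why a stale revision is rejected. This is a sketch under stated assumptions: the functions createStore and put are hypothetical, and revisions are simplified to strings such as 1-rev (real CouchDB revisions combine a counter with a hash).

```javascript
// A toy in-memory store that mimics CouchDB's revision check.
function createStore() {
  return new Map();
}

// Save `doc` under `id`. If the document already exists, `rev` must match
// the stored revision; otherwise the write is refused as a conflict.
function put(store, id, doc, rev) {
  const current = store.get(id);
  if (current && current._rev !== rev) {
    return { error: "conflict", reason: "Document update conflict." };
  }
  const generation = current ? parseInt(current._rev, 10) + 1 : 1;
  const saved = { ...doc, _id: id, _rev: generation + "-rev" };
  store.set(id, saved);
  return { ok: true, id, rev: saved._rev };
}

const store = createStore();
const first = put(store, "john-smith", { age: 25 });             // creates 1-rev
const second = put(store, "john-smith", { age: 26 }, first.rev); // creates 2-rev
const stale = put(store, "john-smith", { age: 27 }, first.rev);  // refused

console.log(first, second, stale);
```

The third call still holds the first revision, so it is refused and the stored document keeps the value written by the second call.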
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
curl -H "Content-Type: application/json" ^
     -X POST http://127.0.0.1:5984/_replicate ^
     -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
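When the replication request is built by a program rather than typed at a shell, the quoting differences between platforms disappear, because the JSON body is generated by a library. The sketch below assumes a hypothetical helper, replicationRequest, which only describes the request to be sent to the _replicate resource and does not contact any server.

```javascript
// Hypothetical helper: describe the POST to /_replicate on the CouchDB
// instance that should carry out the replication.
function replicationRequest(coordinatorUrl, source, target) {
  return {
    method: "POST",
    url: coordinatorUrl + "/_replicate",
    headers: { "Content-Type": "application/json" },
    // JSON.stringify takes care of the quoting and escaping.
    body: JSON.stringify({ source, target }),
  };
}

const local = "http://127.0.0.1:5984";
const pull = replicationRequest(
  local,
  "http://username:password@mick.couchone.com/addressbook",
  local + "/addressbook"
);

console.log(pull.url);
console.log(pull.body);
```

Swapping the source and target arguments would describe replication in the opposite direction, from the local machine up to the hosted instance.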
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election, and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
    "_id": "Aberavon",
    "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
    "mp": "Hywel Francis",
    "electorate": 51079,
    "con": 3062,
    "lab": 18077,
    "lib": 4138,
    "pc": 3546,
    "oth": 1278
}
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to SUM with GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
    "_id": "_design/con",
    "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
    "language": "javascript",
    "views": {
        "Conservative Votes": {
            "map": "function(doc) {\n emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
            "map": "function(doc) {\n emit(null, doc.con);\n}",
            "reduce": "function(keys, values) {\n return sum(values);\n}"
        }
    }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
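The behaviour of the two views can be imitated in plain JavaScript, which may make the map and reduce stages easier to follow. This is a sketch, not CouchDB's implementation: buildView, sum and range are hypothetical stand-ins, emit is passed in as a parameter rather than being a global, and apart from Aberavon the constituencies and figures are invented for illustration.

```javascript
// Sample documents. The Aberavon figure comes from the text; the other
// two constituencies and their numbers are invented.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example East", con: 1500 },
  { _id: "Example West", con: 12000 },
];

// Run a map function over every document, collect the emitted key/value
// pairs, and keep them sorted by key, as a view index would.
function buildView(documents, mapFn) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => a.key - b.key);
}

// Stand-in for the sum() helper available to CouchDB reduce functions.
const sum = (values) => values.reduce((total, v) => total + v, 0);

// 'Conservative Votes': key = vote count, value = constituency id.
const conservativeVotes = buildView(docs, (doc, emit) => emit(doc.con, doc._id));

// 'Conservative Votes Total': map emits the counts, reduce adds them up.
const totalRows = buildView(docs, (doc, emit) => emit(null, doc.con));
const total = sum(totalRows.map((row) => row.value));

// Emulate a key range query; both bounds are inclusive.
function range(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}
const between = range(conservativeVotes, 1000, 2000).map((row) => row.value);

console.log(total, between);
```

Because the rows are kept sorted by key, a range query only needs to pick out a contiguous run of rows, which is why the choice of emitted key determines which questions a view can answer.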
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, when new documents are added to the collection, the index is updated incrementally: only the new or changed documents are processed the next time the view is queried.
As a result, the entire system is optimised for fast, indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# host="http://127.0.0.1:5984"
host="http://username:password@mick.couchone.com"
database="universities"
folder="universities"
fileextension=".json"
FILES="$folder/*"
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
    echo $filepath
    # get the file name from the file path
    filename=$(echo $filepath | sed -e "s|$folder/||g")
    echo $filename
    # strip the extension to get the document name
    docname=$(echo $filename | sed -e "s/$fileextension//g")
    echo $docname
    url="$host/$database/$docname"
    echo $url
    # put the document into CouchDB
    echo curl -X PUT "$url" -d @"$filepath"
    curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
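The choice between PUT and POST can be sketched in JavaScript as follows. This is only an illustration of the rule described above, not part of the project code: the helper name saveRequest is hypothetical, and it merely describes the request rather than sending it.

```javascript
// Hypothetical helper: decide how a document should be sent to CouchDB.
function saveRequest(host, database, doc) {
  if (doc._id) {
    // We chose the id ourselves, so PUT the document at its own URL.
    return {
      method: 'PUT',
      url: host + '/' + database + '/' + encodeURIComponent(doc._id)
    };
  }
  // No id supplied: POST to the database and let CouchDB assign one.
  return { method: 'POST', url: host + '/' + database };
}

var named = saveRequest('http://mick.couchone.com', 'universities',
                        { _id: 'Aberystwyth University' });
var anonymous = saveRequest('http://mick.couchone.com', 'universities',
                            { region: 'Wales' });

console.log(named.method, named.url);
console.log(anonymous.method, anonymous.url);
```

Note that the space in the document id is percent-encoded as %20, which is why the file names in the upload folder contain %20.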
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
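To make the roles of doc and req concrete, the sketch below runs the s1 function outside CouchDB with hand-made arguments. The variant s1WithHeading and the sample values (including the coordinates) are illustrative assumptions and are not part of _design/d1; the only CouchDB behaviour relied upon is that query-string parameters are exposed on req.query.

```javascript
// The 's1' show function, written as ordinary JavaScript.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant: reads ?heading=h2 from the query string via req.
function s1WithHeading(doc, req) {
  var tag = (req.query && req.query.heading) || 'h1';
  return '<' + tag + '>' + doc._id + '</' + tag + '>';
}

// Stand-ins for the arguments CouchDB would supply (values are made up).
var sampleDoc = { _id: 'Aston University', latitude: 52.48, longitude: -1.89 };
var sampleReq = { query: { heading: 'h2' } };

var plain = s1(sampleDoc, sampleReq);
var custom = s1WithHeading(sampleDoc, sampleReq);
console.log(plain);
console.log(custom);
```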
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
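The same breakdown can be written as a small function that assembles a ‘show’ URL from its parts. This is a sketch only; the function name showUrl is my own and does not come from CouchDB.

```javascript
// Hypothetical helper: build the URL of a 'show' function for one document.
function showUrl(host, database, designDoc, showName, docId) {
  return [
    host,                      // hosted instance of CouchDB
    database,                  // the database
    '_design', designDoc,      // the design document
    '_show', showName,         // the 'show' function
    encodeURIComponent(docId)  // the document id, with spaces as %20
  ].join('/');
}

var url = showUrl('http://mick.couchone.com', 'universities', 'd1', 's1',
                  'Aston University');
console.log(url);
```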
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
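One way to avoid escaping by hand, offered here as a sketch rather than as the approach taken in this project, is to write the function as ordinary JavaScript and let JSON.stringify produce the escaped string. The shortened map1 body below is an illustrative stand-in for the real function in Appendix A.

```javascript
// An ordinary JavaScript function containing double quotes.
// (Shortened, illustrative stand-in for the real map1 function.)
function map1(doc, req) {
  return '<div id="map_canvas" title="' + doc._id + '"></div>';
}

// Store the function source as a string, as a design document requires.
var designDoc = {
  _id: '_design/d1',
  shows: { map1: map1.toString() }
};

// JSON.stringify backslash-escapes every double quote for us.
var json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

The resulting text could then be saved to a file and uploaded with cURL in the same way as d1.json.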
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection.
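To show what such a ‘view’ function produces, the sketch below emulates the map step in plain JavaScript. The runView harness and the global emit defined here are stand-ins written for illustration (CouchDB supplies its own emit), the two sample documents are cut down to their ids, and the simple string comparison only approximates CouchDB's key ordering.

```javascript
// Stand-in for CouchDB's emit(): collects rows for the current document.
var rows = [];
var currentDoc = null;
function emit(key, value) {
  rows.push({ id: currentDoc._id, key: key, value: value });
}

// The 'view' map function v1: emit the _id as both key and value.
var v1 = function(doc) { emit(doc._id, doc._id); };

// Hypothetical harness: apply a map function to every document.
function runView(mapFn, docs) {
  rows = [];
  docs.forEach(function(doc) {
    currentDoc = doc;
    mapFn(doc);
  });
  // View rows are returned ordered by key (approximated here).
  rows.sort(function(a, b) {
    return a.key < b.key ? -1 : (a.key > b.key ? 1 : 0);
  });
  return { total_rows: rows.length, offset: 0, rows: rows };
}

var result = runView(v1, [
  { _id: 'University of Aberdeen' },
  { _id: 'Aston University' }
]);
console.log(JSON.stringify(result, null, 2));
```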
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
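The interplay of start, getRow and send can be tried outside CouchDB with simple stand-ins for the three built-ins. The sketch below is an assumption-laden emulation: the stand-in implementations are my own, and the list function is a simplified l1 that sends a plain paragraph instead of a hyperlink.

```javascript
// Stand-ins for the built-ins that CouchDB provides to a 'list' function.
var output = [];   // chunks passed to send()
var headers = null; // headers passed to start()
var queue = [];    // rows waiting to be handed out by getRow()

function start(response) { headers = response.headers; }
function getRow() { return queue.length ? queue.shift() : null; }
function send(chunk) { output.push(chunk); }

// A simplified version of l1: one paragraph per row.
function l1(head, req) {
  var row;
  start({ headers: { 'Content-Type': 'text/html' } });
  while (row = getRow()) {
    send('<p>' + row.value + '</p>');
  }
}

// Rows as they would arrive from the 'view' function v1.
queue = [
  { key: 'Aston University', value: 'Aston University' },
  { key: 'University of Aberdeen', value: 'University of Aberdeen' }
];
l1({ total_rows: queue.length }, {});

var html = output.join('');
console.log(headers['Content-Type']);
console.log(html);
```

send is called once per row only because it sits inside the while loop; getRow returns null when the rows are exhausted, which ends the loop.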
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp from an application developer’s perspective is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
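The correspondence between the folder structure and the design document can be sketched as below. This is a deliberately simplified illustration of the idea, not CouchApp's actual implementation (which is written in Python): the function buildDesignDoc is hypothetical, and the files are given as an in-memory object rather than read from disk.

```javascript
// Hypothetical illustration: fold a set of files into one design document.
// Each path segment becomes a nested key; the file extension is dropped.
function buildDesignDoc(name, files) {
  var ddoc = { _id: '_design/' + name };
  Object.keys(files).forEach(function(path) {
    var parts = path.replace(/\.(js|html)$/, '').split('/');
    var node = ddoc;
    parts.slice(0, -1).forEach(function(part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

// File contents are stored as strings, exactly as in a design document.
var files = {
  'views/v1/map.js': 'function(doc) { emit(doc._id, doc._id); }',
  'shows/s1.js': "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  'lists/l1.js': 'function(head, req) { /* ... */ }',
  'templates/georss.html': '<feed>...</feed>'
};

var ddoc = buildDesignDoc('bc', files);
console.log(JSON.stringify(ddoc, null, 2));
```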
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
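The looping behaviour of the template sections can be illustrated with a very small renderer. This is a toy written for this explanation and is not mustache.js: it supports only {{name}} variables and {{#section}}...{{/section}} blocks, does no HTML escaping, and assumes that a section is never nested inside another section of the same name.

```javascript
// Toy renderer (NOT mustache.js): handles {{#section}}...{{/section}}
// blocks and {{name}} variables only.
function render(template, context) {
  var sectionPattern = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  var expanded = template.replace(sectionPattern, function(match, name, inner) {
    var value = context[name];
    if (Array.isArray(value)) {
      // A list: render the inner block once per item.
      return value.map(function(item) {
        return render(inner, Object.assign({}, context, item));
      }).join('');
    }
    // A flag: render the inner block only if the value is truthy.
    return value ? render(inner, context) : '';
  });
  return expanded.replace(/\{\{(\w+)\}\}/g, function(match, name) {
    return context[name] == null ? '' : String(context[name]);
  });
}

var template =
  '{{#programmes}}<p>{{name}}' +
  '{{#has_countries}}{{#countries}}[{{country}}]{{/countries}}{{/has_countries}}' +
  '</p>{{/programmes}}';

var rendered = render(template, {
  programmes: [
    { name: 'A', has_countries: true,
      countries: [{ country: 'id' }, { country: 'mz' }] },
    { name: 'B', has_countries: false, countries: [] }
  ]
});
console.log(rendered);
```

The programme and country values here are placeholders; the shape of the data (programmes containing countries, with has_countries flags) follows the template shown above.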
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
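As an illustration of how little is involved, replication is itself requested over HTTP, by POSTing a JSON body naming a source and a target to the _replicate resource of a CouchDB server. The sketch below only builds such a request and does not send it; the helper name replicationRequest is my own, and the choice of the hosted universities database as source and a local database as target is an assumed example.

```javascript
// Hypothetical helper: describe a request that asks the CouchDB server at
// `host` to replicate the `source` database into the `target` database.
function replicationRequest(host, source, target) {
  return {
    method: 'POST',
    url: host + '/_replicate',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Assumed example: pull the hosted database (data and design documents)
// into a database of the same name on the local machine.
var request = replicationRequest(
  'http://127.0.0.1:5984',
  'http://mick.couchone.com/universities',
  'universities'
);
console.log(request.method, request.url);
console.log(request.body);
```

Because design documents are ordinary documents, the application code travels with the data in the same replication.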
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
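To make the difference concrete, consider counting universities per region, which in SQL might be written as SELECT region, COUNT(*) ... GROUP BY region. The sketch below expresses the same information need as a map function and a reduce function, and runs them with a small emulation harness. The harness (mapReduce) is my own simplification: in CouchDB itself, emit is a global rather than a parameter, and the reduce function also receives keys and a rereduce flag. The sample documents are invented.

```javascript
// Map step: one row per document, keyed by region, with a count of 1.
function mapByRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce step: add up the counts emitted for one key.
function sumValues(values) {
  return values.reduce(function(total, n) { return total + n; }, 0);
}

// Hypothetical harness: group emitted values by key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  var groups = {};
  docs.forEach(function(doc) {
    mapFn(doc, function(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  return Object.keys(groups).sort().map(function(key) {
    return { key: key, value: reduceFn(groups[key]) };
  });
}

var counts = mapReduce([
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'University of Aberdeen', region: 'Scotland' },
  { _id: 'Aberystwyth University', region: 'Wales' }
], mapByRegion, sumValues);
console.log(JSON.stringify(counts));
```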
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% } body { height: 100%;
      margin: 0px; padding: 0px } #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\"
      src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
      var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
      var myOptions = { zoom: 14, center: latlng,
      mapTypeId: google.maps.MapTypeId.ROADMAP };
      var map = new google.maps.Map(
      document.getElementById(\"map_canvas\"), myOptions);
      var marker = new google.maps.Marker({ position: latlng,
      title: \"' + doc._id + '\" });
      var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
      google.maps.event.addListener(marker, \"click\", function() {
      infowindow.open(map, marker); });
      marker.setMap(map); } </script></head>
      <body onload=\"initialize()\">
      <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
      </body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added for GeoRSS
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
};
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's|^countries/png/||')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
SECTION 2 BACKGROUND 6
2.2 Relational Database Management Systems
The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.
These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)
2.3 Normalisation
A key factor in relational data storage is data normalisation.
Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].
The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.
Consider, for example, a table (let's call it 'Staff') containing staff records:

    Name    Nationality
    Bob     Australia
    Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

    Name    Nationality
    Bob     Australia
    Jane    Sweden
    Alice   United Kingdom, South Africa

This second table breaks the rules of normalisation and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables: a 'Nationality' table and a 'StaffNationality' table, linked with Primary and Foreign Keys.
While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

    {
      "name": "Alice",
      "nationality": [
        "United Kingdom",
        "South Africa"
      ]
    }

This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would result in 'United Kingdom, South Africa' being considered a distinct nationality.
In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
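The effect of non-atomic values on grouping can be illustrated with a short JavaScript sketch. The data and the helper name groupCount are assumptions made purely for illustration; the sketch does not use SQL or CouchDB, it only mimics the grouping step in memory.

```javascript
// Hypothetical staff rows; Alice's row holds a non-atomic value,
// as in the second 'Staff' table above.
const rows = [
  { name: "Bob", nationality: "Australia" },
  { name: "Jane", nationality: "Sweden" },
  { name: "Alice", nationality: "United Kingdom, South Africa" },
];

// Count records per key. keyOf returns the list of keys a record
// contributes to (one key for a table cell, several for an array).
function groupCount(records, keyOf) {
  const counts = {};
  for (const record of records) {
    for (const key of keyOf(record)) {
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return counts;
}

// Mimics GROUP BY Nationality: each distinct string is its own group.
const naive = groupCount(rows, (r) => [r.nationality]);

// The same staff as documents, assuming nationality is always an array.
const docs = [
  { name: "Bob", nationality: ["Australia"] },
  { name: "Jane", nationality: ["Sweden"] },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] },
];

// One count per individual nationality.
const perNationality = groupCount(docs, (d) => d.nationality);

console.log(naive);
console.log(perNationality);
```

In the first result 'United Kingdom, South Africa' appears as a single group, whereas in the second Alice is counted once under each of her nationalities.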
2.5 Transactional Guarantees
A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system it must appear as if the changes happen at the same time in both places [37].
For a banking application it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
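The all-or-nothing behaviour of the money transfer example can be sketched in a few lines of JavaScript. This is only an in-memory illustration under simplifying assumptions (a single process, no locking, no durable log); the function name transfer and the account names are hypothetical, and a real RDBMS achieves the same effect by quite different means.

```javascript
// Apply a transfer to a copy of the accounts and return the copy.
// If anything goes wrong an error is thrown and the caller's
// original state is left untouched (the 'roll-back').
function transfer(accounts, from, to, amount) {
  const draft = { ...accounts };
  if (!(from in draft) || !(to in draft)) {
    throw new Error("unknown account");
  }
  if (draft[from] < amount) {
    throw new Error("insufficient funds");
  }
  draft[from] -= amount;
  draft[to] += amount;
  // 'Commit': both changes become visible together, or not at all.
  return draft;
}

let accounts = { alice: 100, bob: 20 };
accounts = transfer(accounts, "alice", "bob", 30);

let failed = false;
try {
  accounts = transfer(accounts, "alice", "bob", 500);
} catch (err) {
  failed = true; // accounts still holds the last consistent state
}

console.log(accounts, failed);
```

The debit and the credit are never observable separately: a reader sees either the state before the transfer or the state after it.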
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.
2.6 The 'NoSql' Movement
'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).¹
'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].
2.7 'NoSql' databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that in many application domains it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example, if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
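The idea of replicas that are temporarily out of step but eventually converge can be sketched as follows. This is a deliberately simplified model: the per-record version counter and the 'highest version wins' rule are assumptions made for illustration and are not a description of how Facebook, PNUTS or CouchDB resolve differences between replicas.

```javascript
// Each replica maps a key to { value, version }.
// merge() returns the local state updated with any newer remote entries.
function merge(local, remote) {
  const merged = { ...local };
  for (const [key, entry] of Object.entries(remote)) {
    if (!(key in merged) || entry.version > merged[key].version) {
      merged[key] = entry;
    }
  }
  return merged;
}

// Alice has just updated her status in London; Singapore has not heard yet.
const london = { "alice/status": { value: "At Birkbeck", version: 2 } };
const singapore = { "alice/status": { value: "On holiday", version: 1 } };

// Before replication, Bob (reading from Singapore) sees the stale value.
const staleRead = singapore["alice/status"].value;

// After a two-way exchange both replicas hold the same state.
const afterLondon = merge(london, singapore);
const afterSingapore = merge(singapore, london);

console.log(staleRead, afterSingapore["alice/status"].value);
```

Reads are always served locally and immediately; the price is the short window in which the two replicas disagree.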
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
"Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."
2.8 Brewer's 'CAP' Theorem
The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that 'eventual' consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application's object model.
A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
2.10 Benefits of 'NoSql'
As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law - disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.
• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)
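What 'pre-computed' means here can be sketched in plain JavaScript. The sketch below is an assumption-laden imitation, not CouchDB code: in CouchDB itself a map function receives only the document and calls a global emit, whereas here emit is passed as a parameter so that the example is self-contained. The names buildView and queryByKey and the sample documents are hypothetical.

```javascript
// Hypothetical documents.
const docs = [
  { _id: "1", name: "Bob", nationality: ["Australia"] },
  { _id: "2", name: "Alice", nationality: ["United Kingdom", "South Africa"] },
];

// A map function in the CouchDB style: it emits (key, value) pairs.
function mapByNationality(doc, emit) {
  for (const nationality of doc.nationality || []) {
    emit(nationality, doc.name);
  }
}

// Run the map function over every document once, ahead of any query,
// and keep the emitted rows sorted by key. This sorted list is the 'view'.
function buildView(documents, mapFn) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ key, id: doc._id, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// A query is then only a lookup by key in the pre-built rows.
function queryByKey(rows, key) {
  return rows.filter((row) => row.key === key);
}

const view = buildView(docs, mapByNationality);
console.log(queryByKey(view, "South Africa"));
```

Only questions that the map function anticipated can be answered this way; asking for staff by name, say, would need a second view to be defined and built first, which is the limitation on ad-hoc querying described above.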
Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
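The kind of transformation involved can be sketched as a small JavaScript function that turns a JSON document into a GeoRSS-style Atom entry. The field names (name, latitude, longitude), the sample institution and the function names are assumptions for illustration only; they are not taken from the Activity Mapping data, and a complete feed would also need the enclosing feed element and namespace declarations.

```javascript
// Escape the characters that are significant in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render one document as an Atom entry carrying a GeoRSS point.
// GeoRSS Simple writes a point as "latitude longitude", space-separated.
function toGeoRssEntry(doc) {
  return [
    "<entry>",
    `  <title>${escapeXml(doc.name)}</title>`,
    `  <georss:point>${doc.latitude} ${doc.longitude}</georss:point>`,
    "</entry>",
  ].join("\n");
}

// Hypothetical institution document.
const institution = {
  name: "Example School & College",
  latitude: 51.5219,
  longitude: -0.1303,
};

const entry = toGeoRssEntry(institution);
console.log(entry);
```

Because the stored document is already a JavaScript-compatible structure, the transformation is a matter of string formatting rather than of joining tables.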
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
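Since the interface is plain HTTP, talking to CouchDB amounts to constructing ordinary requests. The JavaScript sketch below only builds descriptions of such requests and does not send them; the base URL, the database name 'staff' and the helper names are placeholders chosen for illustration. The URL shape (database name followed by document id, with PUT to store and GET to fetch) follows CouchDB's document API.

```javascript
// Describe the request that stores a document under a given id.
function putDocument(baseUrl, db, id, doc) {
  return {
    method: "PUT",
    url: `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

// Describe the request that fetches the same document again.
function getDocument(baseUrl, db, id) {
  return {
    method: "GET",
    url: `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`,
  };
}

// Placeholder server address and database name.
const base = "http://127.0.0.1:5984";
const putRequest = putDocument(base, "staff", "alice", {
  name: "Alice",
  nationality: ["United Kingdom", "South Africa"],
});
const getRequest = getDocument(base, "staff", "alice");

console.log(putRequest.method, putRequest.url);
console.log(getRequest.method, getRequest.url);
```

In an environment that provides fetch, a descriptor like this could be sent with fetch(putRequest.url, { method: putRequest.method, headers: putRequest.headers, body: putRequest.body }); no database driver or mapping layer is involved.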
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here: [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }

¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

    {
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
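The practical consequence for application code is that a missing field is tested for, rather than compared against NULL. The following JavaScript sketch works on the two example documents above; the function name formatAddress and the choice of which fields to print are assumptions made for illustration.

```javascript
const person = {
  firstName: "John",
  lastName: "Smith",
  address: {
    streetAddress: "21 2nd Street",
    city: "New York",
    state: "NY",
    postalCode: "10021",
  },
};

const office = {
  company: "British Council Zambia",
  address: {
    streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
    city: "Lusaka",
    country: "Zambia",
  },
};

// Build the printed lines of an address, including only fields
// that are actually present in the document.
function formatAddress(doc) {
  const address = doc.address;
  const lines = address.streetAddress.split("\n");
  lines.push(address.city);
  if ("postalCode" in address) lines.push(address.postalCode);
  if ("country" in address) lines.push(address.country);
  return lines;
}

console.log(formatAddress(person));
console.log(formatAddress(office));
```

Both documents can live in the same database, and the code that reads them decides how to treat the fields each one happens to contain.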
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
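The principle of never touching what has already been written can be illustrated with a toy store in JavaScript. This is an assumption-heavy sketch of the general append-only idea, using an in-memory array in place of a file and a simple map in place of a B-Tree; it is not a description of CouchDB's actual storage engine, and the class name is hypothetical.

```javascript
// A toy append-only store: an update adds a new record to the end of
// the log and never modifies or removes an existing one.
class AppendOnlyStore {
  constructor() {
    this.log = [];           // records are only ever appended
    this.latest = new Map(); // key -> position of the newest record
  }

  put(key, value) {
    this.log.push({ key, value });
    this.latest.set(key, this.log.length - 1);
  }

  get(key) {
    const position = this.latest.get(key);
    return position === undefined ? undefined : this.log[position].value;
  }
}

const store = new AppendOnlyStore();
store.put("alice", { status: "On holiday" });
store.put("alice", { status: "At Birkbeck" });

console.log(store.get("alice"));
console.log(store.log.length);
```

A reader that started before the second put can carry on using the first record undisturbed, which is one reason such a design suits many simultaneous readers.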
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here: [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax [51].¹
To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
All data in CouchDB is stored in JSON format together with a unique key and a revision number, so that in CouchDB the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json. The rev value is the revision of John Smith's document shown in Section 4.4:
curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
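The revision check used for deletes and updates can be pictured with a small in-memory model. The sketch below is not CouchDB's implementation: createStore, put and remove are hypothetical names, and the "-toy" suffix stands in for the hash that CouchDB appends to each revision number. It only illustrates the rule that a write is accepted when the caller supplies the current revision and rejected otherwise.

```javascript
// Toy in-memory model of revision checking (illustrative only, not CouchDB code).
function createStore() {
  const docs = new Map(); // id -> { rev, body }

  // Next revision label: incrementing generation plus a placeholder suffix.
  function nextRev(currentRev) {
    const generation = currentRev ? parseInt(currentRev.split("-")[0], 10) : 0;
    return (generation + 1) + "-toy";
  }

  // Create or update a document; an update must quote the current revision.
  function put(id, body, rev) {
    const existing = docs.get(id);
    if (existing && existing.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const newRev = nextRev(existing ? existing.rev : null);
    docs.set(id, { rev: newRev, body: body });
    return { ok: true, id: id, rev: newRev };
  }

  // Delete a document; the caller must also quote the current revision.
  function remove(id, rev) {
    const existing = docs.get(id);
    if (!existing) return { error: "not_found" };
    if (existing.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    docs.delete(id);
    return { ok: true, id: id };
  }

  return { put: put, remove: remove };
}

const store = createStore();
const created = store.put("british-council-zambia", { city: "Lusaka" });
const updated = store.put("british-council-zambia", { city: "Lusaka", country: "Zambia" }, created.rev);
const staleUpdate = store.put("british-council-zambia", { city: "Ndola" }, created.rev);
const staleDelete = store.remove("british-council-zambia", created.rev);
const deleted = store.remove("british-council-zambia", updated.rev);
console.log(created.rev, updated.rev, staleUpdate.error, staleDelete.error, deleted.ok);
```

The two 'stale' calls quote the first revision after the document has already moved on to the second, so both are refused; only the caller holding the latest revision can change or delete the document.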
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags.³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
³ Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary (the command is entered on a single line):
curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
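Much of the difficulty in the commands above comes from quoting a JSON body on the command line. As a minimal sketch, the same request body can be assembled in JavaScript, where JSON.stringify takes care of the quoting. The function name replicationRequest and the shape of the returned object are assumptions made for illustration; the sketch builds a description of the request and sends nothing.

```javascript
// Describe the POST to /_replicate without sending it (illustrative sketch).
function replicationRequest(sourceUrl, targetUrl) {
  return {
    method: "POST",
    path: "/_replicate",
    headers: { "Content-Type": "application/json" },
    // JSON.stringify produces the quoted body that is typed by hand in the cURL command.
    body: JSON.stringify({ source: sourceUrl, target: targetUrl })
  };
}

const replication = replicationRequest(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(replication.body);
```

Any HTTP library could then send this body to the local CouchDB instance, which is the same request that the cURL command issues.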
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a SUM with GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
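To make the two stages concrete, the sketch below runs the two views from the 'con' design document over three documents in plain JavaScript. It is a simplified stand-in for CouchDB's view engine, not a description of it: emit and sum are defined locally to imitate the helpers CouchDB supplies, queryView is a hypothetical name, and the two 'Example' constituencies are invented for illustration (only the Aberavon figure comes from the document above).

```javascript
// Local stand-ins for the helpers CouchDB provides to view functions.
let emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }
function sum(values) { return values.reduce(function (total, v) { return total + v; }, 0); }

// The view functions as they appear in the design document.
const conservativeVotesMap = function (doc) { emit(doc.con, doc._id); };
const conservativeVotesTotalMap = function (doc) { emit(null, doc.con); };
const conservativeVotesTotalReduce = function (keys, values) { return sum(values); };

// Run a map function over every document, then optionally reduce the emitted values.
function queryView(documents, mapFn, reduceFn) {
  emitted = [];
  documents.forEach(function (doc) { mapFn(doc); });
  if (!reduceFn) {
    // Map-only views return their rows sorted by key (numeric keys assumed here).
    return { rows: emitted.slice().sort(function (a, b) { return a.key - b.key; }) };
  }
  const keys = emitted.map(function (row) { return row.key; });
  const values = emitted.map(function (row) { return row.value; });
  return { rows: [{ key: null, value: reduceFn(keys, values) }] };
}

const sampleDocs = [
  { _id: "Aberavon", con: 3062 },        // figure from the document above
  { _id: "Example North", con: 1500 },   // hypothetical
  { _id: "Example South", con: 12000 }   // hypothetical
];

const votesResult = queryView(sampleDocs, conservativeVotesMap);
const totalResult = queryView(sampleDocs, conservativeVotesTotalMap, conservativeVotesTotalReduce);
console.log(JSON.stringify(votesResult));
console.log(JSON.stringify(totalResult));
```

The map-only view yields one row per constituency ordered by Conservative vote, while the view with a reduce function collapses the emitted values into a single total, as in the response shown above.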
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
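A range request of this kind works because the rows of a view are kept sorted by key. The sketch below imitates that idea on a small sorted array: a binary search finds the first row at or above the start key, and the scan stops at the first row beyond the end key. It is a simplified illustration rather than CouchDB's B-Tree code; lowerBound and rangeQuery are hypothetical names, and all rows except Aberavon are invented.

```javascript
// Rows as a map-only view would hold them: sorted by key (Conservative votes).
const sortedRows = [
  { key: 850, value: "Example A" },   // hypothetical
  { key: 1000, value: "Example B" },  // hypothetical
  { key: 1742, value: "Example C" },  // hypothetical
  { key: 2000, value: "Example D" },  // hypothetical
  { key: 3062, value: "Aberavon" }
];

// Binary search for the index of the first row whose key is >= the given key.
function lowerBound(rows, key) {
  let low = 0;
  let high = rows.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (rows[mid].key < key) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  return low;
}

// Collect rows from the start key up to and including the end key.
function rangeQuery(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

const matches = rangeQuery(sortedRows, 1000, 2000);
console.log(matches.map(function (row) { return row.value; }));
```

Only the rows inside the requested range are visited, which is the property that makes a lookup against a sorted index cheaper than scanning every document.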
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time: every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s|$folder/||g")
  echo "$filename"
  # strip the file extension to get the document name
  docname=$(echo "$filename" | sed -e "s|$fileextension||g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
(The function is laid out over several lines for readability; in the stored JSON it is a single string.)
Note that shows is a reserved word in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
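Because a 'show' function is ordinary JavaScript, its behaviour can be tried outside CouchDB. In the sketch below, s1 follows the function in the listing, while the sample document, its coordinates and the empty request object are placeholder values assumed for illustration.

```javascript
// The 's1' show function, written as plain JavaScript.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Placeholder document and request object (values are illustrative only).
const sampleDoc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };
const sampleReq = { query: {} };

const html = s1(sampleDoc, sampleReq);
console.log(html);
```

CouchDB performs the equivalent call when a '_show' URL is requested, passing in the stored document and a request object, and returns the resulting string to the browser.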
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
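The parts of the URL listed above can be assembled programmatically. The following is a minimal sketch in which showUrl is a hypothetical helper name; it uses the standard encodeURIComponent function to turn the space in the document id into %20.

```javascript
// Assemble a '_show' URL from its parts (illustrative helper).
function showUrl(host, database, designDoc, showName, docId) {
  return [
    host,
    database,
    "_design",
    designDoc,
    "_show",
    showName,
    encodeURIComponent(docId) // "Aston University" becomes "Aston%20University"
  ].join("/");
}

const astonUrl = showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University");
console.log(astonUrl);
```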
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
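The way a 'list' function pulls rows from a view and streams out HTML can be imitated with a small test harness. In the sketch below, start, send and getRow are local stand-ins for the functions CouchDB supplies, runList is a hypothetical name, and the link target is shortened to an assumed relative path rather than the full URL used in the listing.

```javascript
// Local stand-ins for the functions CouchDB makes available to list functions.
let pendingRows = [];
let responseHeaders = null;
let chunks = [];
function start(response) { responseHeaders = response.headers; }
function send(chunk) { chunks.push(chunk); }
function getRow() { return pendingRows.length > 0 ? pendingRows.shift() : null; }

// A list function in the style of l1: one hyperlink per view row.
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while (row = getRow()) {
    send('<p><a href="_show/s1/' + encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
};

// Feed the rows of a view result through a list function and collect the output.
function runList(listFn, viewRows) {
  pendingRows = viewRows.slice();
  responseHeaders = null;
  chunks = [];
  listFn({ total_rows: viewRows.length, offset: 0 }, { query: {} });
  return { headers: responseHeaders, body: chunks.join("") };
}

const listOutput = runList(l1, [
  { id: "Aberystwyth University", key: "Aberystwyth University", value: "Aberystwyth University" },
  { id: "Aston University", key: "Aston University", value: "Aston University" }
]);
console.log(listOutput.headers["Content-Type"]);
console.log(listOutput.body);
```

The loop ends when getRow returns null, which is how the function knows that the view has no further rows to render.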
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project: CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
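The merging step can be pictured as turning a folder tree into a nested JSON object, with folders becoming objects and files becoming string values. The sketch below illustrates that idea only; it is not CouchApp's code (CouchApp itself is written in Python and does more than this). The names buildDesignDoc and generatedFiles are hypothetical, and the file contents are abbreviated examples.

```javascript
// In-memory stand-in for the files that CouchApp generates (contents abbreviated).
const generatedFiles = {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }"
};

// Fold file paths into a nested design document: folders become objects,
// files become string properties named after the file without its extension.
function buildDesignDoc(name, fileMap) {
  const designDoc = { _id: "_design/" + name };
  Object.keys(fileMap).forEach(function (path) {
    const parts = path.replace(/\.js$/, "").split("/");
    let node = designDoc;
    parts.slice(0, -1).forEach(function (part) {
      if (!(part in node)) {
        node[part] = {};
      }
      node = node[part];
    });
    node[parts[parts.length - 1]] = fileMap[path];
  });
  return designDoc;
}

const bcDesignDoc = buildDesignDoc("bc", generatedFiles);
console.log(JSON.stringify(bcDesignDoc, null, 2));
```

The result has the same shape as the hand-written design documents in Section 5, with views, shows and lists sections, which is why the folder layout mirrors the structure of the design document.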
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
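The looping behaviour described above can be illustrated with a very small template renderer. The sketch below is not mustache.js: render is a hypothetical function that supports only simple variables and sections, does no HTML escaping, and does not handle a section nested inside another section of the same name. The template is a shortened form of the one above, and 'Example College' and all coordinates are invented values.

```javascript
// Minimal Mustache-style renderer: {{name}} variables and {{#name}}...{{/name}} sections.
function render(template, context) {
  const sectionPattern = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const expanded = template.replace(sectionPattern, function (match, name, inner) {
    const value = context[name];
    if (Array.isArray(value)) {
      // Repeat the section once per item, with the item's fields in scope.
      return value.map(function (item) {
        return render(inner, Object.assign({}, context, item));
      }).join("");
    }
    // For non-list values, keep the section only when the value is truthy.
    return value ? render(inner, context) : "";
  });
  return expanded.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return name in context ? String(context[name]) : "";
  });
}

const feedTemplate =
  "{{#institutions}}<entry><title>{{id}}</title>" +
  "{{#has_programmes}}{{#programmes}}<p>{{name}}</p>{{/programmes}}{{/has_programmes}}" +
  "<point>{{latitude}} {{longitude}}</point></entry>{{/institutions}}";

const feedData = {
  institutions: [
    { id: "Aston University", latitude: 52.49, longitude: -1.89,
      has_programmes: true, programmes: [{ name: "Erasmus" }] },
    { id: "Example College", latitude: 51.5, longitude: -0.1,   // hypothetical
      has_programmes: false, programmes: [] }
  ]
};

const feedOutput = render(feedTemplate, feedData);
console.log(feedOutput);
```

The has_programmes flag plays the same role as in the template above: when it is false, the whole programmes block is omitted for that institution.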
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
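To make the replication mechanism described above more concrete, the following is a minimal sketch of how a replication could be expressed as an HTTP request to CouchDB's `_replicate` resource. Nothing is actually sent; the function only builds a description of the request. The helper name `replicationRequest`, the host `example.couchone.com` and the database name are assumptions made for illustration.

```javascript
// Sketch: describe a CouchDB replication as an HTTP request (nothing is sent).
// The helper name, host names and database names are hypothetical.
function replicationRequest(localServer, source, target) {
  return {
    method: "POST",
    url: localServer + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Pull: copy a remote database (documents and design documents alike)
// into a database on the developer's own machine.
var pull = replicationRequest(
  "http://127.0.0.1:5984",
  "http://example.couchone.com/universities",
  "universities"
);

// Push: the same request with source and target swapped.
var push = replicationRequest(
  "http://127.0.0.1:5984",
  "universities",
  "http://example.couchone.com/universities"
);

console.log(pull.method, pull.url, pull.body);
console.log(push.method, push.url, push.body);
```

Because design documents are ordinary documents, the same request would carry application code along with the data, which is the point made in the paragraph above.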
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
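The correspondence between document operations and HTTP verbs can be sketched as follows. This is an illustration only: the function `couchRequest`, the server address and the sample document are assumptions, and the code builds request descriptions rather than contacting a server.

```javascript
// Sketch: map document operations onto the HTTP requests CouchDB expects.
// The function name, server address and sample document are hypothetical.
function couchRequest(operation, db, doc) {
  var base = "http://127.0.0.1:5984/" + encodeURIComponent(db);
  var path = base + "/" + encodeURIComponent(doc._id);
  switch (operation) {
    case "read":
      return { method: "GET", url: path };
    case "create":
    case "update":
      // The document travels as JSON; an update must include the current _rev.
      return { method: "PUT", url: path, body: JSON.stringify(doc) };
    case "delete":
      // A delete names the revision being removed.
      return { method: "DELETE", url: path + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown operation: " + operation);
  }
}

var sampleDoc = { _id: "University of Aberdeen", _rev: "1-abc", region: "Scotland" };
["read", "update", "delete"].forEach(function (op) {
  var r = couchRequest(op, "universities", sampleDoc);
  console.log(r.method + " " + r.url);
});
```

Any HTTP client, cURL included, could issue the requests described by these objects.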
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
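The second and third challenges can be illustrated with a small sketch of a design document. It assumes a hypothetical design document named `_design/example` with one view (`by_region`) and one show function (`summary`); these names are not taken from the project. The functions are written as ordinary JavaScript and then converted to the source-code strings that a design document stores.

```javascript
// Sketch of a design document: an ordinary JSON document whose _id starts
// with "_design/" and whose functions are stored as source-code strings.
// The names "example", "by_region" and "summary" are hypothetical.
var viewMap = function (doc) {
  // emit is supplied by CouchDB when the view runs; it is never called here.
  if (doc.region) { emit(doc.region, 1); }
};
var showSummary = function (doc, req) {
  return "<h1>" + doc._id + "</h1><p class=\"region\">" + doc.region + "</p>";
};

var designDoc = {
  _id: "_design/example",
  views: {
    by_region: { map: viewMap.toString(), reduce: "_count" }
  },
  shows: {
    summary: showSummary.toString()
  }
};

// Serialising takes care of the quote-escaping that is tedious to do by hand.
var designJson = JSON.stringify(designDoc, null, 2);
console.log(designJson);
```

Tools such as CouchApp perform a similar packaging step on the developer's behalf, working from files on disk rather than from functions in memory.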
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-defined map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/, '');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      // added for GeoRSS
      entry.point = data.point;
      return entry;
    };
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());

        // close the loop after all rows are rendered
        return "</feed>";
      });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- ... -->
This is further down in templates/edit.html:
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{ docid }},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View ‘all’, which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
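The effect of such an iterating section can be approximated in plain JavaScript as a loop over the array. The sketch below is not the Mustache implementation; the function name `renderCountries` and the base URL `http://example.org/countries` are assumptions made for illustration.

```javascript
// Plain-JavaScript sketch of what a Mustache list section does: the body of
// the section is rendered once per array element. This illustrates the idea
// only; the function name and base URL are hypothetical.
function renderCountries(programme, flagBaseUrl) {
  var countries = programme.countries || [];   // a missing list renders nothing
  return countries.map(function (item) {
    var code = item.country;
    return '<img src="' + flagBaseUrl + '/' + code + '/flag" alt="' + code + '" />';
  }).join("\n");
}

var sampleProgramme = {
  name: "Chevening Programme",
  countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }]
};

console.log(renderCountries(sampleProgramme, "http://example.org/countries"));
console.log(renderCountries({ name: "No countries listed" }, "http://example.org/countries") === "");
```

The template achieves the same result declaratively, which keeps the markup readable and free of string concatenation.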
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
Bibliography
[1] A NoSql Summer. A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
be expensive, as database developers need to be brought in to make the necessary amendments.
In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:
    {
      "name": "Alice",
      "nationality": [
        "United Kingdom",
        "South Africa"
      ]
    }
This record (or ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.
The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
2.4 SQL
A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].
To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:
    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;
For this to work, the table needs to store atomic values. In our example above, this query would result in ‘United Kingdom, South Africa’ being considered a distinct nationality.
In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.
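As a preview of the idea, the following in-memory sketch shows how the same ‘count per nationality’ question might be expressed as a map step and a reduce step over documents that hold a list of nationalities. It is an illustration only, not CouchDB's own engine: in CouchDB, emit is a global function rather than a parameter, and the staff documents and the driver function `countByKey` are assumptions made for this example.

```javascript
// In-memory sketch of the map-reduce idea (not CouchDB's own engine).
// The staff documents and the driver function are hypothetical.
var staff = [
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] },
  { name: "Bob",   nationality: ["United Kingdom"] },
  { name: "Carol", nationality: ["South Africa"] }
];

// Map step: emit one (key, value) row per nationality held by a document.
function mapNationality(doc, emit) {
  (doc.nationality || []).forEach(function (n) { emit(n, 1); });
}

// Reduce step: add up the values emitted under one key.
function sumValues(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Tiny driver: run map over every document, group rows by key, then reduce.
function countByKey(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    mapNationality(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).sort().forEach(function (key) {
    result[key] = sumValues(groups[key]);
  });
  return result;
}

console.log(countByKey(staff));   // South Africa: 2, United Kingdom: 2
```

Because each nationality is emitted as its own key, a person with two nationalities is counted once under each, without the data having to be normalised into separate tables first.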
2.5 Transactional Guarantees
A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
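The ‘all or nothing’ behaviour of the money transfer can be sketched in a few lines. This is a toy illustration under stated assumptions: the account names and balances are hypothetical, and a real DBMS achieves the same effect with locking and logging rather than by copying the whole state.

```javascript
// Sketch of atomicity: a transfer either applies both changes or neither.
// Account names and balances are hypothetical; a real DBMS uses locking and
// logging rather than copying state.
function transfer(accounts, from, to, amount) {
  var snapshot = Object.assign({}, accounts);   // state to roll back to
  try {
    if (!(from in accounts) || !(to in accounts)) { throw new Error("unknown account"); }
    accounts[from] -= amount;                   // debit
    if (accounts[from] < 0) { throw new Error("insufficient funds"); }
    accounts[to] += amount;                     // credit
    return true;                                // commit
  } catch (e) {
    Object.keys(accounts).forEach(function (k) { accounts[k] = snapshot[k]; });
    return false;                               // roll back
  }
}

var bank = { alice: 100, bob: 50 };
console.log(transfer(bank, "alice", "bob", 30), bank);    // succeeds
console.log(transfer(bank, "alice", "bob", 500), bank);   // fails, state restored
```

In the failing case the debit has already been applied when the error is detected, so the roll-back is what prevents the user from ever observing a half-finished transfer.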
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.
2.6 The ‘NoSql’ Movement
‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).¹
‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].
2.7 ‘NoSql’ databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
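The notion of replicas that diverge briefly and then converge can be shown with a toy model. The sketch below rests on a deliberately simplified assumption: each replica keeps a single value with a revision counter, and synchronising copies the higher revision across. It is not the algorithm used by CouchDB or by any of the services mentioned, and all names are hypothetical.

```javascript
// Toy model of eventual consistency. Each replica holds a value with a
// revision counter; synchronising copies the higher revision across.
// This is a simplified assumption, not any real system's algorithm.
function makeReplica(name) {
  return { name: name, rev: 0, status: "" };
}

function write(replica, status) {
  replica.rev += 1;
  replica.status = status;
}

function sync(a, b) {
  var newer = a.rev >= b.rev ? a : b;
  var older = newer === a ? b : a;
  older.rev = newer.rev;
  older.status = newer.status;
}

var london = makeReplica("London");
var singapore = makeReplica("Singapore");

write(london, "Alice is at the airport");
console.log(singapore.status === london.status);   // false: Bob sees stale data

sync(london, singapore);                            // replication catches up
console.log(singapore.status === london.status);   // true: replicas converged
```

Both replicas remain readable throughout; the price paid is the short window in which they disagree.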
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
“Occasionally, an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”
2.8 Brewer’s ‘CAP’ Theorem
The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide the following three guarantees simultaneously:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.
A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely, and radically simplifies application development [36].
2.10 Benefits of 'NoSql'
As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law - disk space has become so economical as to be almost free, and memory has grown so large it is now measured in gigabytes, so that many databases can fit entirely in memory.
• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It is better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost commodity-grade hardware.
• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer in effect uses the same data model as the application layer.
Footnote 2: An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms (see footnote 3).
Footnote 3: In my case the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS-agnostic." [31] (On 'OS', see footnote 1.)
Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }

Footnote 1: Operating System
Consider the use of NULL values in relational databases: in a relational database table we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

    {
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
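The two points made here, that a missing field is simply absent and that documents of different shapes can share a database, can be illustrated with a small JavaScript sketch. The array below is only an in-memory stand-in for the 'addressbook' database, and the helper function names are hypothetical.

```javascript
// Two documents of different shape in the same in-memory "database".
const addressbook = [
  {
    firstName: "John",
    lastName: "Smith",
    address: { city: "New York", postalCode: "10021" },
  },
  {
    company: "British Council Zambia",
    address: {
      streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
      city: "Lusaka",
      country: "Zambia",
    },
  },
];

// A missing field is simply absent; no NULL placeholder is stored.
function hasPostalCode(doc) {
  return "postalCode" in doc.address;
}

// "\n" is a new line character, so splitting on it yields the printed lines.
function printedLines(doc) {
  return doc.address.streetAddress.split("\n");
}

const withPostalCode = addressbook.filter(hasPostalCode).length;
const zambiaLines = printedLines(addressbook[1]);
console.log(withPostalCode, zambiaLines);
```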
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
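The idea of never touching what has already been written can be sketched as follows. This is a deliberately simplified model written for illustration: an array stands in for the file on disk, and the class and method names are hypothetical rather than anything from CouchDB's Erlang source.

```javascript
// Minimal append-only store: an update adds a new record and nothing
// that has already been written is ever overwritten.
class AppendOnlyStore {
  constructor() {
    this.log = [];            // every version ever written, in write order
    this.latest = new Map();  // document id -> position of newest version
  }

  write(id, doc) {
    this.log.push(Object.freeze({ id: id, doc: doc }));
    this.latest.set(id, this.log.length - 1);
  }

  read(id) {
    const position = this.latest.get(id);
    return position === undefined ? undefined : this.log[position].doc;
  }
}

const store = new AppendOnlyStore();
store.write("a", { n: 1 });
const firstRecord = store.log[0];
store.write("a", { n: 2 });
console.log(store.read("a"), store.log.length);
```

A reader holding a reference to the first record still sees it unchanged after the second write, which is why readers and writers do not need to block one another in this model.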
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result the data follows the user around from machine to machine.
In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use 'cURL', which is a command-line tool for getting or sending files using URL syntax [51] (see footnote 1).
To create a database called 'addressbook' (see footnote 2) on a remotely hosted CouchDB instance, we use the following syntax:

    curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)
To delete a CouchDB database we issue the statement with the HTTP DELETE verb:

    curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
Footnote 1: If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
Footnote 2: Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json

cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
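For readers more comfortable with code than with command-line flags, the sketch below assembles the same pieces as a plain JavaScript object. It only describes the request and does not send it; the function name and the local host address are assumptions made for illustration.

```javascript
// Builds a description of the request; it does not send anything.
function buildCreateRequest(baseUrl, database, doc) {
  return {
    method: "POST",                                   // corresponds to -X POST
    url: baseUrl + "/" + encodeURIComponent(database),
    headers: { "Content-Type": "application/json" },  // corresponds to -H
    body: JSON.stringify(doc),                        // corresponds to -d (inline, not from a file)
  };
}

const request = buildCreateRequest("http://127.0.0.1:5984", "addressbook", {
  firstName: "John",
  lastName: "Smith",
});
console.log(request.method, request.url);
```

An HTTP library in any language needs these same four pieces of information, which is why the choice of client language matters so little when working with CouchDB.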
All data in CouchDB is stored in JSON format together with a unique key and revision (version) number - so that in CouchDB the resulting document would have _id and _rev values as follows:

    {
      "_id": "760cd53c55a93497067f90d6242fc25e",
      "_rev": "1-91bce055fc8db86480400321079f0834",
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json

This is the resulting document in CouchDB:

    {
      "_id": "british-council-zambia",
      "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }

As a result this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:

    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document the revision value must be supplied, to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json (the rev value is the current _rev of that document):

    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
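The revision check that governs both deletes and updates can be modelled in a few lines. The sketch below is an in-memory imitation written for this explanation, not CouchDB's implementation: the put function is hypothetical and the revision strings use a simplified "counter-label" format instead of CouchDB's real hashes.

```javascript
// In-memory stand-in for a database. The revision format is simplified
// to "<counter>-sketch" and is not CouchDB's real hashing scheme.
const docs = new Map();

function put(id, body, rev) {
  const current = docs.get(id);
  // An update is refused unless the caller quotes the current revision.
  if (current && current._rev !== rev) {
    return { error: "conflict", reason: "Document update conflict." };
  }
  const generation = current ? parseInt(current._rev, 10) + 1 : 1;
  const newRev = generation + "-sketch";
  docs.set(id, { ...body, _id: id, _rev: newRev });
  return { ok: true, id: id, rev: newRev };
}

const created = put("john-smith", { age: 25 });               // no revision needed yet
const updated = put("john-smith", { age: 26 }, created.rev);  // current revision: accepted
const stale = put("john-smith", { age: 27 }, created.rev);    // old revision: refused
console.log(created, updated, stale);
```

The third call fails because it quotes the revision from before the second call, which is the situation of a client holding an 'old' version of the document.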
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ (see footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png

Footnote 3: Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:

    curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine we use the following command:

    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows we need to exclusively use double quotes, backslash-escaping where necessary:

    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result the replica of the database is now available on the local machine.
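Conceptually, one-way replication copies across whatever the target does not yet have. The sketch below models this with two in-memory maps. It is a simplification under stated assumptions: real CouchDB replication also deals with deletions, conflicts and checkpoints, none of which is modelled, and the replicate function and revision strings here are invented for illustration.

```javascript
// Copies every document from source to target unless the target already
// holds the same revision. Deletions, conflicts and checkpoints, which
// real replication must handle, are not modelled here.
function replicate(source, target) {
  let written = 0;
  for (const [id, doc] of source) {
    const existing = target.get(id);
    if (!existing || existing._rev !== doc._rev) {
      target.set(id, { ...doc });
      written += 1;
    }
  }
  return { ok: true, docsWritten: written };
}

// Invented revision strings, for illustration only.
const remote = new Map([
  ["british-council-zambia", { _id: "british-council-zambia", _rev: "1-abc", city: "Lusaka" }],
  ["john-smith", { _id: "john-smith", _rev: "1-def", city: "New York" }],
]);
const local = new Map();

const firstRun = replicate(remote, local);
const secondRun = replicate(remote, local);
console.log(firstRun.docsWritten, secondRun.docsWritten);
```

Running the same replication twice copies nothing the second time, which matches the intuition that replication only transfers what has changed.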
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
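The two stages can be imitated in ordinary JavaScript over an array of documents. This is a minimal single-machine sketch: the runView helper and the sample data are invented, and emit is passed in as an argument here, whereas CouchDB provides it to the map function as a built-in.

```javascript
// Runs a map function over every document, collecting the emitted rows,
// then optionally reduces the emitted values to a single result.
function runView(documents, mapFn, reduceFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key: key, value: value });
  documents.forEach((doc) => mapFn(doc, emit));
  if (!reduceFn) {
    return rows;
  }
  return reduceFn(rows.map((r) => r.key), rows.map((r) => r.value));
}

// Invented sample data.
const sampleDocs = [
  { _id: "A", votes: 100 },
  { _id: "B", votes: 250 },
  { _id: "C", votes: 50 },
];

// Map only: one row per document.
const mapped = runView(sampleDocs, (doc, emit) => emit(doc._id, doc.votes));

// Map followed by reduce: a single total.
const total = runView(
  sampleDocs,
  (doc, emit) => emit(null, doc.votes),
  (keys, values) => values.reduce((a, b) => a + b, 0)
);
console.log(mapped.length, total);
```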
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database with the results from a single constituency:

    [http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }

CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).

    [http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }

The sum of Conservative votes (8,782,198) is returned as the result of the following request:

    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
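Selecting rows between a start key and an end key works because the rows emitted by a view are kept sorted by key. The sketch below imitates this on an in-memory array; the vote counts and constituency labels are invented, and the range function is a hypothetical helper that treats both bounds as inclusive.

```javascript
// Rows as a 'Conservative Votes' style view might emit them:
// key = number of votes, value = constituency id. The figures are invented.
const emitted = [
  { key: 3062, value: "X" },
  { key: 1500, value: "Y" },
  { key: 900, value: "Z" },
  { key: 2000, value: "W" },
];

// A view index keeps its rows sorted by key.
const index = [...emitted].sort((a, b) => a.key - b.key);

// Returns the rows whose key lies between the two bounds, inclusive.
function range(sortedRows, startkey, endkey) {
  return sortedRows.filter((r) => r.key >= startkey && r.key <= endkey);
}

const result = range(index, 1000, 2000);
console.log(result);
```

Because the key of this view is the vote count, only conditions on the vote count can be expressed this way; a condition on a different field would need a different view.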
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
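The notion of an incrementally updated index can be sketched as follows. A sorted array stands in for the B-Tree, which is a considerable simplification, and the function names and vote figures are invented for illustration.

```javascript
// Finds the position at which a key belongs in a sorted array of rows
// (binary search), so the whole array never has to be re-sorted.
function insertionPoint(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Adds one document's row to the existing index without rebuilding it.
function addToIndex(rows, doc) {
  const position = insertionPoint(rows, doc.con);
  rows.splice(position, 0, { key: doc.con, value: doc._id });
}

const viewIndex = [];
[
  { _id: "P", con: 3000 },
  { _id: "Q", con: 1200 },
  { _id: "R", con: 2100 },
].forEach((doc) => addToIndex(viewIndex, doc));
console.log(viewIndex.map((r) => r.key));
```

Each new document costs one search and one insertion, rather than a fresh pass over the whole collection.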
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com I wrote the bash script listed below.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

    #!/bin/bash
    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e "s|$folder/||g")
      echo "$filename"
      # strip the file extension to get the document name
      docname=$(echo "$filename" | sed -e "s|$fileextension||g")
      echo "$docname"
      url="$host/$database/$docname"
      echo "$url"
      # put the document into CouchDB
      echo curl -X PUT "$url" -d @"$filepath"
      curl -X PUT "$url" -d @"$filepath"
    done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
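The way each file path is turned into a document URL can be restated as a short JavaScript sketch. The function names and the local host address are assumptions made for illustration; the logic simply strips the folder and the file extension from the path, as the upload script does.

```javascript
// Strips the folder prefix and the file extension from a path,
// leaving the name to be used as the document id.
function documentName(filepath, folder, extension) {
  let name = filepath.startsWith(folder + "/")
    ? filepath.slice(folder.length + 1)
    : filepath;
  if (name.endsWith(extension)) {
    name = name.slice(0, -extension.length);
  }
  return name;
}

// Joins host, database and document name into the URL to PUT to.
function documentUrl(host, database, filepath, folder, extension) {
  return host + "/" + database + "/" + documentName(filepath, folder, extension);
}

const putUrl = documentUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aston%20University.json",
  "universities",
  ".json"
);
console.log(putUrl);
```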
To put a single document to the web, the command used would be as follows:

    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.

    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
      }
    }

Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
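Because a 'show' function is ordinary JavaScript, it can be tried out away from CouchDB by calling it with hand-made arguments. In the sketch below the coordinates are invented and the req object mimics only the query-string part of a real request object.

```javascript
// The 'show' function, written as ordinary JavaScript.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Invented coordinates; req carries an empty query string.
const html = s1(
  { _id: "Aston University", latitude: 52.49, longitude: -1.89 },
  { query: {} }
);
console.log(html);
```

Testing the function in this way before placing it in a design document avoids a round trip to the server for every small change.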
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
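The same breakdown can be expressed as a small JavaScript function that splits a 'show' URL into its parts. The function name is hypothetical, and the sketch assumes the URL has exactly the layout listed above.

```javascript
// Splits a 'show' URL into its parts. Assumes the exact layout
// /<database>/_design/<design doc>/_show/<function>/<document id>.
function parseShowUrl(url) {
  const parts = new URL(url).pathname.split("/").filter(Boolean);
  const [database, , designDoc, , showFunction, docId] = parts;
  return {
    database: database,
    designDoc: designDoc,
    showFunction: showFunction,
    docId: decodeURIComponent(docId),  // %20 becomes a space
  };
}

const parsed = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
console.log(parsed);
```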
53 A more complex lsquoshowrsquo function
The design document may be extended to show more complex HTML doc-uments for example a Google Map for each location
Figure 54 Document data rendered using Google Maps
This page is available at httpmickcouchonecomuniversities
_designd1_showmap1Aston20UniversityAs can be seen from the URL the lsquoshowrsquo function which returns the
HTML for the map is map1HTML for Google Maps is based on the code from the Google Map
JavaScript API V3 Tutorial available at httpcodegooglecomapismapsdocumentationjavascripttutorialhtml
The full listing of the design document including the map1 lsquoshowrsquo func-tion is provided in Appendix A
It becomes apparent that authoring complex HTML and JavaScriptwithin a design document so that it is returned from a JavaScript func-tion at string is very cumbersome For example every double quote has tobe escaped so that the returned string is valid
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id) }"
    }
  }
}
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
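To see what the v1 map function does without a CouchDB server, it can be run against a few in-memory documents. This is only a sketch: the emit function below is a stand-in for the one CouchDB provides, the two documents are sample data, and CouchDB's key ordering is approximated by plain string comparison.

```javascript
// Stand-in for CouchDB's emit(): collect the emitted rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function of view v1.
var v1 = function (doc) { emit(doc._id, doc._id); };

// Sample documents (only the _id field matters for v1).
var docs = [
  { _id: "University of Aberdeen" },
  { _id: "Aston University" }
];
docs.forEach(function (doc) { v1(doc); });

// A view returns its rows ordered by key.
rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
console.log(rows);
```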
A ‘view’ is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id) }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
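The interplay of start, send and getRow can be imitated outside CouchDB. In the sketch below all three are stand-ins written for illustration, the list function is a simplified version of l1 (it omits the hyperlink), and the rows are sample data shaped like the output of v1.

```javascript
// Stand-ins for the functions CouchDB supplies to a 'list' function.
var response = { headers: {}, body: "" };
var pending = [];

function start(init) { response.headers = init.headers; }  // called once
function send(chunk) { response.body += chunk; }           // called per row
function getRow() { return pending.length > 0 ? pending.shift() : null; }

// Simplified version of l1: one paragraph per view row.
var l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p>" + row.value + "</p>");
  }
};

// Sample rows, as view v1 would return them.
pending = [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" }
];
l1({ total_rows: 2, offset: 0 }, { query: {} });

console.log(response.headers["Content-Type"]);
console.log(response.body);
```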
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that, in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
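The merging step can be pictured with a short sketch. This is not CouchApp's implementation (which is written in Python and handles more cases); it only illustrates the idea that each folder becomes a nested key and each file's content becomes a string value. The function name buildDesignDoc and the in-memory file listing are hypothetical.

```javascript
// Hypothetical in-memory file listing; a real tool would read these from disk.
var files = {
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "templates/georss.html": "<feed>...</feed>"
};

function buildDesignDoc(name, files) {
  var ddoc = { _id: "_design/" + name };
  Object.keys(files).forEach(function (path) {
    // Drop the file extension, then treat each folder name as a nested key.
    var parts = path.replace(/\.[^./]+$/, "").split("/");
    var node = ddoc;
    parts.slice(0, -1).forEach(function (folder) {
      node[folder] = node[folder] || {};
      node = node[folder];
    });
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

var bc = buildDesignDoc("bc", files);
console.log(JSON.stringify(bc, null, 2));
```

With this layout the template ends up at bc.templates.georss, which matches how georss.js in Appendix C reads it from the design document.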
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
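The looping behaviour can be illustrated with a toy renderer. This is not mustache.js: it is a deliberately small sketch that supports only {{name}} variables and {{#name}}...{{/name}} sections, and, unlike the real library, it does not HTML-escape values. The function name render and the shortened template are written for this illustration.

```javascript
// Toy renderer: variables and sections only, no escaping, no partials.
function render(template, context) {
  // Expand sections first: arrays repeat the body, other values switch it on or off.
  var expanded = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    function (match, name, inner) {
      var value = context[name];
      if (Array.isArray(value)) {
        return value.map(function (item) {
          // Each item's fields are added on top of the enclosing context.
          return render(inner, Object.assign({}, context, item));
        }).join("");
      }
      return value ? render(inner, context) : "";
    }
  );
  // Then substitute the remaining plain variables.
  return expanded.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    var value = context[name];
    return value === undefined || value === null ? "" : String(value);
  });
}

var template =
  "{{#institutions}}<entry><title>{{id}}</title>" +
  "{{#has_programmes}}{{#programmes}}<p>{{name}}: " +
  "{{#countries}}[{{country}}]{{/countries}}</p>" +
  "{{/programmes}}{{/has_programmes}}</entry>{{/institutions}}";

var data = {
  institutions: [{
    id: "University of St Andrews",
    has_programmes: true,
    programmes: [{
      name: "Chevening Programme",
      countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }]
    }]
  }]
};

var output = render(template, data);
console.log(output);
```

The countries section body is emitted once per country, which is the behaviour the template relies on to output one flag image per country.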
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
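To make the difference in thinking concrete, here is a sketch of how a grouped count, which SQL would express as SELECT region, COUNT(*) ... GROUP BY region, might be written as a map and a reduce function. The view itself is hypothetical and was not part of this project; emit, sum and the grouping step are stand-ins for what CouchDB does when a view is queried with group=true, and the documents are sample data.

```javascript
// Stand-ins for helpers that CouchDB provides to view functions.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical view: count institutions per region.
var map = function (doc) { emit(doc.region, 1); };
var reduce = function (keys, values, rereduce) { return sum(values); };

// Sample documents.
var docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" }
];
docs.forEach(function (doc) { map(doc); });

// Group the emitted values by key, then reduce each group.
var grouped = {};
emitted.forEach(function (row) {
  (grouped[row.key] = grouped[row.key] || []).push(row.value);
});
var result = Object.keys(grouped).sort().map(function (key) {
  return { key: key, value: reduce(null, grouped[key], false) };
});
console.log(result);
```

The query has to be decided in advance and stored in the design document, whereas the SQL version could be typed ad hoc; that is the trade-off described above.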
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% }
      body { height: 100%; margin: 0px; padding: 0px }
      #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
      var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
      var myOptions = { zoom: 14, center: latlng,
      mapTypeId: google.maps.MapTypeId.ROADMAP };
      var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
      var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
      var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
      google.maps.event.addListener(marker, \"click\", function() {
      infowindow.open(map, marker); });
      marker.setMap(map); } </script></head>
      <body onload=\"initialize()\">
      <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
      </body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
            xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
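The value assigned to point above is simply the latitude and longitude joined by a space, which is the form a GeoRSS point takes. A small guard, sketched below, would avoid emitting a malformed point when a post has no usable coordinates. The helper name georssPoint is hypothetical and is not part of Sofa or of my changes to it.

```javascript
// Hypothetical helper: build "latitude longitude", or null if unusable.
function georssPoint(latitude, longitude) {
  // Treat missing or empty form values as "no location".
  if (latitude === undefined || latitude === null || latitude === "" ||
      longitude === undefined || longitude === null || longitude === "") {
    return null;
  }
  var lat = Number(latitude);
  var lon = Number(longitude);
  // Reject non-numeric input and values outside the valid ranges.
  if (!isFinite(lat) || !isFinite(lon) || Math.abs(lat) > 90 || Math.abs(lon) > 180) {
    return null;
  }
  return lat + " " + lon;
}

console.log(georssPoint("56.34", "-2.79"));
console.log(georssPoint("", "-2.79"));
```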
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
};
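The per-row mapping at the heart of this listing can be tried outside CouchDB as a plain function. The sketch below is a simplified variant, not the code deployed in the project: the name toInstitution is hypothetical, only some fields are copied, and, as a deliberate difference, an empty programmes or countries array is treated as absent (in the listing above an empty array still sets the has_ flag to true).

```javascript
// Simplified, standalone variant of the mapping inside List.withRows.
function toInstitution(doc) {
  var programmes = doc.programmes || [];
  return {
    id: doc._id,
    latitude: doc.latitude,
    longitude: doc.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      var countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each country code so the template can refer to {{country}}.
        countries: countries.map(function (code) { return { country: code }; })
      };
    })
  };
}

// Sample document, abbreviated from the example in Appendix D.
var sample = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com",
      countries: ["id", "mz", "tj"] }
  ]
};

var institution = toInstitution(sample);
console.log(JSON.stringify(institution, null, 2));
```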
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  # strip the .png extension to get the document name
  docname=$(echo $filename | sed -e 's/\.png$//')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the ‘ie’ record and puts the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type:image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
done
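The string handling in the script (file path to document name to attachment URL) can be sketched in a few lines of JavaScript, which may be easier to follow than the sed expressions. The function name flagTarget is hypothetical, and credentials are omitted from the base URL.

```javascript
// Hypothetical helper mirroring the script's path handling.
function flagTarget(baseUrl, filepath) {
  var filename = filepath.split("/").pop();      // "countries/png/de.png" -> "de.png"
  var docname = filename.replace(/\.png$/, "");  // "de.png" -> "de"
  return {
    docname: docname,
    // The png is stored as an attachment named "flag" on the country document.
    url: baseUrl + "/countries/" + docname + "/flag"
  };
}

console.log(flagTarget("http://mick.couchone.com", "countries/png/de.png"));
```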
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management: Lecture Notes. MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
SECTION 2 BACKGROUND 8
systems rely on data access (read and write) locking to provide ACID capabilities [50].
The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].
For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at the exact same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).
Caching data in many places, so that the data is near to the user when it is needed, is the best way to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.
2.6 The 'NoSql' Movement
'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).¹
'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:
• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].
2.7 'NoSql' databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example: if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
"Occasionally, an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."
2.8 Brewer's 'CAP' Theorem
The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that 'eventual' consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application's object model.
A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely, and radically simplifies application development [36].
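As a rough illustration of this point, the sketch below serialises a nested application object directly into a single JSON document, with no intermediate step of splitting it across tables. It is written in Python purely so that it can be run without a database server; the Address and Person classes and the to_document helper are hypothetical names introduced for the example and are not part of the Activity Mapping code.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Address:              # hypothetical class, for illustration only
    city: str
    country: str


@dataclass
class Person:               # hypothetical class, for illustration only
    name: str
    address: Address
    phone_numbers: list = field(default_factory=list)


def to_document(obj) -> str:
    """Serialise a (possibly nested) dataclass instance as one JSON document."""
    # asdict() walks nested dataclasses and lists, so the object graph is kept whole.
    return json.dumps(asdict(obj), sort_keys=True)


person = Person("Alice", Address("London", "UK"), ["020 7946 0000"])
document = to_document(person)
print(document)
```

The nested address stays inside the person document, which is the shape a document store would accept as-is; a relational design would typically place it in a separate table joined by a key.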
2.10 Benefits of 'NoSql'
As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law - disk space has become so economical as to be almost free; memory has grown so large it is now measured in Gigabytes, so that many databases can fit entirely in memory.
• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see, in the case of CouchDB queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
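The idea of a pre-computed query can be sketched with a toy in-memory model. This is not how CouchDB is implemented (its map functions are written in JavaScript and the results are held in B-Tree indexes); the function names map_by_country, build_view and query are hypothetical, and Python is used only so that the sketch runs without a server. The point is that the map function is applied once to every document in advance, and a later query only looks up keys in the resulting view.

```python
def map_by_country(doc):
    """Map function: yield (key, value) pairs for one document."""
    address = doc.get("address", {})
    if "country" in address:
        yield address["country"], doc.get("company")


def build_view(docs, map_fn):
    """Run the map function once over every document and sort the rows by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


def query(view, key):
    """Look up rows in the pre-computed view; the documents are not re-scanned."""
    return [value for k, value in view if k == key]


docs = [
    {"company": "British Council Zambia", "address": {"country": "Zambia"}},
    {"firstName": "John", "address": {"city": "New York"}},
]
view = build_view(docs, map_by_country)
print(query(view, "Zambia"))
```

A question the map function did not anticipate (for example, 'which documents have a city but no country?') cannot be answered from this view; a new map function would have to be written and a new view built, which is the ad-hoc querying limitation described above.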
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family, and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
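The difference can be seen from the point of view of client code that reads such documents. The sketch below is illustrative only: the second document and the postal_code helper are hypothetical, and Python's json module stands in for whatever HTTP client would actually fetch the documents.

```python
import json

with_code = json.loads(
    '{"firstName": "John", "address": {"city": "New York", "postalCode": "10021"}}'
)
# Hypothetical second document: the postalCode key is simply left out.
without_code = json.loads('{"firstName": "Mary", "address": {"city": "Lusaka"}}')


def postal_code(doc):
    """Return the postal code, or None when the document has no such key."""
    return doc.get("address", {}).get("postalCode")


print(postal_code(with_code))
print("postalCode" in without_code["address"])
```

No placeholder value is stored in the second document; the reader decides how to treat the missing key, which is part of what is meant by the burden of data integrity moving to the application.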
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
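The effect of the \n escape can be checked with any JSON parser. The following sketch uses Python's standard json module (an arbitrary choice, not something the project depends on) to decode the streetAddress value and split it into its printed lines.

```python
import json

# Raw string: the backslash-n sequences reach the JSON parser unchanged.
raw = r'{"address": {"streetAddress": "Heroes Place\nCairo Road\nPO Box 34571"}}'

street = json.loads(raw)["address"]["streetAddress"]
lines = street.splitlines()
print(lines)
```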
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places, and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free-of-charge.
Please note that, after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
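Since any HTTP library can issue these requests, the two cURL calls above can also be sketched in a general-purpose language. The Python below is a minimal sketch under stated assumptions: BASE_URL points at a local CouchDB instance on the default port, database_request is a hypothetical helper, and credentials are sent as an HTTP Basic Authorization header rather than embedded in the URL. The requests are only built, not sent, so the sketch runs without a server.

```python
import base64
import urllib.request

BASE_URL = "http://127.0.0.1:5984"   # assumption: a local CouchDB instance


def database_request(name, method, user=None, password=None):
    """Build (but do not send) a PUT or DELETE request for a database URL."""
    req = urllib.request.Request(f"{BASE_URL}/{name}", method=method)
    if user is not None:
        # HTTP Basic authentication: base64 of "user:password".
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    return req


create = database_request("addressbook", "PUT")      # like: curl -X PUT .../addressbook
delete = database_request("addressbook", "DELETE")   # like: curl -X DELETE .../addressbook

# To actually send a request against a running server:
#     urllib.request.urlopen(create)
```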
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
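The role of each cURL option can be mirrored in code. The sketch below builds the equivalent POST request with Python's standard urllib; the function name post_document_request and the local URL are assumptions made for the example, and the request is constructed without being sent.

```python
import json
import urllib.request


def post_document_request(db_url, document):
    """Build (but do not send) the POST request that the cURL call would issue."""
    body = json.dumps(document).encode("utf-8")         # plays the role of -d @john-smith.json
    return urllib.request.Request(
        db_url,
        data=body,
        headers={"Content-Type": "application/json"},   # plays the role of -H
        method="POST",                                  # plays the role of -X POST
    )


req = post_document_request(
    "http://127.0.0.1:5984/addressbook",                # assumption: local instance
    {"firstName": "John", "lastName": "Smith", "age": 25},
)
print(req.get_method(), req.full_url)
```

Here the document is passed as an in-memory dictionary; reading it from john-smith.json first would correspond more closely to the @ form of the -d option.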
All data in CouchDB is stored in JSON format, together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
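As background to the _rev value shown above: the number before the hyphen counts how many times the document has been revised, and the remainder is a digest. This convention is CouchDB behaviour rather than something spelled out in this report, and the helper name parse_rev below is hypothetical. A client can separate the two parts as follows.

```python
def parse_rev(rev):
    """Split a revision string such as '1-91bc...' into (generation, digest)."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


doc = {
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
}
generation, digest = parse_rev(doc["_rev"])
print(generation)
```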
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, '?').
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
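The way the revision travels in the query string can be sketched in JavaScript. This is only an illustration: buildDeleteUrl is a hypothetical helper name, not part of CouchDB or of the project code, and the credentials are left out of the URL.

```javascript
// Hypothetical helper: assembles the URL for an HTTP DELETE request.
// The revision is passed in the query string, after the '?'.
function buildDeleteUrl(baseUrl, database, docId, rev) {
  return baseUrl + '/' + encodeURIComponent(database) +
    '/' + encodeURIComponent(docId) +
    '?rev=' + encodeURIComponent(rev);
}

var deleteUrl = buildDeleteUrl(
  'http://mick.couchone.com',
  'addressbook',
  'british-council-zambia',
  '1-da7bcd810c608d6fbcb9ce92e9ade343'
);
console.log(deleteUrl);
```

A client would normally read the current _rev from a prior GET of the document rather than hard-coding it as above.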
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-da7bcd810c608d6fbcb9ce92e9ade343" \
      -d @john-smith-v2.json
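The revision check can be pictured with a toy in-memory store. This sketch only imitates the behaviour described above: createStore and its simplified 'N-rev' revision strings are assumptions made for illustration, whereas real CouchDB computes its own revision identifiers on the server.

```javascript
// Toy store that imitates CouchDB's revision check (illustration only).
function createStore() {
  var docs = {};
  return {
    // Writes succeed only if the caller supplies the current revision.
    put: function (id, body, rev) {
      var current = docs[id];
      if (current && current._rev !== rev) {
        return { error: 'conflict', reason: 'Document update conflict.' };
      }
      var n = current ? parseInt(current._rev, 10) + 1 : 1;
      var newRev = n + '-rev'; // simplified stand-in for a real revision id
      docs[id] = Object.assign({}, body, { _id: id, _rev: newRev });
      return { ok: true, id: id, rev: newRev };
    },
    get: function (id) { return docs[id]; }
  };
}

var store = createStore();
var first = store.put('john-smith', { name: 'John Smith' });           // new document
var second = store.put('john-smith', { name: 'J. Smith' }, first.rev); // latest revision supplied
var stale = store.put('john-smith', { name: 'Old copy' }, first.rev);  // outdated revision
console.log(first, second, stale);
```

The third call is rejected because it quotes a revision that has already been superseded, which is the situation the 'Document update conflict' error reports.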
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ (footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
      "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
      --data-binary @zm.png
Footnote 3: Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
      -X POST http://127.0.0.1:5984/_replicate \
      -d '{"source":"http://username:password@mick.couchone.com/addressbook",
           "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
      -X POST http://127.0.0.1:5984/_replicate ^
      -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
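Since the quoting differs between shells, it can help to generate the request body rather than type it by hand. The following is a minimal sketch; replicationBody is a hypothetical helper name, and the second step simply applies the backslash-escaping rule mentioned above for Windows.

```javascript
// Hypothetical helper: builds the JSON body for a POST to /_replicate.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

var body = replicationBody(
  'http://username:password@mick.couchone.com/addressbook',
  'http://127.0.0.1:5984/addressbook'
);
console.log(body);

// For a Windows command line: wrap in double quotes and
// backslash-escape every double quote inside the JSON.
var windowsArg = '"' + body.replace(/"/g, '\\"') + '"';
console.log(windowsArg);
```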
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a key and a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
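The two stages can be imitated in a few lines of plain JavaScript. This is a toy runner, not CouchDB's implementation: runView and the sample documents are invented for illustration, and emit is passed in as an argument here, whereas CouchDB makes it available to the map function globally.

```javascript
// Toy map-reduce runner (illustration only).
function runView(docs, mapFn, reduceFn) {
  var rows = [];
  function emit(key, value) { rows.push({ key: key, value: value }); }
  // Map stage: visit every document and collect the emitted rows.
  docs.forEach(function (doc) { mapFn(doc, emit); });
  if (!reduceFn) { return rows; }
  // Reduce stage: combine the intermediate values into one result.
  var keys = rows.map(function (r) { return r.key; });
  var values = rows.map(function (r) { return r.value; });
  return reduceFn(keys, values);
}

function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Invented sample documents.
var sampleDocs = [
  { _id: 'A', votes: 100 },
  { _id: 'B', votes: 250 },
  { _id: 'C', votes: 50 }
];

var mapped = runView(sampleDocs, function (doc, emit) { emit(doc._id, doc.votes); });
var total = runView(
  sampleDocs,
  function (doc, emit) { emit(null, doc.votes); },
  function (keys, values) { return sum(values); }
);
console.log(mapped, total);
```

With no reduce function the result is simply the list of emitted rows; with one, the rows collapse into a single value.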
In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb/) I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a SUM aggregate in SQL, with the emitted key playing the role of a GROUP BY column).
[http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
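The effect of a key range on a view can be imitated as follows. This is a toy emulation only: queryRange is a hypothetical name, the filter scans an array whereas CouchDB reads the range from its index, and all rows except Aberavon are invented. The upper bound is treated as inclusive, which is CouchDB's default behaviour.

```javascript
// Toy emulation of a key range over view rows (illustration only).
function queryRange(rows, startKey, endKey) {
  return rows
    .slice()
    .sort(function (a, b) { return a.key - b.key; }) // views are ordered by key
    .filter(function (r) { return r.key >= startKey && r.key <= endKey; });
}

var viewRows = [
  { key: 3062, value: 'Aberavon' },   // from the document shown earlier
  { key: 1500, value: 'Example A' },  // invented
  { key: 900, value: 'Example B' },   // invented
  { key: 2000, value: 'Example C' }   // invented
];

var inRange = queryRange(viewRows, 1000, 2000);
console.log(inRange);
```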
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever documents are added to the collection, the index is updated incrementally (the update takes place when the view is next read) rather than being rebuilt.
As a result the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'. A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application, using C# code and SQL stored procedures, to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script, as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT $host/$database
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e "s/$folder\///g")
      echo $filename
      # strip the extension to obtain the document name
      docname=$(echo $filename | sed -e "s/$fileextension//g")
      echo $docname
      url=$host/$database/$docname
      echo $url
      # put the document into CouchDB
      echo curl -X PUT $url -d @$filepath
      curl -X PUT $url -d @$filepath
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
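The step in which the script derives a document URL from each file path can be restated in JavaScript. This is a sketch for illustration; docUrlForFile is a hypothetical function name, and it assumes (as the file names in this project suggest) that the file name is already URL-encoded.

```javascript
// Hypothetical restatement of the script's path handling:
// universities/<name>.json  ->  <host>/<database>/<name>
function docUrlForFile(host, database, filepath, extension) {
  // Drop the folder part of the path.
  var filename = filepath.slice(filepath.lastIndexOf('/') + 1);
  // Drop the extension if it is present at the end of the name.
  var hasExtension = filename.slice(-extension.length) === extension;
  var docname = hasExtension ? filename.slice(0, -extension.length) : filename;
  return host + '/' + database + '/' + docname;
}

var docUrl = docUrlForFile(
  'http://127.0.0.1:5984',
  'universities',
  'universities/Aberystwyth%20University.json',
  '.json'
);
console.log(docUrl);
```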
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
      -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output. (The function is wrapped over several lines here for readability; in the stored JSON it is a single string.)
    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) {
          return '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>';
        }"
      }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
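To make the roles of doc and req concrete, here is a variant of s1 written as an ordinary function so that it can be run outside CouchDB. It is a sketch under assumptions: the escapeHtml helper, the optional title query parameter and the placeholder coordinates are all invented for illustration and are not part of the project's design document.

```javascript
// Hypothetical helper: escapes characters that are significant in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Variant of s1: doc is the stored document, req.query holds
// the values passed in the URL's query string.
var showDocument = function (doc, req) {
  var heading = (req.query && req.query.title) || doc._id;
  return '<h1>' + escapeHtml(heading) + '</h1>' +
    '<p>latitude: ' + escapeHtml(doc.latitude) + '</p>' +
    '<p>longitude: ' + escapeHtml(doc.longitude) + '</p>';
};

// Placeholder document and requests.
var sampleDoc = { _id: 'Aston University', latitude: 10, longitude: 20 };
var plain = showDocument(sampleDoc, { query: {} });
var titled = showDocument(sampleDoc, { query: { title: 'A & B' } });
console.log(plain);
console.log(titled);
```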
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
      -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML pages.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
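One way to avoid escaping by hand is to write the function as ordinary JavaScript and let a JSON serialiser produce the escaped string. The sketch below illustrates the idea only; it is not the approach used in this section, and the class attribute is an invented example of markup containing double quotes.

```javascript
// The show function is written normally, double quotes included.
var s1Source = function (doc, req) {
  return '<h1 class="title">' + doc._id + '</h1>';
};

// The design document stores the function as a string;
// JSON.stringify adds the backslash escapes automatically.
var generatedDesignDoc = {
  _id: '_design/d1',
  shows: { s1: s1Source.toString() }
};

var designDocJson = JSON.stringify(generatedDesignDoc, null, 2);
console.log(designDocJson);
```

The resulting text could be saved to a file and uploaded in the same way as a hand-written design document.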
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
          var row;
          start({ \"headers\": { \"Content-Type\": \"text/html\" } });
          while (row = getRow()) {
            send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value
              + '</a></p>');
          }
        }"
      },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
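The interplay of start, getRow and send can be tried out away from the server with simple stand-ins. This is an illustrative sketch: renderList is a hypothetical harness, and the three functions are passed in as arguments here, whereas CouchDB supplies them to the list function itself and calls it with (head, req).

```javascript
// Hypothetical harness providing stand-ins for start, getRow and send.
function renderList(listFn, rows) {
  var output = [];
  var headers = null;
  var index = 0;
  var start = function (response) { headers = response.headers; };
  var getRow = function () { return index < rows.length ? rows[index++] : null; };
  var send = function (chunk) { output.push(chunk); };
  listFn(start, getRow, send);
  return { headers: headers, body: output.join('') };
}

// Simplified counterpart of l1: one paragraph per row.
var simpleList = function (start, getRow, send) {
  var row;
  start({ headers: { 'Content-Type': 'text/html' } });
  while ((row = getRow())) {
    send('<p>' + row.value + '</p>');
  }
};

var page = renderList(simpleList, [
  { key: 'Aston University', value: 'Aston University' },
  { key: 'University of Aberdeen', value: 'University of Aberdeen' }
]);
console.log(page.headers, page.body);
```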
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
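The correspondence between the folder structure and the design document can be sketched as a small function that folds file paths into a nested object. This is an illustration of the idea only: buildDesignDoc is a hypothetical name, the file contents are abbreviated, and the real CouchApp tool does considerably more than this.

```javascript
// Hypothetical sketch: each file path becomes a path of keys
// in the design document, and the file content becomes the value.
function buildDesignDoc(name, files) {
  var doc = { _id: '_design/' + name };
  Object.keys(files).forEach(function (path) {
    var parts = path.replace(/\.js$/, '').split('/');
    var node = doc;
    // Create (or reuse) one nested object per folder.
    parts.slice(0, -1).forEach(function (part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[parts[parts.length - 1]] = files[path];
  });
  return doc;
}

var bcDesignDoc = buildDesignDoc('bc', {
  'views/v1/map.js': 'function(doc) { emit(doc._id, doc._id); }',
  'shows/s1.js': 'function(doc, req) { return doc._id; }',
  'lists/l1.js': 'function(head, req) { /* ... */ }'
});
console.log(JSON.stringify(bcDesignDoc, null, 2));
```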
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
With this file in place, running couchapp push from the bc folder uploads the design document.
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
          {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
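The kind of data structure such a template consumes can be sketched as follows. This is a guess at the shape for illustration, not the code from Appendix C: toTemplateData is a hypothetical function, and the assumed document layout (a programmes array whose items have a name and a list of country codes) and the sample values are invented.

```javascript
// Hypothetical helper: shapes documents into nested sections,
// with has_* flags telling the template whether a list is non-empty.
function toTemplateData(docs) {
  return {
    institutions: docs.map(function (doc) {
      var programmes = (doc.programmes || []).map(function (p) {
        var countries = (p.countries || []).map(function (code) {
          return { country: code };
        });
        return {
          name: p.name,
          has_countries: countries.length > 0,
          countries: countries
        };
      });
      return {
        id: doc._id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        has_programmes: programmes.length > 0,
        programmes: programmes
      };
    })
  };
}

// Invented sample documents.
var templateData = toTemplateData([
  {
    _id: 'Example University',
    latitude: 10,
    longitude: 20,
    programmes: [{ name: 'Erasmus', countries: ['zm'] }, { name: 'Other' }]
  },
  { _id: 'Another University', latitude: 30, longitude: 40 }
]);
console.log(JSON.stringify(templateData, null, 2));
```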
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at a more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>' }"
  }
}
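The escaping burden described in this appendix can also be delegated to a script. The following JavaScript sketch is not part of the original project: it assumes that the show functions are first written as ordinary functions, and uses a hypothetical helper, buildDesignDoc, to turn them into the string form a design document expects. JSON.stringify then adds the backslash escapes automatically.

```javascript
// Hypothetical helper (not part of the original project): serialise show
// functions into the string form that a design document expects.
function buildDesignDoc(id, shows) {
  const doc = { _id: "_design/" + id, shows: {} };
  for (const name of Object.keys(shows)) {
    // Function.prototype.toString yields the source text of the function.
    doc.shows[name] = shows[name].toString();
  }
  return doc;
}

// An invented show function, written as normal JavaScript.
const designDoc = buildDesignDoc("example", {
  hello: function (doc, req) {
    return "<p class=\"greeting\">Hello " + doc._id + "</p>";
  },
});

// JSON.stringify inserts the backslash escapes for embedded double-quotes.
const json = JSON.stringify(designDoc);
console.log(json);
```

This is broadly the kind of packaging step that CouchApp performs on the developer's behalf, although CouchApp's own implementation is not shown here.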
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
This is further down in templates/edit.html:
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ? institution.programmes.map(
            function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ? programme.countries.map(function(country) {
                  return {
                    country : country
                  };
                }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": "15161",
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under /countries/ie/flag
  # curl -H "Content-Type:image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E F Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C J Date. An Introduction to Database Systems. Pearson Education Inc, 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A Fox and E A Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M Stonebraker and U Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker M, Wong E, Kreps P, and Held G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
SECTION 2 BACKGROUND 9
data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.
• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.
2.7 ‘NoSql’ databases at large scale
Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].
The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.
To give an example: if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.
In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
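The idea of replicas that diverge briefly and then converge can be illustrated with a toy model. The JavaScript sketch below is only an illustration under simplifying assumptions, not a description of how Facebook, Yahoo or CouchDB actually replicate data: the names makeReplica, write and sync are hypothetical, and differences are settled by a simple ‘highest version wins’ rule.

```javascript
// Toy model of two replicas that converge after synchronisation.
// All names here are hypothetical; real systems use more elaborate rules.
function makeReplica(name) {
  return { name: name, status: null, version: 0 };
}

// A write is applied to one replica only, bumping its version number.
function write(replica, status) {
  replica.status = status;
  replica.version += 1;
}

// Synchronise two replicas: the one with the lower version copies the other.
function sync(a, b) {
  const newer = a.version >= b.version ? a : b;
  const older = newer === a ? b : a;
  older.status = newer.status;
  older.version = newer.version;
}

const london = makeReplica("london");
const singapore = makeReplica("singapore");

write(london, "Alice is at the airport");
// Before synchronisation, a reader in Singapore sees a stale (empty) status.
const staleRead = singapore.status;

sync(london, singapore);
// After synchronisation, both replicas return the same status.
const freshRead = singapore.status;
console.log(staleRead, "->", freshRead);
```

Both replicas stay available for reads throughout; the price is the short window in which the Singapore replica returns out-of-date data.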
Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
“Occasionally, an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”
2.8 Brewer’s ‘CAP’ Theorem
The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide the following three guarantees simultaneously:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools has emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.
A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
2.10 Benefits of ‘NoSql’
As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law - disk space has become so economical as to be almost free; memory has grown so large it is now measured in Gigabytes, so that many databases can fit entirely in memory.
• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale: managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost commodity-grade hardware.
• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer in effect uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
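To make the idea of a pre-computed query more concrete, the following JavaScript sketch imitates how a map function builds a view index. It is only an illustration under stated assumptions: in CouchDB the emit function is supplied by the server, whereas here a stand-in emit and a buildView helper (both hypothetical) are defined so that the example runs on its own, and the sample documents are invented.

```javascript
// Stand-in for the view index that CouchDB would maintain on disk.
const rows = [];

// Stand-in for CouchDB's emit(key, value), defined here for illustration.
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// A map function in the CouchDB style: called once per document, it emits
// zero or more key-value rows. This one indexes institutions by region.
function mapByRegion(doc) {
  if (doc.region) {
    emit(doc.region, doc._id);
  }
}

// Hypothetical helper: run the map function over every document, then sort
// the rows by key, since a view index is ordered by key.
function buildView(docs, mapFn) {
  rows.length = 0;
  docs.forEach((doc) => mapFn(doc));
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows.slice();
}

// Invented sample documents.
const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "Birkbeck", region: "London" },
  { _id: "No region recorded" },
];

const view = buildView(docs, mapByRegion);
// Only questions anticipated by the map function can be answered from the
// index, here: "which institutions are in a given region?"
const inScotland = view.filter((r) => r.key === "Scotland").map((r) => r.value);
console.log(inScotland);
```

A question that the map function did not anticipate, such as a lookup by postcode, would need a further map function and a further index, which is the limitation described above.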
Very often, a ‘NoSql’ solution is used for standard reporting purposes where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated html and xml files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
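As a rough illustration of what ‘speaking HTTP’ to CouchDB involves, the JavaScript sketch below assembles the request needed to store a document. CouchDB stores a document when it receives an HTTP PUT of a JSON body at /database/document-id; the server address, the database name and the helper buildPutRequest are assumptions made for this example, and the request is only constructed, not sent.

```javascript
// Hypothetical helper: describe the HTTP request that stores `doc` under
// `id` in database `db`. Nothing is sent over the network here.
function buildPutRequest(baseUrl, db, id, doc) {
  return {
    method: "PUT",
    // Document ids may contain spaces, so they are percent-encoded.
    url: baseUrl + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

const request = buildPutRequest(
  "http://localhost:5984", // assumed address of a local CouchDB instance
  "universities",
  "University of Aberdeen",
  { region: "Scotland" }
);
console.log(request.method, request.url);
```

The resulting description could be handed to any HTTP client, which is the point of the argument above: no CouchDB-specific driver is needed to issue it.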
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
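Because fields may simply be absent, application code that reads such documents has to tolerate missing properties. The JavaScript sketch below, using the two documents above in abbreviated form, shows one way this might look; formatAddress is a hypothetical helper written for this illustration, not part of CouchDB.

```javascript
// Two documents of different shape that could share one database.
const person = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" },
};
const office = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" },
};

// Hypothetical helper: build printable address lines, skipping absent fields.
function formatAddress(doc) {
  const a = doc.address || {};
  const fields = [a.streetAddress, a.city, a.state, a.postalCode, a.country];
  return fields.filter((f) => f !== undefined).join("\n");
}

// A missing postal code is detected by the property's absence, not by NULL.
console.log("postalCode" in office.address); // false
console.log(formatAddress(office));
```

The responsibility for coping with both shapes sits in the application, which matches the earlier observation that data integrity is the application's concern rather than the database's.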
In Section 4 I will demonstrate how these documents are uploaded toand retrieved from a CouchDB server
34 Erlang
Damien Katz chose to develop CouchDB using Erlang The Erlang pro-gramming language was developed by Ericsson to support distributed fault-tolerant non-stop telecoms applications A key element in its reliability isthat it is a functional language which avoids mutable data There is noshared memory state between processes so that if a process in Erlang en-counters a problem that process can be shut down without affecting therest of the system Compiled Erlang code has been known to run for yearsin telecoms switches with zero downtime [14] [24]
Reliability is one of the main objectives of the CouchDB project One ofthe core developers on the project J Chris Anderson describes how usingErlang together with write-only B-Tree structures for data promotes this
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative aspect of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, '?').
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json. The rev value quoted is the _rev of the John Smith document shown in Section 4.4:

curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
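The revision check can be illustrated with a small in-memory simulation. This is a sketch of the idea only, not CouchDB's implementation: the createStore function and the '-sketch' revision suffix are hypothetical (real revisions carry a hash after the generation number), although the error and reason strings follow the conflict response that CouchDB returns.

```javascript
// In-memory stand-in for a database, used only to illustrate the revision check.
function createStore() {
  const docs = new Map();
  return {
    // A new document needs no rev; an update must quote the current rev.
    put(id, body, rev) {
      const current = docs.get(id);
      if (current && current.rev !== rev) {
        return { error: "conflict", reason: "Document update conflict." };
      }
      // The generation number before the hyphen increases with each update.
      const generation = current ? parseInt(current.rev, 10) + 1 : 1;
      const newRev = generation + "-sketch"; // placeholder, not a real hash
      docs.set(id, { rev: newRev, body: body });
      return { ok: true, id: id, rev: newRev };
    }
  };
}

const store = createStore();
const first = store.put("john-smith", { age: 25 });             // creates revision 1
const second = store.put("john-smith", { age: 26 }, first.rev); // accepted
const stale = store.put("john-smith", { age: 27 }, first.rev);  // rejected: rev is out of date
console.log(first.rev, second.rev, stale.error);
```

The third call fails because it quotes the revision that the second call has already superseded, which is the situation the 'Document update conflict' error guards against.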
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ ³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows we need to use double quotes exclusively, backslash-escaping where necessary, and keep the quoted JSON on a single line:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
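Hand-escaping the JSON body is error-prone, so one option is to generate it. The following is a minimal sketch, assuming only the source and target URLs already used in this section; the variable names are illustrative.

```javascript
// Build the replication request body instead of typing the JSON by hand.
const body = JSON.stringify({
  source: "http://username:password@mick.couchone.com/addressbook",
  target: "http://127.0.0.1:5984/addressbook"
});

// For the Windows command line: wrap in double quotes and backslash-escape
// every double quote inside the JSON.
const windowsArg = '"' + body.replace(/"/g, '\\"') + '"';

console.log(body);
console.log(windowsArg);
```

The first line printed is the body that the single-quoted form passes to -d; the second is the double-quoted, backslash-escaped form needed on Windows.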
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
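The two stages can be imitated outside CouchDB with a few lines of JavaScript. This is a simplified sketch rather than CouchDB's view engine: emit is passed in as an argument instead of being a global, sum is defined locally to stand in for CouchDB's built-in helper, and the second sample document ('Exampleton') is invented for illustration.

```javascript
// Sample documents: Aberavon's figure is from the text; "Exampleton" is made up.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Exampleton", con: 1500 }
];

// Stand-in for CouchDB's built-in sum() helper.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// 'Map' stage: call the map function once per document and collect emitted rows.
function runMap(mapFn, documents) {
  const rows = [];
  const emit = function (key, value) { rows.push({ key: key, value: value }); };
  documents.forEach(function (doc) { mapFn(doc, emit); });
  return rows;
}

// The 'Conservative Votes Total' view, adapted to take emit as a parameter.
const mapTotal = function (doc, emit) { emit(null, doc.con); };
const reduceTotal = function (keys, values) { return sum(values); };

// 'Reduce' stage: combine the intermediate values into a single result.
const rows = runMap(mapTotal, docs);
const total = reduceTotal(rows.map(function (r) { return r.key; }),
                          rows.map(function (r) { return r.value; }));

console.log(JSON.stringify({ rows: [{ key: null, value: total }] }));
// {"rows":[{"key":null,"value":4562}]}
```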
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000.
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
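The effect of a key range on the 'Conservative Votes' view can be sketched as follows. The sketch assumes an inclusive range and uses a sort followed by a filter for clarity; CouchDB itself reads the range from its B-Tree index rather than filtering every row. Apart from Aberavon, the sample rows are invented.

```javascript
// Rows as the 'Conservative Votes' map function would emit them: key = votes, value = _id.
const viewRows = [
  { key: 3062, value: "Aberavon" },   // from the text
  { key: 1500, value: "Exampleton" }, // hypothetical
  { key: 950, value: "Sampleford" }   // hypothetical
];

// Return the rows whose key lies between start and end (inclusive), in key order.
function queryRange(allRows, start, end) {
  return allRows
    .slice()
    .sort(function (a, b) { return a.key - b.key; })
    .filter(function (row) { return row.key >= start && row.key <= end; });
}

const matches = queryRange(viewRows, 1000, 2000);
console.log(JSON.stringify(matches));
// [{"key":1500,"value":"Exampleton"}]
```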
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host/$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s|$folder/||g")
  echo "$filename"
  # strip the file extension to get the document name
  docname=$(echo "$filename" | sed -e "s|$fileextension||g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
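The step that turns a file path into a document URL can be sketched separately. This is an illustrative JavaScript equivalent of the script's two sed substitutions, not part of the upload script itself; the function name buildDocumentUrl is hypothetical.

```javascript
// Derive the document URL from a file path: drop the folder, then the extension.
function buildDocumentUrl(host, database, filepath, extension) {
  const filename = filepath.split("/").pop();
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)
    : filename;
  return host + "/" + database + "/" + docname;
}

const url = buildDocumentUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);
console.log(url);
// http://127.0.0.1:5984/universities/Aberystwyth%20University
```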
To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude ' + doc.latitude + '</p>' +
        '<p>longitude ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
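Because a 'show' function is ordinary JavaScript, it can be tried out before it is placed in a design document. The following sketch calls s1 directly; the sample coordinates and the minimal req object are assumptions made for illustration, not data from the universities database.

```javascript
// The s1 'show' function as a plain JavaScript function.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude ' + doc.latitude + '</p>' +
    '<p>longitude ' + doc.longitude + '</p>';
};

// Hypothetical document and request object, for illustration only.
const sampleDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const sampleReq = { query: {} };

const html = s1(sampleDoc, sampleReq);
console.log(html);
// <h1>Aston University</h1><p>latitude 52.48</p><p>longitude -1.89</p>
```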
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
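The same breakdown can be expressed as a small helper that assembles the URL from its parts. The function name showUrl is hypothetical; encodeURIComponent is the standard JavaScript function that turns the space in the document id into %20.

```javascript
// Assemble a 'show' URL from the components listed above.
function showUrl(host, database, designDoc, showName, docId) {
  return [
    host,
    database,
    "_design", designDoc,
    "_show", showName,
    encodeURIComponent(docId) // "Aston University" -> "Aston%20University"
  ].join("/");
}

const astonUrl = showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University");
console.log(astonUrl);
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```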
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
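The escaping burden can be seen by letting JavaScript produce the JSON. In this sketch, a function containing double quotes is converted to a string and embedded in a design document object; the design document name _design/example and the function link are hypothetical. JSON.stringify adds the backslashes that would otherwise have to be typed by hand.

```javascript
// A 'show'-style function whose output contains double quotes.
function link(doc, req) {
  return '<a href="' + doc.url + '">' + doc._id + '</a>';
}

// Embed the function source as a string, as a design document requires.
const designDoc = {
  _id: "_design/example",
  shows: { link: link.toString() }
};

// Every double quote in the function source is escaped as \" in the JSON text.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

Generating the JSON in this way is one possible workaround when authoring by hand; it does not remove the underlying awkwardness of writing HTML inside a JavaScript string.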
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
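The interplay of start, getRow and send can be simulated locally. In the sketch below the three functions are simple stand-ins written for this example (in CouchDB they are provided by the server), runList is a hypothetical driver, and the 'list' function is a shortened version of l1 that omits the hyperlink.

```javascript
// Local stand-ins for the functions CouchDB provides to a 'list' function.
let queue = [];
let response = { headers: null, body: "" };
function start(options) { response.headers = options.headers; }
function send(chunk) { response.body += chunk; }
function getRow() { return queue.length > 0 ? queue.shift() : null; }

// A shortened l1: one paragraph per row returned by the view.
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } }); // called once
  while ((row = getRow())) {
    send("<p>" + row.value + "</p>");                  // called once per row
  }
};

// Hypothetical driver: feed view rows to a list function and collect the output.
function runList(listFn, rows) {
  queue = rows.slice();
  response = { headers: null, body: "" };
  listFn({ total_rows: rows.length }, {});
  return response;
}

const result = runList(l1, [
  { key: "Aberystwyth University", value: "Aberystwyth University" },
  { key: "Aston University", value: "Aston University" }
]);
console.log(result.headers["Content-Type"]); // text/html
console.log(result.body);
// <p>Aberystwyth University</p><p>Aston University</p>
```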
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
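The merging step can be pictured as folding a set of file paths into one nested object. The sketch below is an illustration of that idea only and is not CouchApp's actual code: the file contents are abbreviated, an in-memory object stands in for the folder on disk, and buildDesignDoc is a hypothetical name.

```javascript
// Stand-in for the folder on disk: path -> file contents.
const files = {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }"
};

// Fold each path into a nested object: folders become keys, files become strings.
function buildDesignDoc(name, fileMap) {
  const doc = { _id: "_design/" + name };
  Object.keys(fileMap).forEach(function (path) {
    const parts = path.replace(/\.js$/, "").split("/");
    let node = doc;
    parts.slice(0, -1).forEach(function (part) {
      if (node[part] === undefined) { node[part] = {}; }
      node = node[part];
    });
    node[parts[parts.length - 1]] = fileMap[path];
  });
  return doc;
}

const designDoc = buildDesignDoc("bc", files);
console.log(JSON.stringify(designDoc, null, 2));
```

The resulting object has the same shape as the hand-written design documents in Section 5, with views, shows and lists sections, which is why the folder layout mirrors the design document structure.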
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values, and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:

couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
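The nested iteration can be sketched in plain JavaScript, independently of Mustache. The renderContent function below is a hypothetical illustration, not code from the project: it walks the programmes of one institution document and, for each programme that lists countries, emits one escaped img element per country, mirroring the template's sections. The flag URL pattern follows the one used in the template; the second programme in the sample data is invented to show the case with no countries.

```javascript
// Hypothetical helper: mimics what the template's nested sections produce
// for a single institution document. Not part of Mustache or CouchApp.
function renderContent(institution) {
  var programmes = institution.programmes || []; // no programmes: render nothing
  return programmes.map(function (programme) {
    var countries = programme.countries || []; // no countries: no flags
    var flags = countries.map(function (country) {
      return '&lt;img src="http://mick.couchone.com/countries/' + country +
        '/flag" alt="' + country + '" title="' + country + '"/&gt;';
    }).join("");
    return "&lt;p&gt;" + programme.name + flags + "&lt;/p&gt;";
  }).join("\n");
}

// Sample institution (illustrative; the second programme is invented).
var exampleInstitution = {
  _id: "University of St Andrews",
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Programme without countries" }
  ]
};

var renderedContent = renderContent(exampleInstitution);
console.log(renderedContent);
```

A templating system expresses the same loops declaratively, which is why the template file stays readable where hand-built string concatenation does not.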
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
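As a rough illustration of this point, the sketch below describes three basic document operations as plain HTTP requests. The describeRequest helper is hypothetical and sends nothing over the network; the server address, the document body and the revision value are assumptions made for the example. The URL shapes (PUT and GET on /database/document, DELETE with a rev query parameter) follow CouchDB's documented HTTP API.

```javascript
// Assumed address of a local CouchDB server (illustrative only).
var BASE_URL = "http://127.0.0.1:5984";

// Hypothetical helper: builds a description of an HTTP request to CouchDB
// without sending it, so the verb, URL and JSON body can be inspected.
function describeRequest(method, db, docId, options) {
  options = options || {};
  var url = BASE_URL + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId);
  if (options.rev) {
    // Deleting (or updating) a document requires its current revision.
    url += "?rev=" + encodeURIComponent(options.rev);
  }
  var request = { method: method, url: url };
  if (options.body !== undefined) {
    request.headers = { "Content-Type": "application/json" };
    request.body = JSON.stringify(options.body);
  }
  return request;
}

// Create, read and delete the same document (revision value is made up).
var createDoc = describeRequest("PUT", "universities", "University of Aberdeen",
  { body: { region: "Scotland" } });
var readDoc = describeRequest("GET", "universities", "University of Aberdeen");
var deleteDoc = describeRequest("DELETE", "universities", "University of Aberdeen",
  { rev: "1-abc" });

console.log(createDoc, readDoc, deleteDoc);
```

Because each operation is just a verb, a URL and an optional JSON body, the same requests could equally be issued from cURL or from any language with an HTTP library.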
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
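To illustrate the difference in thinking, consider counting institutions per region. In SQL this would be an ad-hoc statement along the lines of SELECT region, COUNT(*) FROM institutions GROUP BY region. The sketch below expresses the same need as a map function and a reduce function, and runs them with a small hypothetical in-memory harness (runGroupedView) which only imitates a grouped view query. The harness, the sample documents and the explicit emit parameter are assumptions for illustration; in CouchDB itself the functions live in a design document and emit is provided by the server.

```javascript
// Map: emit one row per document, keyed by region.
function mapByRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce: sum the emitted values. The (keys, values, rereduce) signature
// follows CouchDB's reduce functions; summing is valid for both passes.
function sumReduce(keys, values, rereduce) {
  return values.reduce(function (total, n) { return total + n; }, 0);
}

// Hypothetical harness imitating a grouped view query: rows are grouped
// by key and each group is reduced to a single value.
function runGroupedView(docs, mapFn, reduceFn) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      (groups[key] = groups[key] || []).push({ id: doc._id, value: value });
    });
  });
  return Object.keys(groups).sort().map(function (key) {
    var rows = groups[key];
    var keys = rows.map(function (r) { return [key, r.id]; });
    var values = rows.map(function (r) { return r.value; });
    return { key: key, value: reduceFn(keys, values, false) };
  });
}

// Sample documents (illustrative values only).
var regionDocs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Birkbeck, University of London", region: "London" }
];

var regionCounts = runGroupedView(regionDocs, mapByRegion, sumReduce);
console.log(regionCounts);
```

The key difference from SQL is that the grouping key must be decided when the view is written; a count by constituency, say, would need a second map function rather than a different WHERE or GROUP BY clause.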
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
          '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html>
          <html><head><meta name=\"viewport\"
          content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\"> html { height: 100% }
          body { height: 100%; margin: 0px; padding: 0px }
          #map_canvas { height: 100% } </style>
          <script type=\"text/javascript\"
          src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\"> function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
          var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
          google.maps.event.addListener(marker, \"click\", function() {
          infowindow.open(map, marker); });
          marker.setMap(map); } </script></head>
          <body onload=\"initialize()\"> <div id=\"map_canvas\"
          style=\"width: 100%; height: 100%\"> </div>
          </body></html>'; }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
        // close the loop after all rows are rendered
        return "</feed>";
      });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
    // this is further down in templates/edit.html
    // apply docForm at login.
    $("#account").evently({
      loggedIn : function(e,r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : docid,
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
          },
Appendix C
georssjs
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return {
                          country : country
                        };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    };
Appendix D
georsshtml
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
    <title>British Council Activity Mapping</title>
    <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
    <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
    <generator>CouchApp on CouchDB</generator>
    <updated>2010-09-12</updated>
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      # strip the .png extension to get the document name
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under /countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."
2.8 Brewer's 'CAP' Theorem
The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:
• Consistency
• Availability
• Partition tolerance
Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.
This realisation - that 'eventual' consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.
In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.
2.9 Reducing the Impedance Mismatch
The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools has emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.
However, ORM tools are complex, and thus difficult to maintain.²
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application's object model.
A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely, and radically simplifies application development [36].
2.10 Benefits of 'NoSql'
As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law - disk space has become so economical as to be almost free; memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.
• 'NoSql' databases provide benefits to consumers, in the form of cached, replicated, always-available data.
• They provide benefits to enterprises, because for reporting data it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, 'NoSql' databases provide benefits to developers, through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.
² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated .html and .xml files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable, schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
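The point about absent fields can be illustrated with a short JavaScript sketch. The helper name formatAddress is hypothetical (it is not part of CouchDB); it simply shows how client code might cope with two documents of different shape that live in the same database.

```javascript
// Two documents of different shape, as they might sit side by side
// in the same 'addressbook' database (abbreviated from the examples above).
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  address: {
    streetAddress: "21 2nd Street",
    city: "New York",
    state: "NY",
    postalCode: "10021"
  }
};
const britishCouncilZambia = {
  company: "British Council Zambia",
  address: {
    streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
    city: "Lusaka",
    country: "Zambia"
  }
};

// Hypothetical helper: build a printable address from whichever fields exist.
// Missing fields are skipped rather than represented by NULL placeholders.
function formatAddress(doc) {
  const a = doc.address || {};
  const name = doc.company ||
    [doc.firstName, doc.lastName].filter(Boolean).join(" ");
  return [name, a.streetAddress, a.city, a.state, a.postalCode, a.country]
    .filter(function (part) { return part !== undefined && part !== ""; })
    .join("\n");
}

console.log(formatAddress(johnSmith));
console.log(formatAddress(britishCouncilZambia));
```

The Zambian document has no postalCode or state key at all, so nothing needs to be checked against NULL; the code only asks whether a field is present.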
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine, using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format together with a unique key and revision version number - so that in CouchDB the resulting document would have id and rev values as follows:
    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark ‘?’).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
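The revision check that applies to both deletes and updates can be sketched with a toy in-memory model. This is an illustration only and not CouchDB's implementation: ToyStore, its methods and the "-toy" revision suffix are all hypothetical names, and real revision values contain a hash rather than a fixed suffix.

```javascript
// Toy in-memory model of a revision check; this is NOT CouchDB's real code.
function ToyStore() {
  this.docs = {};
}

// Create or update a document. An update must quote the current revision.
ToyStore.prototype.put = function (id, body, rev) {
  const current = this.docs[id];
  if (current && current._rev !== rev) {
    throw new Error("conflict: Document update conflict.");
  }
  // The number before the hyphen counts how many times the document changed.
  const generation = current ? parseInt(current._rev, 10) + 1 : 1;
  this.docs[id] = Object.assign({}, body, { _id: id, _rev: generation + "-toy" });
  return this.docs[id]._rev;
};

// Delete a document. As with updates, the current revision is required.
ToyStore.prototype.remove = function (id, rev) {
  const current = this.docs[id];
  if (!current) {
    throw new Error("not_found");
  }
  if (current._rev !== rev) {
    throw new Error("conflict: Document update conflict.");
  }
  delete this.docs[id];
};

const store = new ToyStore();
const rev1 = store.put("john-smith", { age: 25 });        // first revision
const rev2 = store.put("john-smith", { age: 26 }, rev1);  // update quoting rev1

let staleUpdateRejected = false;
try {
  store.put("john-smith", { age: 27 }, rev1);             // rev1 is now out of date
} catch (e) {
  staleUpdateRejected = true;
}
console.log(rev1, rev2, staleUpdateRejected);
```

The second writer, still holding rev1, is refused; it would have to fetch the document again and retry with the newer revision.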
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags/
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
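Because the replication request is just a JSON body, the quoting differences between shells can be avoided by generating the body programmatically. The following is a minimal sketch under that assumption; the variable names are illustrative, and the two "Arg" strings only show how the same body would have to be quoted for each shell.

```javascript
// The replication request described above, as an ordinary object.
const replicationRequest = {
  source: "http://username:password@mick.couchone.com/addressbook",
  target: "http://127.0.0.1:5984/addressbook"
};

// JSON.stringify produces the exact body to POST to /_replicate.
const body = JSON.stringify(replicationRequest);

// How the same body would be quoted on a command line:
// POSIX shells can wrap it in single quotes (valid here because the body
// contains no single quotes of its own) ...
const posixArg = "'" + body + "'";
// ... whereas the Windows command prompt needs double quotes, with every
// inner double quote backslash-escaped.
const windowsArg = '"' + body.replace(/"/g, '\\"') + '"';

console.log(posixArg);
console.log(windowsArg);
```

Any HTTP library could then send body with a Content-Type of application/json, so no shell quoting is involved at all.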
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation of the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }
The sum of Conservative votes (8,782,198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
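To make the two stages concrete, the ‘Conservative Votes Total’ view can be imitated in plain JavaScript. This is a simplified sketch and not how CouchDB executes views internally: the emit and sum functions below are stand-ins for the helpers CouchDB supplies, the real reduce function may also be called again on partial results, and only the Aberavon figure comes from the data above (the other two documents are invented).

```javascript
// Sample documents. Only Aberavon's figure comes from the text above;
// the other two constituencies are invented for illustration.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example North", con: 1500 },
  { _id: "Example South", con: 12000 }
];

// Simplified stand-ins for helpers that CouchDB supplies to view functions.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The two functions of the 'Conservative Votes Total' view.
function map(doc) {
  emit(null, doc.con);
}
function reduce(keys, values) {
  return sum(values);
}

// Map stage: visit every document once, collecting the emitted rows.
docs.forEach(function (doc) { map(doc); });

// Reduce stage: combine the emitted values into a single result.
const total = reduce(
  rows.map(function (r) { return r.key; }),
  rows.map(function (r) { return r.value; })
);
const response = { rows: [{ key: null, value: total }] };
console.log(JSON.stringify(response));
```

With these three sample documents the total is 16562, returned in the same single-row shape as the real response.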
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]
Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
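The idea of answering a range query from a sorted index, rather than by scanning every document, can be sketched as follows. A sorted array with a binary search stands in for the B-Tree here, which is a deliberate simplification; the function names lowerBound and range are hypothetical, and all rows except Aberavon are invented.

```javascript
// Rows as the 'Conservative Votes' map function would emit them:
// key = Conservative votes, value = constituency id.
// Only the Aberavon row comes from the text; the rest are invented.
const emitted = [
  { key: 12000, value: "Example South" },
  { key: 1500, value: "Example North" },
  { key: 3062, value: "Aberavon" },
  { key: 1000, value: "Example East" }
];

// The index keeps rows sorted by key (a sorted array stands in for the B-Tree).
const index = emitted.slice().sort(function (a, b) { return a.key - b.key; });

// Binary search for the first row whose key is >= the requested start key.
function lowerBound(sortedRows, key) {
  let lo = 0;
  let hi = sortedRows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedRows[mid].key < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Read rows from the start key up to and including the end key.
function range(sortedRows, startKey, endKey) {
  const result = [];
  for (let i = lowerBound(sortedRows, startKey);
       i < sortedRows.length && sortedRows[i].key <= endKey; i++) {
    result.push(sortedRows[i]);
  }
  return result;
}

const between1000And2000 = range(index, 1000, 2000);
console.log(between1000And2000.map(function (r) { return r.value; }));
```

Only the rows inside the requested range are touched once the starting position has been found, which is the property the text describes as every read being an indexed lookup.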
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    #host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s|$folder/||g")
        echo "$filename"
        # strip the file extension to get the document name
        docname=$(echo "$filename" | sed -e "s|$fileextension||g")
        echo "$docname"
        url=$host/$database/$docname
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d "@$filepath"
        curl -X PUT "$url" -d "@$filepath"
    done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
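The way each file path is turned into a document URL (strip the folder, strip the extension, append to host and database) can also be expressed as a small function. This is a sketch for illustration; targetUrl is a hypothetical name and the host is shown without credentials.

```javascript
// Hypothetical helper mirroring the path handling of the upload script:
// remove the folder prefix and the file extension, then build the URL.
function targetUrl(host, database, folder, extension, filepath) {
  let name = filepath;
  if (name.indexOf(folder + "/") === 0) {
    name = name.slice(folder.length + 1);        // drop "universities/"
  }
  if (name.slice(-extension.length) === extension) {
    name = name.slice(0, -extension.length);     // drop ".json"
  }
  return host + "/" + database + "/" + name;
}

const aberystwythUrl = targetUrl(
  "http://mick.couchone.com",
  "universities",
  "universities",
  ".json",
  "universities/Aberystwyth%20University.json"
);
console.log(aberystwythUrl);
```

Because the file names already contain %20 in place of spaces, the document name can be appended to the URL without further encoding.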
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.
    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) {
                return '<h1>' + doc._id + '</h1>' +
                '<p>latitude: ' + doc.latitude + '</p>' +
                '<p>longitude: ' + doc.longitude + '</p>';
            }"
        }
    }
(The function is wrapped across several lines here for readability; strict JSON requires a string value to be written on a single line.)
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
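Since a ‘show’ function is ordinary JavaScript, it can be tried out locally before being pasted into a design document. The sketch below calls s1 directly; the sample document, its coordinates and the shape of the req object are assumptions made for illustration, not data from the universities database.

```javascript
// The 's1' show function, written as ordinary JavaScript rather than
// as a string inside a design document.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical inputs: the coordinates are illustrative only, and the
// request object is reduced to an empty query string.
const sampleDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const sampleReq = { query: {} };

const html = s1(sampleDoc, sampleReq);
console.log(html);
```

Testing the function this way avoids a round trip to the server for every small change to the markup.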
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
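The URL structure listed above can be assembled from its parts with a short helper. The function name showUrl is hypothetical; encodeURIComponent is the standard JavaScript function that turns the space in the document id into %20.

```javascript
// Hypothetical helper: assemble a 'show' URL from its five parts.
function showUrl(server, database, designDoc, showName, docId) {
  return [
    server,
    encodeURIComponent(database),
    "_design",
    encodeURIComponent(designDoc),
    "_show",
    encodeURIComponent(showName),
    encodeURIComponent(docId)
  ].join("/");
}

const astonUrl = showUrl(
  "http://mick.couchone.com", "universities", "d1", "s1", "Aston University"
);
console.log(astonUrl);
```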
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
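One possible way to reduce the manual escaping, offered here as a sketch rather than as the approach taken in this project, is to write the function as normal JavaScript and let JSON.stringify produce the escaped string. The body of map1 below is invented for illustration and is not the Appendix A listing.

```javascript
// Hypothetical 'show' function containing double quotes in its HTML output.
function map1(doc, req) {
  return '<div id="map" data-lat="' + doc.latitude +
    '" data-lng="' + doc.longitude + '"></div>';
}

// Build the design document as an ordinary object; the function is stored
// as its source text, which is what a design document expects.
const designDoc = {
  _id: "_design/d1",
  shows: { map1: map1.toString() }
};

// JSON.stringify escapes the embedded double quotes and line breaks.
const designJson = JSON.stringify(designDoc, null, 2);
console.log(designJson);
```

The resulting text could be saved to a file such as d1.json and uploaded with cURL in the usual way.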
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:
    {
        "_id": "_design/d1",
        "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
        "shows": { ... },
        "lists": { ... },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
    {
        "_id": "_design/d1",
        "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
        "shows": { ... },
        "lists": {
            "l1": "function(head, req) {
                var row;
                start({ \"headers\": { \"Content-Type\": \"text/html\" } });
                while (row = getRow()) {
                    send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
                }
            }"
        },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
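The interplay of start, getRow and send can be imitated outside CouchDB with simple stand-ins. In this sketch the three helper functions are simplified assumptions about the behaviour CouchDB provides, the view rows are hard-coded samples, and the link target (the s1 ‘show’ function, with a relative path) is a variation chosen for the example.

```javascript
// Sample rows, shaped like the output of the 'v1' view.
const viewRows = [
  { id: "Aberystwyth University", key: "Aberystwyth University", value: "Aberystwyth University" },
  { id: "Aston University", key: "Aston University", value: "Aston University" }
];

// Simplified stand-ins for the functions CouchDB supplies to a list function.
let cursor = 0;
let responseHeaders = null;
const chunks = [];
function start(response) { responseHeaders = response.headers; }
function getRow() { return cursor < viewRows.length ? viewRows[cursor++] : null; }
function send(chunk) { chunks.push(chunk); }

// A list function in the style of l1: one hyperlink per view row.
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
}

l1({ total_rows: viewRows.length }, {});
const page = chunks.join("");
console.log(page);
```

The headers are set once, and each row is then streamed out as a separate chunk of HTML.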
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer’s perspective is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function, s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
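As an illustration of what might go into such a stub, here is a sketch of a filled-in Show function modelled on the s1 function from Section 5. In the real shows/s1.js file the function is anonymous; it is assigned to a variable here only so that it can be run outside CouchDB. The sample document and its coordinates are made-up values.

```javascript
// A Show function: CouchDB calls it with the requested document (doc)
// and details of the HTTP request (req), and sends the returned string
// to the client.
var s1 = function(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Outside CouchDB we can exercise the function with a hand-made document.
var sampleDoc = { _id: "University of Aberdeen", latitude: 57.16, longitude: -2.1 };
console.log(s1(sampleDoc, {}));
```

Once the design document is deployed, CouchDB would run the same function whenever the corresponding _show URL is requested for a document.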
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
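The merging step can be pictured with a short sketch. This is not CouchApp's actual implementation (which is written in Python and reads real files); it is a simplified JavaScript illustration in which a hypothetical in-memory object stands in for the bc folder, and the function name buildDesignDoc is an assumption made for the example.

```javascript
// Hypothetical stand-in for the files in the bc folder: path -> contents.
var files = {
  "_id": "_design/bc",
  "views/all/map.js": "function(doc) { emit(doc._id, doc); }",
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "templates/georss.html": "<feed>...</feed>"
};

// Fold each path into a nested object: folders become keys, and the
// file extension is dropped from the final key.
function buildDesignDoc(fileMap) {
  var ddoc = {};
  Object.keys(fileMap).forEach(function(path) {
    var parts = path.split("/");
    var node = ddoc;
    for (var i = 0; i < parts.length - 1; i++) {
      node[parts[i]] = node[parts[i]] || {};
      node = node[parts[i]];
    }
    var leaf = parts[parts.length - 1].replace(/\.[^.]+$/, "");
    node[leaf] = fileMap[path];
  });
  return ddoc;
}

console.log(JSON.stringify(buildDesignDoc(files), null, 2));
```

The nested result mirrors the folder layout, which is why a template stored in the templates folder can later be reached from server-side code as a property of the design document.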
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
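To see what this map function produces, the sketch below runs it over two documents in memory. The emit and runView functions here are simplified stand-ins written for this example, not CouchDB's own code, and plain string comparison is used for ordering where CouchDB applies its own collation rules. The sample documents are made up.

```javascript
var rows = [];
var currentDoc = null;

// Stand-in for the emit() that CouchDB provides to map functions.
function emit(key, value) {
  rows.push({ id: currentDoc._id, key: key, value: value });
}

// The map function from views/all/map.js, named so that it can be called here.
function mapAll(doc) {
  emit(doc._id, doc);
}

// Apply a map function to every document and order the rows by key,
// roughly as a View index would.
function runView(mapFn, docs) {
  rows = [];
  docs.forEach(function(doc) {
    currentDoc = doc;
    mapFn(doc);
  });
  rows.sort(function(a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return { total_rows: rows.length, rows: rows };
}

var docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" }
];
var result = runView(mapAll, docs);
console.log(result.rows.map(function(row) { return row.key; }));
```

Each row carries the emitted key and value, so a List function reading this View receives the whole document as row.value.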
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
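The nested loop that the template sections express can be written out in plain JavaScript. The sketch below is only an illustration of the logic: renderContent is a hypothetical helper, the real work is done by mustache.js, and for readability the HTML is not entity-escaped as it is inside the feed's content element. It assumes the data shape produced by the List function, where each country is wrapped in an object with a country property.

```javascript
// Plain-JavaScript equivalent of the programmes/countries sections.
function renderContent(institution) {
  var out = "";
  // An institution with no programmes contributes no content.
  (institution.programmes || []).forEach(function(programme) {
    out += "<p>" + programme.name;
    // A programme with no countries contributes no flags.
    (programme.countries || []).forEach(function(item) {
      out += '<img src="http://mick.couchone.com/countries/' + item.country +
        '/flag" alt="' + item.country + '" title="' + item.country + '"/>';
    });
    out += "</p>";
  });
  return out;
}

// Sample data, abbreviated from the kind of document shown in Appendix D.
var sample = {
  programmes: [
    { name: "Chevening Programme", countries: [{ country: "id" }, { country: "mz" }] }
  ]
};
console.log(renderContent(sample));
```

Keeping this iteration in a template rather than in string-building code is what makes the Mustache approach easier to maintain.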
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
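The difference in mindset can be illustrated with a small example: counting institutions per region. In SQL this is a one-line GROUP BY; in CouchDB it becomes a map function plus a reduce function defined in advance. The sketch below simulates the idea in memory. It is not CouchDB code: emit is passed in explicitly, sum is defined locally (CouchDB provides its own), queryGrouped is a hypothetical stand-in for querying a View with grouping, and the sample data is made up.

```javascript
// SQL (ad hoc): SELECT region, COUNT(*) FROM institutions GROUP BY region;

// Map: emit one row per document, keyed by region.
function mapByRegion(doc, emit) {
  emit(doc.region, 1);
}

// Local helper standing in for the sum() available to CouchDB reduce functions.
function sum(values) {
  return values.reduce(function(a, b) { return a + b; }, 0);
}

// Reduce: add up the values emitted for each key.
function reduceCount(keys, values, rereduce) {
  return sum(values);
}

// Simplified stand-in for querying the View grouped by key.
function queryGrouped(docs) {
  var groups = {};
  docs.forEach(function(doc) {
    mapByRegion(doc, function(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  return Object.keys(groups).sort().map(function(key) {
    return { key: key, value: reduceCount(null, groups[key], false) };
  });
}

var institutions = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Birkbeck", region: "London" }
];
console.log(queryGrouped(institutions));
```

The information need is the same in both cases; what changes is that the CouchDB version must be written and stored in a design document before the question can be asked.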
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
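The escaping problem is easy to reproduce. In the sketch below, the HTML fragment and the variable names are illustrative assumptions; the point is only that once a function's source text is stored as a JSON string, every double-quote inside it must be preceded by a backslash. JSON.stringify does this mechanically, which is the work that otherwise has to be done by hand when editing the design document directly.

```javascript
// HTML that a Show function should return, written naturally.
var html = '<div id="map_canvas" style="width: 100%"></div>';

// Source text of the Show function, as it must be stored in the design document.
var showSource = "function(doc, req) { return '" + html + "'; }";

// Serialising the design document escapes every double-quote in the source.
var designDoc = { _id: "_design/d1", shows: { s1: showSource } };
var serialised = JSON.stringify(designDoc);
console.log(serialised);

// Parsing the JSON again recovers the original, unescaped source text.
console.log(JSON.parse(serialised).shows.s1 === showSource);
```

With only one attribute pair the escaping is manageable; with a full HTML page, as in map1, it quickly becomes error-prone.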
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"> </div></body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());

    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
This is further down in templates/edit.html:
// apply docForm at login.
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to ./countries/png folder
# (beneath the current folder)
FILES=./countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/.\/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url="http://username:password@mick.couchone.com/countries/$docname/flag"
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under /countries/ie/flag:
  # curl -H "Content-Type: image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 2 BACKGROUND 11
2.9 Reducing the Impedance Mismatch
The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.
However, ORM tools are complex and thus difficult to maintain.(2)
Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application's object model.
A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
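The fragmentation referred to here can be sketched as follows. A document-oriented store would persist the institution object as it stands; a relational design would first split it into rows across several tables. The table names, the key scheme and the function toRelationalRows are hypothetical choices made for this illustration, and the sample data is abbreviated.

```javascript
// One denormalised document, as a document store would hold it.
var institution = {
  _id: "University of St Andrews",
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] }
  ]
};

// Hypothetical relational layout: one table per level of nesting,
// linked by foreign keys.
function toRelationalRows(doc) {
  var tables = { institution: [], programme: [], programme_country: [] };
  tables.institution.push({ id: doc._id });
  doc.programmes.forEach(function(programme, index) {
    var programmeId = doc._id + "#" + index;
    tables.programme.push({
      id: programmeId,
      institution_id: doc._id,
      name: programme.name
    });
    programme.countries.forEach(function(code) {
      tables.programme_country.push({ programme_id: programmeId, country: code });
    });
  });
  return tables;
}

console.log(JSON.stringify(toRelationalRows(institution), null, 2));
```

Reassembling the original object from these rows requires joins across all three tables, which is the work that ORM tools exist to hide.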
2.10 Benefits of 'NoSql'
As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law - disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in Gigabytes) that many databases can fit entirely in memory.
• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.
• They provide benefits to enterprises because, for reporting data, it is cheaper to operate 'NoSql' databases at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scaleable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.
• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer in effect uses the same data model as the application layer.
(2) An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see, in the case of CouchDB queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way the ‘NoSql’ solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
    {
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
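The flexible schema and the treatment of missing fields can be sketched in a few lines of JavaScript. The two documents are abbreviated versions of the examples above; the helper names hasPostalCode and addressLines are hypothetical, introduced only for this illustration.

```javascript
// Two documents of different shape, as they might sit side by side in the
// same 'addressbook' database (values abbreviated from the examples above).
const docs = [
  {
    firstName: "John",
    lastName: "Smith",
    address: { city: "New York", state: "NY", postalCode: "10021" }
  },
  {
    company: "British Council Zambia",
    address: {
      streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
      city: "Lusaka",
      country: "Zambia"
    }
  }
];

// Hypothetical helper: a missing postal code is simply an absent key,
// so we test for presence rather than comparing against NULL.
function hasPostalCode(doc) {
  return Object.prototype.hasOwnProperty.call(doc.address, "postalCode");
}

// Hypothetical helper: the \n escapes become real line breaks once parsed.
function addressLines(doc) {
  return (doc.address.streetAddress || "").split("\n");
}

const flags = docs.map(hasPostalCode);      // one flag per document
const zambiaLines = addressLines(docs[1]);  // the three printed lines
```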
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free-of-charge.
Please note that, after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format together with a unique key and a revision (version) number, so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
      "_id": "760cd53c55a93497067f90d6242fc25e",
      "_rev": "1-91bce055fc8db86480400321079f0834",
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
      "_id": "british-council-zambia",
      "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
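The choice between POST and PUT described in this section can be summarised in a short JavaScript sketch. It only builds a description of the HTTP request and does not send anything. The helper name buildSaveRequest and the local base URL are assumptions made for illustration.

```javascript
// Hypothetical helper: describe the HTTP request needed to save a document.
// With no id we POST to the database and let CouchDB assign one; with an id
// we PUT to the URL at which the document should live.
function buildSaveRequest(baseUrl, database, doc, id) {
  const request = {
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
  if (id === undefined) {
    request.method = "POST";
    request.url = baseUrl + "/" + database;
  } else {
    request.method = "PUT";
    request.url = baseUrl + "/" + database + "/" + encodeURIComponent(id);
  }
  return request;
}

const base = "http://127.0.0.1:5984"; // assumed local instance
const anonymous = buildSaveRequest(base, "addressbook", { firstName: "John" });
const named = buildSaveRequest(base, "addressbook",
  { company: "British Council Zambia" }, "british-council-zambia");
```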
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json, quoting the revision value of the document as it was stored in Section 4.4:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
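The revision check that guards updates and deletes can be imitated with a small in-memory sketch in JavaScript. This is an illustration of the idea only, not CouchDB’s implementation: the store, the function names and the ‘-demo’ revision suffix are all assumptions, and real revision strings are generated differently.

```javascript
// In-memory stand-in for a CouchDB database (illustration only). Revision
// strings imitate the "N-hash" shape with a counter and a fixed suffix.
function createStore() {
  return new Map();
}

function conflict() {
  return { error: "conflict", reason: "Document update conflict." };
}

// Save a document. An existing document can only be replaced when the
// caller supplies the revision that is currently stored.
function putDocument(store, id, body, rev) {
  const current = store.get(id);
  if (current && current._rev !== rev) {
    return conflict();
  }
  const generation = current ? parseInt(current._rev, 10) + 1 : 1;
  const newRev = generation + "-demo";
  store.set(id, Object.assign({}, body, { _id: id, _rev: newRev }));
  return { ok: true, id: id, rev: newRev };
}

// Delete a document; the same revision check applies.
function deleteDocument(store, id, rev) {
  const current = store.get(id);
  if (!current) {
    return { error: "not_found", reason: "missing" };
  }
  if (current._rev !== rev) {
    return conflict();
  }
  store.delete(id);
  return { ok: true, id: id };
}

const store = createStore();
const created = putDocument(store, "john-smith", { age: 25 });
const updated = putDocument(store, "john-smith", { age: 26 }, created.rev);
const stale = putDocument(store, "john-smith", { age: 27 }, created.rev);
const removed = deleteDocument(store, "john-smith", updated.rev);
```

The third call is rejected because it quotes the first revision after the document has already moved on to the second, which is the situation the ‘Document update conflict’ error reports.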
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
³ Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary (the command is entered on a single line):
    curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
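Because the quoting of the replication request is easy to get wrong by hand, it can help to generate the JSON body programmatically. The JavaScript sketch below is one possible approach; the helper names replicationBody and quoteForWindows are hypothetical, and the second helper only handles the double-quote escaping described above, not every quoting rule of the Windows command prompt.

```javascript
// Hypothetical helper: build the JSON body for a POST to /_replicate.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// Hypothetical helper: wrap a JSON string in double quotes for use as a
// command-line argument, backslash-escaping the double quotes inside it.
function quoteForWindows(json) {
  return '"' + json.replace(/"/g, '\\"') + '"';
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
const windowsArgument = quoteForWindows(body);
```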
4.9 Querying a CouchDB Database using Map-Reduce
So far, we have seen how data is entered on to CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation of the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election, and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
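To make the two stages concrete, the views can be imitated outside CouchDB with a few lines of JavaScript. This is a minimal sketch under stated assumptions: runMap, runReduce and sum are stand-ins written for this example, emit is passed to the map function as a parameter (in CouchDB it is a global), and the reduce signature is simplified. Only the Aberavon figure comes from the data set; the other two constituencies are invented.

```javascript
// Sample documents shaped like the constituency record above. Only the
// Aberavon figure comes from the report; the other two are made up.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Exampleton North", con: 1500 },  // hypothetical
  { _id: "Exampleton South", con: 12000 }  // hypothetical
];

// Stand-in for CouchDB's built-in sum().
function sum(values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// 'Map' stage: run the map function over every document, collect the
// emitted rows and sort them by key (a simplification of CouchDB's
// collation rules, adequate for numeric keys).
function runMap(documents, mapFn) {
  const rows = [];
  const emit = function (key, value) { rows.push({ key: key, value: value }); };
  documents.forEach(function (doc) { mapFn(doc, emit); });
  rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
  return rows;
}

// 'Reduce' stage: combine the emitted values into a single result.
function runReduce(rows, reduceFn) {
  const keys = rows.map(function (r) { return r.key; });
  const values = rows.map(function (r) { return r.value; });
  return reduceFn(keys, values);
}

// 'Conservative Votes': one row per constituency, keyed by vote count.
const votesRows = runMap(docs, function (doc, emit) { emit(doc.con, doc._id); });

// 'Conservative Votes Total': emit every count under one key, then sum.
const totalRows = runMap(docs, function (doc, emit) { emit(null, doc.con); });
const total = runReduce(totalRows, function (keys, values) { return sum(values); });
```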
It is also possible to emulate a WHERE clause: the following request uses the startkey and endkey query parameters to return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
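The key-range request can be sketched in JavaScript as follows. The first helper assembles the view URL using CouchDB’s startkey and endkey query parameters, whose values are JSON-encoded; the second imitates the inclusive range selection over rows already sorted by key. Both helper names are hypothetical, and apart from Aberavon the sample rows are invented.

```javascript
// Hypothetical helper: build the path and query string for a key-range
// request against a view. Keys are JSON-encoded, then URL-encoded.
function viewUrl(database, designDoc, viewName, startkey, endkey) {
  return "/" + database + "/_design/" + designDoc + "/_view/" +
    encodeURIComponent(viewName) +
    "?startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(endkey));
}

// Hypothetical helper: select the rows whose key lies in the range,
// with both ends included.
function rowsInRange(rows, startkey, endkey) {
  return rows.filter(function (row) {
    return row.key >= startkey && row.key <= endkey;
  });
}

// Rows as the 'Conservative Votes' view might return them, sorted by key.
// Only the Aberavon row comes from the report.
const rows = [
  { key: 950, value: "Exampleton East" },
  { key: 1000, value: "Exampleton North" },
  { key: 1500, value: "Exampleton South" },
  { key: 2000, value: "Exampleton West" },
  { key: 3062, value: "Aberavon" }
];

const url = viewUrl("election-2005", "con", "Conservative Votes", 1000, 2000);
const matched = rowsInRange(rows, 1000, 2000);
```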
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
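The idea of an index that is updated incrementally and then read by lookup rather than by scanning can be illustrated with a sorted array and binary search. This JavaScript sketch is a deliberate simplification: CouchDB uses B-Trees, not arrays, and the function names and all constituencies other than Aberavon are invented for the example.

```javascript
// Find the first position whose key is not less than the given key.
function lowerBound(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Incremental update: a newly emitted row is slotted into its sorted
// position, so the index never has to be rebuilt from scratch.
function addToIndex(index, key, value) {
  index.splice(lowerBound(index, key), 0, { key: key, value: value });
}

// Indexed read: jump to the start of the range and walk forward until the
// end key is passed, without visiting rows outside the range.
function lookupRange(index, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(index, startkey);
       i < index.length && index[i].key <= endkey; i++) {
    result.push(index[i]);
  }
  return result;
}

const index = [];
addToIndex(index, 3062, "Aberavon");
addToIndex(index, 1500, "Exampleton North");  // hypothetical
addToIndex(index, 12000, "Exampleton South"); // hypothetical
addToIndex(index, 1800, "Exampleton West");   // hypothetical

const keys = index.map(function (row) { return row.key; });
const hits = lookupRange(index, 1000, 2000);
```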
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s/$folder\///g")
        echo "$filename"
        # strip the file extension to get the document name
        docname=$(echo "$filename" | sed -e "s/$fileextension//g")
        echo "$docname"
        url=$host/$database/$docname
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d "@$filepath"
        curl -X PUT "$url" -d "@$filepath"
    done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
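The way each document URL is derived from a file path can be restated as a small JavaScript function. This is a sketch of the same string manipulation the script performs with sed; the function name documentUrl is hypothetical, and it assumes, as in the example command above, that file names are already percent-encoded.

```javascript
// Hypothetical helper mirroring the script: drop the folder from the path,
// strip the file extension, and append the result to the database URL.
function documentUrl(host, database, filepath, extension) {
  const filename = filepath.split("/").pop();
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)
    : filename;
  return host + "/" + database + "/" + docname;
}

const url = documentUrl(
  "http://127.0.0.1:5984",                       // assumed local instance
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);
```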
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string which is rendered as output.
    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) {
                 return '<h1>' + doc._id + '</h1>' +
                        '<p>latitude: ' + doc.latitude + '</p>' +
                        '<p>longitude: ' + doc.longitude + '</p>';
               }"
      }
    }
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
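Because a ‘show’ function is ordinary JavaScript, it can be tried out locally before it is placed in a design document. In the sketch below, s1 has the same body as the function in the listing; the sample document (with placeholder coordinates), the minimal request object and the second function s1WithTitle are assumptions made for illustration.

```javascript
// The 's1' show function, written as ordinary JavaScript.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant that reads an optional value from the query string,
// via the query property of the request object.
function s1WithTitle(doc, req) {
  const title = (req.query && req.query.title) || doc._id;
  return '<h1>' + title + '</h1>';
}

// Placeholder document and request; the coordinates are not real data.
const doc = { _id: "Aston University", latitude: 52.5, longitude: -1.9 };
const req = { query: { title: "Custom title" } };

const html = s1(doc, req);
const titled = s1WithTitle(doc, req);
```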
Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
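The breakdown of the URL can also be expressed as code. The JavaScript sketch below splits a ‘show’ URL into the parts listed above using the standard URL class; the function name parseShowUrl and the property names of the returned object are hypothetical.

```javascript
// Hypothetical helper: split a 'show' URL into its component parts.
// Expected path shape: /database/_design/name/_show/function/docid
function parseShowUrl(url) {
  const parsed = new URL(url);
  const parts = parsed.pathname.split("/").filter(Boolean);
  return {
    host: parsed.origin,
    database: parts[0],
    designDocument: parts[1] + "/" + parts[2],
    showFunction: parts[4],
    documentId: decodeURIComponent(parts[5])
  };
}

const pieces = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
```

Note that the document id is percent-decoded, so Aston%20University is reported as the id ‘Aston University’.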
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
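One way to reduce the escaping burden, offered here only as a sketch and not as the approach used in this project, is to write the function as normal JavaScript and let JSON.stringify produce the escaped string for the design document. The function simpleMap below is a hypothetical, much-simplified stand-in; it is not the map1 function from Appendix A.

```javascript
// Hypothetical, much-simplified 'show' function containing double quotes.
function simpleMap(doc, req) {
  return '<div id="map" data-lat="' + doc.latitude +
         '" data-lng="' + doc.longitude + '"></div>';
}

// Store the function's source text in a design document object and let
// JSON.stringify take care of escaping quotes and line breaks.
const designDoc = {
  _id: "_design/d1",
  shows: { simpleMap: simpleMap.toString() }
};
const json = JSON.stringify(designDoc);

// Round trip: parse the JSON and rebuild the function from its source.
const source = JSON.parse(json).shows.simpleMap;
const restored = new Function("return (" + source + ");")();
const html = restored({ latitude: 52.5, longitude: -1.9 }, {});
```

The resulting json string could then be saved to a file and uploaded with cURL in the usual way.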
SECTION 5 SERVING HTML FROM COUCHDB 36
Figure 55 Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, which simply outputs the _id field for each document in the collection:
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
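To make the behaviour of the map function concrete, the sketch below runs the same logic over two in-memory documents. It is a simplified stand-in, not CouchDB itself: runView is a hypothetical name, emit is passed in as a parameter rather than provided globally by the view server, and rows are sorted with plain string comparison rather than CouchDB's collation rules.

```javascript
// Simplified stand-in for a view query: call the map function once per
// document and collect every emitted key/value pair as a row.
function runView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // View rows come back ordered by key (plain string order is assumed here).
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, rows };
}

// The logic of view v1: emit the document id as both key and value.
const v1 = (doc, emit) => emit(doc._id, doc._id);

const docs = [{ _id: "University of Aberdeen" }, { _id: "Aston University" }];
const result = runView(v1, docs);
console.log(JSON.stringify(result, null, 2));
```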
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
            var row;
            start({ \"headers\": { \"Content-Type\": \"text/html\" } });
            while (row = getRow()) {
              send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
            }
          }"
      },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
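The interplay of start, getRow and send can be tried outside CouchDB with a small harness. This is a sketch under stated assumptions: runList and the env object are hypothetical stand-ins for what CouchDB supplies to a list function, and the list body is shortened to emit a paragraph per row rather than a hyperlink.

```javascript
// Hypothetical stand-ins for the functions CouchDB provides to a list function.
function runList(listFn, rows) {
  const output = { headers: {}, body: "" };
  let index = 0;
  const env = {
    start: (response) => { output.headers = response.headers; },   // called once
    getRow: () => (index < rows.length ? rows[index++] : null),   // null ends the loop
    send: (chunk) => { output.body += chunk; },                    // called per row
  };
  listFn(env);
  return output;
}

// The same control flow as l1, taking its helpers from env instead of globals.
function l1({ start, getRow, send }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p>" + row.value + "</p>");
  }
}

const page = runList(l1, [{ value: "Aston University" }, { value: "University of Aberdeen" }]);
console.log(page.headers, page.body);
```

The loop ends when getRow returns null, which is why the list function needs no knowledge of how many rows the view produced.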
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
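The correspondence between the folder structure and the design document can be sketched as a simple fold from file paths into a nested object. This is an illustration only, not CouchApp's actual implementation (CouchApp is written in Python): buildDesignDoc is a hypothetical name, the files object stands in for files on disk, and the function bodies are placeholders.

```javascript
// Hypothetical in-memory stand-in for the files on disk: path -> file contents.
const files = {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
};

// Fold each path into a nested object; the .js extension is dropped so that
// "views/v1/map.js" becomes ddoc.views.v1.map.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, source] of Object.entries(files)) {
    const parts = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      if (!node[part]) node[part] = {};
      node = node[part];
    }
    node[parts[parts.length - 1]] = source;
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", files);
console.log(JSON.stringify(ddoc, null, 2));
```

Seen this way, each file is simply one string-valued field of the design document, which is why the folder layout mirrors the document structure so closely.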
Figure 6.3: Files generated by CouchApp
The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
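Why a leftover reduce stub matters can be illustrated with a simplified model of a view query. The sketch below is an assumption-laden approximation rather than CouchDB's real behaviour: queryView is a hypothetical name, the reduce function receives plain keys instead of [key, docid] pairs, and reduction happens in a single step.

```javascript
// Simplified model of a view query: run map over every document and, if a
// reduce function is present, collapse all emitted rows into a single value.
function queryView(view, docs) {
  const rows = [];
  for (const doc of docs) {
    view.map(doc, (key, value) => rows.push({ key, value }));
  }
  if (!view.reduce) return rows; // map-only view: one row per emit
  const value = view.reduce(rows.map((r) => r.key), rows.map((r) => r.value), false);
  return [{ key: null, value }]; // reduced view: a single row
}

const docs = [{ _id: "Aston University" }, { _id: "University of Aberdeen" }];
const map = (doc, emit) => emit(doc._id, doc._id);

// No reduce: the individual rows are returned.
const mapOnly = queryView({ map }, docs);
// An empty stub reduce returns nothing, so the rows are replaced by an empty result.
const withStubReduce = queryView({ map, reduce: function (keys, values, rereduce) {} }, docs);
// A meaningful reduce, for comparison: count the rows.
const withCount = queryView({ map, reduce: (keys, values) => values.length }, docs);

console.log(mapOnly.length, withStubReduce[0].value, withCount[0].value);
```

In this model the presence of any reduce function, even an empty one, changes the shape of the result, which is consistent with the advice to delete reduce.js when no reduction is intended.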
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
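The configuration file is plain JSON, so its structure is easy to inspect. The following sketch shows how a target database could be looked up by environment name; targetDatabase is a hypothetical helper and is not how CouchApp itself reads the file. The credentials are the same placeholders used in the text.

```javascript
// The configuration as described in the text, with placeholder credentials.
const couchapprc = `{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}`;

// Hypothetical helper: find the database URL configured for an environment.
function targetDatabase(rcText, envName = "default") {
  const rc = JSON.parse(rcText);
  const env = rc.env && rc.env[envName];
  if (!env || !env.db) throw new Error("no db configured for env " + envName);
  return new URL(env.db);
}

const target = targetDatabase(couchapprc);
// Print host and database path only, leaving the credentials out of the log.
console.log(target.host + target.pathname);
```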
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
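The nested iteration performed by the template can be written out as ordinary loops. The sketch below is a plain-JavaScript equivalent for illustration only: it does not use Mustache, renderContent is a hypothetical name, the sample institution is made up, and it emits raw markup where the template emits escaped markup for embedding in the feed.

```javascript
// Plain-JavaScript equivalent of the nested template sections (no Mustache involved).
function renderContent(institution) {
  let out = "";
  // Outer section: once per programme; skipped when there are none.
  for (const programme of institution.programmes || []) {
    out += "<p>" + programme.name;
    // Inner section: once per country; skipped when the programme lists none.
    for (const country of programme.countries || []) {
      out += '<img src="http://mick.couchone.com/countries/' + country + '/flag" alt="' + country + '"/>';
    }
    out += "</p>";
  }
  return out;
}

// Made-up sample data in the same shape as the institution documents.
const sample = {
  _id: "Example University",
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz"] },
    { name: "Programme without countries" },
  ],
};

const html = renderContent(sample);
console.log(html);
```

The template expresses the same two loops declaratively, which keeps the markup readable and avoids the string concatenation seen here.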
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
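The difference in mindset can be illustrated with a grouped count. The sketch below is a simplified simulation under stated assumptions, not CouchDB itself: queryGrouped is a hypothetical stand-in for querying a view with grouping enabled, the SQL in the comment is shown only for comparison, and the region value for the third document is made up.

```javascript
// SQL a developer might write ad hoc (shown only for comparison):
//   SELECT region, COUNT(*) FROM institutions GROUP BY region;

// The same information need expressed as a map function and a reduce function.
const map = (doc, emit) => emit(doc.region, 1);
const reduce = (keys, values) => values.reduce((a, b) => a + b, 0);

// Simplified stand-in for a grouped view query: collect emitted values per key,
// then reduce each group separately and return the groups in key order.
function queryGrouped(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups.keys()].sort().map((key) => ({ key, value: reduce(null, groups.get(key)) }));
}

const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" }, // region assumed for illustration
];

const counts = queryGrouped(docs, map, reduce);
console.log(counts);
```

The SQL query can be typed and run immediately, whereas the map and reduce functions must first be saved in a design document as a view; that up-front step is the cost referred to above.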
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at a more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
          '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html>
          <html><head>
          <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\">
            html { height: 100% }
            body { height: 100%; margin: 0px; padding: 0px }
            #map_canvas { height: 100% }
          </style>
          <script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\">
            function initialize() {
              var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
              var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP };
              var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
              var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
              var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
              google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); });
              marker.setMap(map);
            }
          </script></head>
          <body onload=\"initialize()\">
            <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
          </body></html>'; }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/, '');
    };
    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
This is further down in templates/edit.html:
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e,r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : docid,
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": [ "id", "mz", "tj" ]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": [ "bd", "za" ]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
            &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
            {{/countries}}
            {{/has_countries}}
            &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags
    # png country flag files are copied to the countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
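The two sed steps in the script derive a document name from each file path and then build the attachment URL. The same derivation is sketched below in JavaScript for clarity; flagAttachmentUrl is a hypothetical name, and the local server address is the one that appears in the script's example comment.

```javascript
// Hypothetical helper mirroring the script's string handling:
// strip the folder, strip the .png extension, then build the attachment URL.
function flagAttachmentUrl(baseUrl, filepath) {
  const filename = filepath.split("/").pop();     // "countries/png/de.png" -> "de.png"
  const docname = filename.replace(/\.png$/, ""); // "de.png" -> "de"
  return { docname, url: baseUrl + "/countries/" + docname + "/flag" };
}

const flag = flagAttachmentUrl("http://127.0.0.1:5984", "countries/png/de.png");
console.log(flag.docname, flag.url);
```

Each flag therefore becomes an attachment named flag on a document whose id is the two-letter country code, which is the URL pattern the GeoRSS template relies on.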
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
BIBLIOGRAPHY 64
[44] Daniel Stenberg and contributors curl1 the man page httpcurlhaxxsedocsmanpagehtml
[45] M Stonebraker and U Cetintemel ldquoOne Size Fits Allrdquo An Idea WhoseTime Has Come and Gone In PROCEEDINGS OF THE INTER-NATIONAL CONFERENCE ON DATA ENGINEERING pages 2ndash11IEEE Computer Society Press 2005
[46] Stonebraker M Wong E Kreps P and Held G The Design andImplementation of INGRES ACM TODS 1 3 (September 1976)
[47] Klaus Trainer CouchDB 10 Retrospectives httpmambofulani
couchonecomblog_designsofa_listpostpost-page
startkey=[22CouchDB-1-0-Retrospectives22]
[48] Rik van der Sanden Accessing Azure Masterrsquos thesis Delft Univer-sity the Netherlands 2009 httpswerltudelftnltwikipub
MainPastAndCurrentMScProjectsThesis5FRik5Fvan5Fder
5FSandenpdf
[49] Peter Vogel [Weblog Comment] Is Microsoft Feeling the lsquoNoSQLrsquo HeathttpreddevnewscomBlogsData-Driver200912NoSQL-Heat
[50] Wikipedia ACID httpenwikipediaorgwikiACID
[51] Wikipedia cURL httpenwikipediaorgwikiCURL
[52] Wikipedia Database Normalization httpenwikipediaorg
wikiOpen5FDatabase5FConnectivity
[53] Wikipedia JSON httpenwikipediaorgwikiJSON
[54] Wikipedia Relational model httpenwikipediaorgwiki
Relational5Fmodel
[55] Interview with Damien Katz by Werner Schuster DamienKatz Relaxing on CouchDB httpwwwinfoqcominterviews
CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 2 BACKGROUND 12
2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see, in the case of CouchDB queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.
(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
Very often a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way the ‘NoSql’ solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
2.12 The Activity Mapping Project
The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].
The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
2.13 Raising Awareness of British Council Impact in the UK
The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.
Figure 2.1: British Council project data displayed on Bing Maps
The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET in Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
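The transformation from JSON to GeoRSS described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function names escapeXml and toGeoRssEntry and the sample document are hypothetical, the field names follow the university documents used in Section 5, and a complete feed would also need a surrounding Atom feed element that declares the GeoRSS namespace.

```javascript
// Escape the characters that are significant in XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a GeoRSS-Simple entry: <georss:point> holds "latitude longitude".
function toGeoRssEntry(doc) {
  return [
    "<entry>",
    "  <title>" + escapeXml(doc._id) + "</title>",
    "  <georss:point>" + doc.latitude + " " + doc.longitude + "</georss:point>",
    "</entry>"
  ].join("\n");
}

// Hypothetical document; the name and coordinates are illustrative only.
const sample = { _id: "Example University", latitude: 52.5, longitude: -1.9 };
console.log(toGeoRssEntry(sample));
```

A KML version would follow the same pattern with different element names, which is why a single JSON representation can feed several mapping clients.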
This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms³.
³ In my case the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable, schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family, and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
    {
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
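Because fields may simply be missing, client code has to test for their presence rather than compare against NULL. A minimal sketch, assuming abbreviated copies of the two documents above (the helper name cityLine is hypothetical):

```javascript
// Abbreviated copies of the two documents stored in 'addressbook'.
const person = {
  firstName: "John",
  address: { city: "New York", postalCode: "10021" }
};
const office = {
  company: "British Council Zambia",
  address: { city: "Lusaka", country: "Zambia" }
};

// Format a city line, including the postal code only when it is present.
function cityLine(doc) {
  const address = doc.address || {};
  const parts = [address.city];
  if ("postalCode" in address) {
    parts.push(address.postalCode);
  }
  return parts.join(" ");
}

console.log(cityLine(person)); // New York 10021
console.log(cityLine(office)); // Lusaka
```

The same function handles both document shapes, which is the practical consequence of a schema-less store: the checks move from the database into the application.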
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.) the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
    curl -H "Content-Type: application/json" \
      -X POST http://username:password@mick.couchone.com/addressbook \
      -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
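Since cURL is only one of many possible HTTP clients, the same request can be described in any language. The following sketch builds a plain description of the request without sending it; the function name createDocumentRequest and the shape of the returned object are assumptions made for illustration, not part of any CouchDB client library.

```javascript
// Describe the HTTP request that the cURL command above would send.
// No network call is made; the object could be handed to any HTTP library.
function createDocumentRequest(databaseUrl, doc) {
  return {
    method: "POST",                                   // -X POST
    url: databaseUrl,
    headers: { "Content-Type": "application/json" },  // -H
    body: JSON.stringify(doc)                         // -d @john-smith.json
  };
}

const request = createDocumentRequest(
  "http://username:password@mick.couchone.com/addressbook",
  { firstName: "John", lastName: "Smith", age: 25 }
);
console.log(request.method, request.url);
console.log(request.body);
```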
All data in CouchDB is stored in JSON format together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
      "_id": "760cd53c55a93497067f90d6242fc25e",
      "_rev": "1-91bce055fc8db86480400321079f0834",
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
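The revision value is made up of a number, a hyphen, and a hash; the number increases each time the document is changed. A small sketch of pulling the two parts apart follows. The helper name parseRev and the field name generation are my own labels, not CouchDB terminology.

```javascript
// Split a revision string of the form "<number>-<hash>" into its two parts.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  return {
    generation: parseInt(rev.slice(0, dash), 10),
    hash: rev.slice(dash + 1)
  };
}

// The _id and _rev values from the stored document shown above.
const stored = {
  _id: "760cd53c55a93497067f90d6242fc25e",
  _rev: "1-91bce055fc8db86480400321079f0834"
};

const parsed = parseRev(stored._rev);
console.log(parsed.generation); // 1
console.log(parsed.hash);       // 91bce055fc8db86480400321079f0834
```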
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:
    curl -H "Content-Type: application/json" \
      -X PUT \
      http://username:password@mick.couchone.com/addressbook/british-council-zambia \
      -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
      "_id": "british-council-zambia",
      "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a “Document update conflict” error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json, quoting the revision value of the stored John Smith document:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
      -d @john-smith-v2.json
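The revision check for updates and deletes can be imitated with a toy in-memory store. This is a sketch of the idea only, not CouchDB’s implementation: the class name ToyStore and the “-toy” revision suffix are invented, and real revisions carry a hash. The status codes mirror the HTTP codes CouchDB uses (201 for a successful write, 409 for a conflict, 404 for a missing document).

```javascript
// A toy in-memory store that mimics CouchDB's revision check.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  // Create or update a document; an update must name the current revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      return { status: 409, error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const newRev = generation + "-toy";
    this.docs.set(id, Object.assign({}, body, { _id: id, _rev: newRev }));
    return { status: 201, rev: newRev };
  }

  // Delete a document; the current revision is required here as well.
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) return { status: 404, error: "not_found" };
    if (current._rev !== rev) return { status: 409, error: "conflict" };
    this.docs.delete(id);
    return { status: 200 };
  }
}

const store = new ToyStore();
const first = store.put("john-smith", { age: 25 });              // creates 1-toy
const second = store.put("john-smith", { age: 26 }, first.rev);  // creates 2-toy
const stale = store.put("john-smith", { age: 27 }, first.rev);   // rejected
console.log(first.rev, second.rev, stale.status);
```

A client that receives the conflict would normally fetch the document again, obtain the latest revision, and retry its change.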
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
      "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
      --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
      -X POST http://127.0.0.1:5984/_replicate \
      -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
      -X POST http://127.0.0.1:5984/_replicate ^
      -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
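The quoting difficulty goes away if the request body is generated rather than typed by hand. A minimal sketch (the function name replicationBody is hypothetical) that builds the JSON body and also shows the backslash-escaped form needed inside a double-quoted command-line argument:

```javascript
// Build the JSON body for a POST to /_replicate.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);

// The same body written as a double-quoted command-line argument,
// with each inner double quote backslash-escaped.
const windowsArgument = '"' + body.replace(/"/g, '\\"') + '"';
console.log(windowsArgument);
```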
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered on to CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation of the intermediate values is required.
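The two stages can be imitated in plain JavaScript. This is a simplified sketch, not how CouchDB executes views: the helper names runMap and runReduce and the sample documents are invented, and emit is passed in as a parameter here, whereas CouchDB makes it available to the map function as a global.

```javascript
// Run a map function over every document, collecting the emitted rows.
function runMap(docs, mapFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key: key, value: value });
  docs.forEach((doc) => mapFn(doc, emit));
  return rows;
}

// Reduce the emitted rows to a single result.
function runReduce(rows, reduceFn) {
  return reduceFn(rows.map((r) => r.key), rows.map((r) => r.value));
}

// Hypothetical documents, for illustration only.
const docs = [
  { _id: "a", votes: 10 },
  { _id: "b", votes: 32 }
];

const rows = runMap(docs, (doc, emit) => emit(doc._id, doc.votes));
const total = runReduce(rows, (keys, values) =>
  values.reduce((sum, v) => sum + v, 0)
);
console.log(rows.length, total); // 2 42
```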
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to SUM with GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
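A range request of this kind works because the view rows are kept sorted by key. The sketch below imitates the lookup on a sorted array: a binary search finds the first matching row, and the scan stops as soon as a key exceeds the upper bound. The function names are hypothetical. Aberavon’s figure comes from the document shown earlier; the other two rows are invented for illustration.

```javascript
// Rows as a view would hold them: sorted by key (the Conservative vote).
const rows = [
  { key: 1500, value: "Hypothetical East" },
  { key: 3062, value: "Aberavon" },
  { key: 9000, value: "Hypothetical West" }
];

// Find the index of the first row whose key is >= target (binary search).
function lowerBound(sortedRows, target) {
  let low = 0;
  let high = sortedRows.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (sortedRows[mid].key < target) low = mid + 1;
    else high = mid;
  }
  return low;
}

// Return the rows whose key lies between startkey and endkey, inclusive.
function range(sortedRows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(sortedRows, startkey); i < sortedRows.length; i++) {
    if (sortedRows[i].key > endkey) break;
    result.push(sortedRows[i]);
  }
  return result;
}

console.log(range(rows, 1000, 2000)); // the single row with key 1500
```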
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
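The idea of an incrementally updated index can be sketched as follows, assuming the index remembers the sequence number of the last change it has processed. The class name ToyView and the shape of the change records are invented for this illustration, and the sketch ignores updates to documents that are already indexed, which a real index must also handle.

```javascript
// A toy view index that only processes changes it has not yet seen.
class ToyView {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rows = [];
    this.lastSeq = 0;
  }

  // Index the changes with a sequence number above lastSeq;
  // returns how many documents were actually processed.
  update(changes) {
    let processed = 0;
    for (const change of changes) {
      if (change.seq <= this.lastSeq) continue;
      this.mapFn(change.doc, (key, value) => this.rows.push({ key, value }));
      this.lastSeq = change.seq;
      processed++;
    }
    this.rows.sort((a, b) => a.key - b.key); // numeric keys assumed
    return processed;
  }
}

// Aberavon's figure is from the document shown earlier; "Hypothetical West"
// and "Hypothetical East" are invented.
const changes = [
  { seq: 1, doc: { _id: "Hypothetical West", con: 9000 } },
  { seq: 2, doc: { _id: "Aberavon", con: 3062 } }
];
const view = new ToyView((doc, emit) => emit(doc.con, doc._id));
console.log(view.update(changes)); // both documents are indexed

changes.push({ seq: 3, doc: { _id: "Hypothetical East", con: 1500 } });
console.log(view.update(changes)); // only the new document is indexed
```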
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    #host="http://127.0.0.1:5984"
    host="http://username:password@mick.couchone.com"
    database="universities"
    folder="universities"
    fileextension=".json"
    FILES="$folder/*"
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e "s|^$folder/||")
      echo "$filename"
      # strip the file extension to get the document name
      docname=$(echo "$filename" | sed -e "s|$fileextension\$||")
      echo "$docname"
      url="$host/$database/$docname"
      echo "$url"
      # put the document into CouchDB
      echo curl -X PUT "$url" -d @"$filepath"
      curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
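The way the script derives each document URL from a file path can be expressed compactly. The following sketch performs a similar job to the two sed commands in the script; the function name docUrl is hypothetical, and the local host address is used only as an example.

```javascript
// Derive the document URL by stripping the folder and the file extension
// from the path, then appending the result to the database URL.
function docUrl(host, database, filepath, extension) {
  const filename = filepath.slice(filepath.lastIndexOf("/") + 1);
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)
    : filename;
  return host + "/" + database + "/" + docname;
}

console.log(docUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
));
// http://127.0.0.1:5984/universities/Aberystwyth%20University
```

Because the file names already contain %20 in place of spaces, the derived name can be used in the URL without further encoding.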
To put a single document to the web, the command used would be as follows:
    curl -X PUT "http://username:password@mick.couchone.com/universities/Aberystwyth%20University" \
      -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string which is rendered as output.
    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
      }
    }
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
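Outside CouchDB, the same function can be exercised as ordinary JavaScript by passing in stand-in objects. In this sketch the coordinates are approximate values chosen for illustration, and the second function, s1WithHeading, is a hypothetical variant (not part of the design document) showing how a value from the query string could be read from req.query.

```javascript
// The 'show' function s1, written as a plain JavaScript function.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Hypothetical variant: use ?heading=... from the URL when it is supplied.
function s1WithHeading(doc, req) {
  const heading = (req.query && req.query.heading) || doc._id;
  return "<h1>" + heading + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Stand-ins for the objects CouchDB would pass in.
const doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const req = { query: {} };

console.log(s1(doc, req));
console.log(s1WithHeading(doc, { query: { heading: "Aston" } }));
```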
Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
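The same anatomy can be expressed as a small helper that assembles the URL from its parts. The function name showUrl is an assumption made for this sketch.

```javascript
// Assemble a 'show' URL from the five parts listed above.
// showUrl is a hypothetical helper, not part of CouchDB.
function showUrl(host, db, designDoc, showName, docId) {
  return [
    host,
    db,
    "_design", designDoc,
    "_show", showName,
    encodeURIComponent(docId) // "Aston University" -> "Aston%20University"
  ].join("/");
}

var astonUrl = showUrl("http://mick.couchone.com", "universities", "d1", "s1",
  "Aston University");
console.log(astonUrl);
```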
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
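The escaping burden can be seen by letting JavaScript do the work. The sketch below is not the approach used in this project; it only illustrates, with a hypothetical show function named example, how many backslashes a single HTML fragment acquires once it is embedded first in a JavaScript string and then in a JSON document.

```javascript
// One HTML fragment containing double quotes.
var fragment = '<div id="map_canvas" style="width: 100%; height: 100%"></div>';

// JSON.stringify escapes the quotes so the fragment is a valid string literal.
var showSource = "function(doc, req) { return " + JSON.stringify(fragment) + "; }";

// The function source is itself a string inside the design document, so it
// is escaped a second time when the document is serialised.
var designDoc = { _id: "_design/d1", shows: { example: showSource } };
var json = JSON.stringify(designDoc, null, 2);
console.log(json);

// Round trip: parse the JSON, rebuild the function, and call it.
var restoredSource = JSON.parse(json).shows.example;
var restoredFn = new Function("return (" + restoredSource + ")")();
console.log(restoredFn({}, {}) === fragment);
```

Done by hand, each of those escape sequences is an opportunity for an error that only shows up when the page is requested.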
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
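To make the behaviour of the map function concrete, the sketch below simulates a view in plain JavaScript. It is an approximation under stated assumptions: in CouchDB emit is a global provided by the view server, whereas here it is passed in as a second argument, and the sort uses ordinary string comparison rather than CouchDB's own collation rules. The helper runView is hypothetical.

```javascript
// Simulate running a map function over a set of documents.
function runView(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    // Each call to emit adds one row to the view's result.
    function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    }
    mapFn(doc, emit);
  });
  // View rows are returned ordered by key (simplified ordering here).
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return { total_rows: rows.length, offset: 0, rows: rows };
}

// The v1 map function, with emit supplied as a parameter.
function v1Map(doc, emit) {
  emit(doc._id, doc._id);
}

var viewResult = runView(v1Map, [
  { _id: "University of Aberdeen" },
  { _id: "Aston University" }
]);
console.log(JSON.stringify(viewResult, null, 2));
```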
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
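The interplay of start, getRow and send can be sketched with a small simulation. This is an assumption-laden stand-in, not CouchDB's implementation: in CouchDB the three functions are globals and the list function has the signature function(head, req), whereas here a hypothetical runList helper passes them in as an object so the code runs on its own.

```javascript
// Simulate the environment in which a 'list' function runs.
function runList(listFn, rows) {
  var chunks = [];
  var responseHeaders = null;
  var next = 0;
  var api = {
    start: function (resp) { responseHeaders = resp.headers; },
    send: function (chunk) { chunks.push(chunk); },
    // getRow returns the next view row, or null when the rows run out.
    getRow: function () { return next < rows.length ? rows[next++] : null; }
  };
  listFn(api);
  return { headers: responseHeaders, body: chunks.join("") };
}

// A simplified l1: one paragraph per row instead of a hyperlink.
function simpleList(api) {
  var row;
  api.start({ headers: { "Content-Type": "text/html" } });
  while ((row = api.getRow())) {
    api.send("<p>" + row.value + "</p>");
  }
}

var listResult = runList(simpleList, [
  { value: "Aston University" },
  { value: "University of Aberdeen" }
]);
console.log(listResult.headers["Content-Type"]);
console.log(listResult.body);
```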
At the following URL the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that, in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
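The merging step can be pictured with a short JavaScript sketch. It is a simplification written for this explanation, not CouchApp's actual Python code: the helper buildDesignDoc is hypothetical, the files are supplied as an in-memory map rather than read from disk, and anything else CouchApp may do when it packages an application is ignored.

```javascript
// Merge a set of files into one design document object.
// Each folder becomes a nested key; the file extension is dropped.
function buildDesignDoc(name, files) {
  var ddoc = { _id: "_design/" + name };
  Object.keys(files).forEach(function (path) {
    var parts = path.replace(/\.[^./]+$/, "").split("/");
    var node = ddoc;
    for (var i = 0; i < parts.length - 1; i++) {
      node[parts[i]] = node[parts[i]] || {};
      node = node[parts[i]];
    }
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

var bcDoc = buildDesignDoc("bc", {
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "templates/georss.html": "<feed>...</feed>"
});
console.log(JSON.stringify(bcDoc, null, 2));
```

Under this scheme the template stored in templates/georss.html ends up at templates.georss in the design document, which is how the List function in Appendix C refers to it.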
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
  {{#has_programmes}}
    {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
        {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
        {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
    {{/programmes}}
  {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
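The nested iteration performed by the template can be written out in plain JavaScript for comparison. This is only an illustrative equivalent under simplifying assumptions: the function renderContent is hypothetical, it omits the title attribute and the HTML-escaping used in the real template, and the sample data is abbreviated.

```javascript
// Plain-JavaScript equivalent of the nested programmes/countries sections.
function renderContent(institution) {
  var out = "";
  // Outer section: once per programme (nothing if there are none).
  (institution.programmes || []).forEach(function (programme) {
    out += "<p>" + programme.name;
    // Inner section: once per country listed for this programme.
    (programme.countries || []).forEach(function (country) {
      out += '<img src="http://mick.couchone.com/countries/' + country +
        '/flag" alt="' + country + '">';
    });
    out += "</p>";
  });
  return out;
}

var sample = {
  programmes: [{ name: "Chevening Programme", countries: ["id", "mz"] }]
};
var contentHtml = renderContent(sample);
console.log(contentHtml);
```

The template expresses the same two loops declaratively, which keeps the markup readable and avoids string concatenation altogether.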
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it, even without a good internet connection. Distributed data and application code are important for reliable computing.
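Replication is itself triggered over HTTP, by posting a small JSON document naming a source and a target to the server's _replicate resource. The sketch below only constructs such a request and does not send it; the helper name replicationRequest and the local server address are assumptions made for the example.

```javascript
// Describe a replication request; nothing is sent over the network here.
// replicationRequest is a hypothetical helper.
function replicationRequest(serverUrl, source, target) {
  return {
    method: "POST",
    url: serverUrl + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Pull the hosted universities database down to a local CouchDB instance.
var pull = replicationRequest(
  "http://localhost:5984",
  "http://mick.couchone.com/universities",
  "universities"
);
console.log(pull.method, pull.url);
console.log(pull.body);
```

Because design documents are ordinary documents, a replication of this kind would carry the application code along with the data.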
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
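A small comparison may make the shift in thinking clearer. In SQL, counting institutions per region might be written as SELECT region, COUNT(*) FROM institutions GROUP BY region (the table name is hypothetical). The sketch below expresses the same need as a map function and a reduce function, and runs them through a simplified in-memory stand-in for CouchDB's view engine; the helper groupedQuery is an assumption of this example, and emit is passed as a parameter rather than being a global.

```javascript
// Map: emit one row per document, keyed by region.
function mapByRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce: add up the values emitted for a key.
function sumReduce(keys, values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Simplified stand-in for a grouped view query.
function groupedQuery(docs, mapFn, reduceFn) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduceFn(null, groups[key]);
  });
  return result;
}

var counts = groupedQuery(
  [{ region: "Scotland" }, { region: "Scotland" }, { region: "Wales" }],
  mapByRegion,
  sumReduce
);
console.log(counts);
```

The information need is the same in both cases; what changes is that the grouping key has to be chosen when the view is written, not when the question is asked.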
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% }
      body { height: 100%; margin: 0px; padding: 0px }
      #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
      var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
      var myOptions = { zoom: 14, center: latlng,
      mapTypeId: google.maps.MapTypeId.ROADMAP };
      var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
      var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
      var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
      google.maps.event.addListener(marker, \"click\", function() {
      infowindow.open(map, marker); });
      marker.setMap(map); } </script></head>
      <body onload=\"initialize()\">
      <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
      </body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code, http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  //var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
      alternate : path.absolute(path.show('post', row.id)),
      //point : row.value.loc[1] + " " + row.value.loc[0]
      point : row.value.latitude + " " + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
}
// close the loop after all rows are rendered
return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
This is further down in templates/edit.html:
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      //fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View: in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
  {{#has_programmes}}
    {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
        {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
        {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
    {{/programmes}}
  {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999: Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open_Database_Connectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 2 BACKGROUND 13
Figure 2.1: British Council project data displayed on Bing Maps
The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.
The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
Outputs from the Activity Mapping have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format, rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that something like CouchDB would be a good fit for the back-end data store - in effect using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth etc.
This is a report into my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³
³ In my case the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]
Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
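The two address book documents above can be handled by the same code even though they share few fields. The following JavaScript sketch is only an illustration: `describe` is a hypothetical helper, not part of CouchDB, and the documents are abbreviated copies of the two examples.

```javascript
// Two documents from the same 'addressbook' database with different shapes.
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", postalCode: "10021" }
};
const zambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" }
};

// Hypothetical helper: summarises whichever fields a document happens to have.
function describe(doc) {
  const name = doc.company || [doc.firstName, doc.lastName].join(" ");
  const streetLines = doc.address.streetAddress.split("\n");
  // A missing field is simply absent; there is no NULL placeholder to test for.
  const hasPostalCode = "postalCode" in doc.address;
  return { name: name, streetLines: streetLines.length, hasPostalCode: hasPostalCode };
}

console.log(describe(johnSmith)); // "John Smith", one street line, has a postal code
console.log(describe(zambia));    // "British Council Zambia", three street lines, no postal code
```

The check `"postalCode" in doc.address` is the document-oriented counterpart of testing a column for NULL.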
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with write-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.) the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
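The difference between the two ways of creating a document can be summarised in a few lines of JavaScript. This is a sketch under assumptions: `createDocumentRequest` is a hypothetical helper, the local URL is an assumed instance, and the function only describes the request without sending it.

```javascript
// Hypothetical helper: describes the HTTP request that creates a document.
// Without an id, the document is POSTed to the database URL and CouchDB
// assigns the id; with an id, it is PUT to a URL ending in that id.
function createDocumentRequest(databaseUrl, doc, id) {
  const named = id !== undefined;
  return {
    method: named ? "PUT" : "POST",
    url: named ? databaseUrl + "/" + encodeURIComponent(id) : databaseUrl,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
}

const db = "http://127.0.0.1:5984/addressbook"; // assumed local instance
const anonymous = createDocumentRequest(db, { firstName: "John", lastName: "Smith" });
const named = createDocumentRequest(db, { company: "British Council Zambia" }, "british-council-zambia");

console.log(anonymous.method, anonymous.url); // POST to the database URL
console.log(named.method, named.url);         // PUT to .../addressbook/british-council-zambia
```

The fields of the returned object correspond to the -X, -H and -d options of the cURL commands, so the same description could be handed to any HTTP library.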
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json. The rev value must be the current revision of that document, as shown in its _rev field above:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
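The revision check used for deletes and updates can be imitated with a small in-memory store. This is a simplified sketch, not CouchDB's implementation: `createStore` is a hypothetical name, revisions are reduced to "1", "2", ... (real revisions also carry a hash), and a delete simply forgets the document.

```javascript
// In-memory stand-in for a CouchDB database, keyed by document id.
function createStore() {
  const docs = new Map();
  const conflict = { error: "conflict", reason: "Document update conflict." };
  return {
    put(id, body, rev) {
      const current = docs.get(id);
      // An existing document can only be replaced by a caller that
      // names its current revision.
      if (current && current.rev !== rev) return conflict;
      const nextRev = String(current ? Number(current.rev) + 1 : 1);
      docs.set(id, { rev: nextRev, body: body });
      return { ok: true, id: id, rev: nextRev };
    },
    remove(id, rev) {
      const current = docs.get(id);
      if (!current) return { error: "not_found", reason: "missing" };
      if (current.rev !== rev) return conflict;
      docs.delete(id);
      return { ok: true, id: id };
    }
  };
}

const store = createStore();
const created = store.put("john-smith", { age: 25 });              // new document
const updated = store.put("john-smith", { age: 26 }, created.rev); // current revision: accepted
const stale = store.put("john-smith", { age: 27 }, created.rev);   // old revision: rejected
console.log(created.rev, updated.rev, stale.error); // 1 2 conflict
```

A client that receives the conflict would normally fetch the document again, reapply its change to the latest revision and retry.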
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags ³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
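The quoting difficulty on the command line comes from writing the JSON body by hand. As a sketch, a program can build the same request and let a JSON serialiser do the quoting; `replicationRequest` is a hypothetical helper and the credentials are placeholders, as in the commands above.

```javascript
// Hypothetical helper: builds the request that asks a CouchDB instance
// to copy the `source` database into the `target` database.
function replicationRequest(serverUrl, source, target) {
  return {
    method: "POST",
    url: serverUrl + "/_replicate",
    headers: { "Content-Type": "application/json" },
    // JSON.stringify produces the quoting that otherwise has to be
    // typed (and escaped) by hand.
    body: JSON.stringify({ source: source, target: target })
  };
}

const localServer = "http://127.0.0.1:5984";
const pullRequest = replicationRequest(
  localServer,
  "http://username:password@mick.couchone.com/addressbook", // placeholder credentials
  localServer + "/addressbook"
);
console.log(pullRequest.url);
console.log(pullRequest.body);
```

Swapping the source and target arguments would describe replication in the opposite direction, from the local machine to the hosted instance.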
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
    [http://localhost:5984/election-2005/Aberavon]
    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
    [http://localhost:5984/election-2005/_design/con]
    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
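To make the two stages concrete, the views can be run outside CouchDB against a handful of documents. The sketch below is a stand-in for the view engine, not CouchDB's own code: `runView` is a hypothetical function, `emit` is passed in as a parameter instead of being a global, and only the Aberavon figure comes from the text; the other two constituencies are invented.

```javascript
// Sample documents: Aberavon is taken from the text, the others are invented.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example East", con: 1500 },
  { _id: "Example West", con: 20000 }
];

// The map and reduce functions of the 'Conservative Votes Total' view.
function mapTotal(doc, emit) {
  emit(null, doc.con);
}
function reduceTotal(keys, values) {
  return values.reduce(function (a, b) { return a + b; }, 0); // stands in for sum(values)
}

// Minimal stand-in for the view engine: run map over every document,
// sort the emitted rows by key, then optionally reduce them to one value.
function runView(documents, map, reduce) {
  const rows = [];
  documents.forEach(function (doc) {
    map(doc, function (key, value) { rows.push({ id: doc._id, key: key, value: value }); });
  });
  rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
  if (!reduce) return { rows: rows };
  const keys = rows.map(function (r) { return [r.key, r.id]; });
  const values = rows.map(function (r) { return r.value; });
  return { rows: [{ key: null, value: reduce(keys, values) }] };
}

const votes = runView(docs, function (doc, emit) { emit(doc.con, doc._id); });
const total = runView(docs, mapTotal, reduceTotal);
console.log(votes.rows.map(function (r) { return r.key; })); // keys in ascending order
console.log(JSON.stringify(total)); // {"rows":[{"key":null,"value":24562}]}
```

The map-only view yields one row per document, ordered by key, while the reduced view collapses all rows into a single total, which has the same shape as the response shown above.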
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
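CouchDB's view API documents the range parameters as `startkey` and `endkey`, and both ends of the range are included by default. The effect on an already sorted view can be sketched as follows; `queryRange` is a hypothetical helper, the rows other than Aberavon are invented, and a real view walks its B-Tree index instead of filtering an array.

```javascript
// Rows as a view keyed on the Conservative vote would return them: sorted by key.
const sortedRows = [
  { key: 850, value: "Constituency A" },
  { key: 1000, value: "Constituency B" },
  { key: 1720, value: "Constituency C" },
  { key: 2000, value: "Constituency D" },
  { key: 3062, value: "Aberavon" }
];

// Hypothetical helper: keep rows whose key lies in [startkey, endkey].
function queryRange(rows, startkey, endkey) {
  return rows.filter(function (row) {
    return row.key >= startkey && row.key <= endkey;
  });
}

const matches = queryRange(sortedRows, 1000, 2000);
console.log(matches.map(function (r) { return r.value; })); // Constituency B, C and D
```

Because the rows are sorted by key, the matching rows always form one contiguous run, which is what allows the database to answer the request from the index alone.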
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
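The idea of an incrementally updated index can be illustrated with a sorted array standing in for the B-Tree. This is a sketch only: CouchDB's real index is an on-disk B-Tree, `insertIntoIndex` is a hypothetical function, and all documents except Aberavon are invented.

```javascript
// A sorted array of { key, id } rows stands in for the view's index.
// Each new document costs one search and one insertion; the existing
// rows are never re-scanned or re-sorted.
function insertIntoIndex(index, row) {
  let low = 0;
  let high = index.length;
  // Binary search for the first position whose key is greater than row.key.
  while (low < high) {
    const mid = (low + high) >> 1;
    if (index[mid].key <= row.key) low = mid + 1; else high = mid;
  }
  index.splice(low, 0, row);
  return index;
}

const viewIndex = [];
insertIntoIndex(viewIndex, { key: 3062, id: "Aberavon" });
insertIntoIndex(viewIndex, { key: 1500, id: "Example East" });
insertIntoIndex(viewIndex, { key: 20000, id: "Example West" });
insertIntoIndex(viewIndex, { key: 1720, id: "Example North" }); // a document added later
console.log(viewIndex.map(function (r) { return r.key; })); // keys remain in sorted order
```

Queries such as a key range then only need to locate the start position in the sorted structure and read forward from there.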
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # local instance (commented out):
    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s|$folder/||g")
        echo "$filename"
        # strip the extension to get the document name
        docname=$(echo "$filename" | sed -e "s|$fileextension||g")
        echo "$docname"
        url=$host/$database/$docname
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d "@$filepath"
        curl -X PUT "$url" -d "@$filepath"
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) {
                return '<h1>' + doc._id + '</h1>' +
                    '<p>latitude: ' + doc.latitude + '</p>' +
                    '<p>longitude: ' + doc.longitude + '</p>';
            }"
        }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
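Because a 'show' function is ordinary JavaScript, it can be tried out locally before being placed in a design document. In the sketch below, `s1` follows the listing above, while `s1WithUnits`, its `units` query parameter and the sample coordinates are invented for illustration; `req.query` is the part of the request object that holds the parsed query string.

```javascript
// The 's1' show function, written as a plain JavaScript function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant that also reads a query-string parameter,
// e.g. a request ending in ?units=decimal%20degrees
function s1WithUnits(doc, req) {
  const units = (req.query && req.query.units) || "degrees";
  return s1(doc, req) + '<p>units: ' + units + '</p>';
}

// Invented document, for illustration only.
const sampleDoc = { _id: "Example University", latitude: 52.5, longitude: -1.9 };
console.log(s1(sampleDoc, {}));
console.log(s1WithUnits(sampleDoc, { query: { units: "decimal degrees" } }));
```

Calling the function with a hand-made doc and req in this way gives quick feedback on the generated HTML without a round trip to the server.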
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
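The breakdown above can be expressed as a small parsing function. This is a sketch for illustration, not part of the project code; parseShowUrl is a hypothetical name, and it assumes the URL has exactly the shape shown in the list.

```javascript
// Hypothetical helper: splits a 'show' URL into the parts listed above.
function parseShowUrl(url) {
  var u = new URL(url);
  // pathname looks like /universities/_design/d1/_show/s1/Aston%20University
  var parts = u.pathname.split("/").filter(function (p) { return p !== ""; });
  return {
    host: u.origin,
    database: parts[0],
    designDocument: parts[1] + "/" + parts[2],
    showFunction: parts[3] + "/" + parts[4],
    docId: decodeURIComponent(parts[5])
  };
}

var parsed = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University");
console.log(parsed);
```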
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
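The escaping burden can be seen by letting a JSON serialiser produce the design document instead of typing it by hand. The following is a sketch of that idea only; it is not the workflow used in this project, and the function and class names are invented for the example.

```javascript
// An ordinary JavaScript function containing double quotes in its HTML.
function s1(doc, req) {
  return '<p class="coords">' + doc.latitude + '</p>';
}

// In a design document the function is stored as a string,
// so its source text is embedded with toString().
var designDoc = { _id: "_design/d1", shows: { s1: s1.toString() } };

// JSON.stringify adds the backslash escapes that would otherwise be typed by hand.
var json = JSON.stringify(designDoc);
console.log(json);
```

Every double quote inside the function source appears in the output as \" and every line break as \n, which is exactly the hand-editing that makes this approach tedious.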
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
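What a map function such as v1 produces can be simulated in plain JavaScript. The sketch below is a simplification under stated assumptions: in CouchDB emit is supplied as a built-in, whereas here it is passed in as a second argument, and runMap is a hypothetical helper that uses plain string comparison rather than CouchDB's own collation rules.

```javascript
// Hypothetical helper: applies a map function to each document and collects the rows.
function runMap(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    // CouchDB provides emit() itself; here it is handed to the map function explicitly
    mapFn(doc, function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  // View rows are returned ordered by key (simplified here to string comparison)
  rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
  return rows;
}

// Equivalent of the v1 map function, with emit as an explicit parameter.
var v1 = function (doc, emit) { emit(doc._id, doc._id); };

var viewRows = runMap(v1, [{ _id: "Aston University" }, { _id: "Aberystwyth University" }]);
console.log(viewRows);
```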
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
          var row;
          start({ \"headers\": { \"Content-Type\": \"text/html\" } });
          while (row = getRow()) {
            send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
          }
        }"
      },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
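The interplay of start, getRow and send can be imitated with a small test harness. This is a sketch only: in CouchDB the three functions are built in, whereas here they are supplied through a hypothetical env object, and runList is an invented helper.

```javascript
// Hypothetical harness: provides stand-ins for start(), getRow() and send().
function runList(listFn, rows) {
  var output = [];
  var headers = null;
  var index = 0;
  var env = {
    start: function (response) { headers = response.headers; },
    getRow: function () { return index < rows.length ? rows[index++] : null; },
    send: function (chunk) { output.push(chunk); }
  };
  listFn(env);
  return { headers: headers, body: output.join("") };
}

// Same structure as l1: start once, then send once per row.
function simpleList(env) {
  var row;
  env.start({ headers: { "Content-Type": "text/html" } });
  while ((row = env.getRow())) {
    env.send("<p>" + row.value + "</p>");
  }
}

var listResult = runList(simpleList, [
  { value: "Aston University" },
  { value: "University of Aberdeen" }
]);
console.log(listResult.body);
```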
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
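The merging step can be pictured with a short JavaScript sketch. This is not CouchApp's own code (which is written in Python): buildDesignDoc is a hypothetical function, the files are given as an in-memory object rather than read from disk, and the sketch assumes that each path segment becomes a nested key and that the file extension is dropped.

```javascript
// Hypothetical sketch: turns a set of files into one nested design document.
function buildDesignDoc(name, files) {
  var ddoc = { _id: "_design/" + name };
  Object.keys(files).forEach(function (path) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    var parts = path.replace(/\.[^./]+$/, "").split("/");
    var node = ddoc;
    parts.slice(0, -1).forEach(function (part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    // The file content is stored as a string under the last path segment
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

var merged = buildDesignDoc("bc", {
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "templates/georss.html": "<feed></feed>"
});
console.log(JSON.stringify(merged, null, 2));
```

The resulting object has the same sections (shows, views, templates) that were written by hand in Section 5.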
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all, consisting of a single map.js file (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
          {{#programmes}}
            &lt;p&gt;{{name}}
            {{#has_countries}}
              {{#countries}}
                &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
              {{/countries}}
            {{/has_countries}}
            &lt;/p&gt;
          {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
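The nested iteration performed by the template can be restated as ordinary JavaScript loops. The sketch below is for illustration only and does not use Mustache: renderContent is a hypothetical function, the sample data is invented, and the output is plain HTML rather than the entity-escaped form used inside the feed.

```javascript
// Hypothetical equivalent of the template's nested sections, written as loops.
function renderContent(institution) {
  var out = "";
  (institution.programmes || []).forEach(function (programme) {
    out += "<p>" + programme.name;
    // Flags are only emitted when the programme has a list of countries
    (programme.countries || []).forEach(function (country) {
      out += '<img src="http://mick.couchone.com/countries/' + country +
             '/flag" alt="' + country + '" title="' + country + '">';
    });
    out += "</p>";
  });
  return out;
}

// Invented sample data: one programme with two countries, one with none.
var contentHtml = renderContent({
  programmes: [
    { name: "Chevening Programme", countries: ["de", "ie"] },
    { name: "Programme without countries" }
  ]
});
console.log(contentHtml);
```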
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
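To make the contrast concrete, the sketch below expresses a simple grouped count, which in SQL might read SELECT region, COUNT(*) FROM institutions GROUP BY region, as a map function and a reduce function. It is a simplified illustration: the table name institutions, the helper queryGrouped and the sample documents are invented, emit is passed in explicitly, and the reduce function takes only the list of values rather than CouchDB's full set of arguments.

```javascript
// Map step: emit one row per document, keyed by region.
function mapRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce step: add up the values emitted for one key.
function reduceCount(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical harness: groups emitted rows by key and reduces each group.
function queryGrouped(docs, mapFn, reduceFn) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).sort().forEach(function (key) {
    result[key] = reduceFn(groups[key]);
  });
  return result;
}

var regionCounts = queryGrouped(
  [{ region: "Scotland" }, { region: "Wales" }, { region: "Scotland" }, {}],
  mapRegion, reduceCount);
console.log(regionCounts);
```

The query has to be written and stored in advance as a pair of functions, whereas the SQL statement could be typed ad hoc.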
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at a more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
          '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
          <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\"> html { height: 100% } body { height: 100%;
          margin: 0px; padding: 0px } #map_canvas { height: 100% } </style>
          <script type=\"text/javascript\"
          src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\"> function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
          var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng,
          title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
          google.maps.event.addListener(marker, \"click\", function() {
          infowindow.open(map, marker); });
          marker.setMap(map); } </script></head>
          <body onload=\"initialize()\"> <div id=\"map_canvas\"
          style=\"width: 100%; height: 100%\"></div></body></html>'; }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code, http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
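The point value built in lists/index.js is a latitude and a longitude separated by a single space. A small validating helper, sketched below, shows the intended format; georssPoint is a hypothetical function that is not part of Sofa or of my changes, and it assumes the form fields arrive as strings.

```javascript
// Hypothetical helper: builds the "latitude longitude" string used for the point element.
function georssPoint(latitude, longitude) {
  // Reject blank form fields, which Number() would otherwise convert to 0
  if (String(latitude).trim() === "" || String(longitude).trim() === "") {
    throw new Error("missing coordinates");
  }
  var lat = Number(latitude);
  var lon = Number(longitude);
  if (!isFinite(lat) || !isFinite(lon) || Math.abs(lat) > 90 || Math.abs(lon) > 180) {
    throw new Error("invalid coordinates");
  }
  // Latitude first, then longitude, separated by one space
  return lat + " " + lon;
}

var point = georssPoint("56.34", "-2.79");
console.log(point);
```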
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->

    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return {
                          country : country
                        };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
            {{#programmes}}
              &lt;p&gt;{{name}}
              {{#has_countries}}
                {{#countries}}
                  &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
                {{/countries}}
              {{/has_countries}}
              &lt;/p&gt;
            {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open_Database_Connectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
- Motivation
- Does 'one size' still 'fit all'
- Organisation
- Background
- The Relational Model
- Relational Database Management Systems
- Normalisation
- SQL
- Transactional Guarantees
- The 'NoSql' Movement
- 'NoSql' databases at large scale
- Brewer's 'CAP' Theorem
- Reducing the Impedance Mismatch
- Benefits of 'NoSql'
- Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
- The Activity Mapping Project
- Raising Awareness of British Council Impact in the UK
- Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
- 'Of the Web'
- Some History
- Document-Oriented
- Erlang
- How Ubuntu uses CouchDB
- Desktop Couch, Python and Quickly
- Using CouchDB
- Motivation
- Hosted CouchDB Service Providers
- Installing CouchDB
- Inserting a Document into a CouchDB Database
- Deleting a Document
- Updating a Document
- Adding Attachments
- Replication
- Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
- Bulk upload of JSON documents
- A CouchDB Design Document
- A more complex 'show' function
- Views and Lists
- Lessons Learned
- Serving GeoRSS using CouchApp
- Introduction to CouchApp
- Sofa - a Blogging Application
- Developing with CouchApp
- Deployment
- Using templates
- Critical Assessment and Conclusion
- Why might a developer choose CouchDB
- CouchDB's challenges
- Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 2 BACKGROUND 14
Figure 2.2: British Council data on Google Earth
The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.
2.14 Using CouchDB to Serve Mapping Data
Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.
Since the data for the maps is read-only, reporting the activity for the previous year, I feel that something like CouchDB would be a good fit for the back-end data store - in effect, using CouchDB as a web caching layer.
The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.^3
^3 In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS^1-agnostic." [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
^1 Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
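The point about absent fields can be illustrated with a short JavaScript sketch. The two documents are abbreviated versions of the ones shown above; the formatAddress helper is hypothetical, written for this illustration only, and is not part of CouchDB.

```javascript
// Two documents from the same 'addressbook' database, with different shapes.
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" }
};
const bcZambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" }
};

// Hypothetical helper: builds printable address lines from whichever
// fields are present. A missing field is simply skipped; no NULL is needed.
function formatAddress(doc) {
  const a = doc.address || {};
  const name = doc.company || [doc.firstName, doc.lastName].filter(Boolean).join(" ");
  const parts = [name, a.streetAddress, a.city, a.state, a.postalCode, a.country];
  return parts.filter(function (p) { return p !== undefined && p !== ""; }).join("\n");
}

console.log(formatAddress(johnSmith));
console.log(formatAddress(bcZambia));
```

The Zambia document has no postalCode or state key at all, and the same code handles both documents without any schema change.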
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available online [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A broader aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax^1 [51].
To create a database called 'addressbook'^2 on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
^1 If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
^2 Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
cURL syntax is explained in [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format, together with a unique key and revision version number - so that in CouchDB, the resulting document would have _id and _rev values as follows:
    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
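The difference between POST and PUT can be summarised in a small JavaScript sketch. The function describeSaveRequest is hypothetical: it only builds a description of the HTTP request and does not send anything. The host name is the one used in the examples above.

```javascript
// Hypothetical helper: describes the HTTP request needed to save a document.
// Without an id we POST to the database and let CouchDB assign a random _id;
// with an id we PUT to the URL at which we want the document to live.
function describeSaveRequest(baseUrl, database, doc, id) {
  const dbUrl = baseUrl + "/" + encodeURIComponent(database);
  return {
    method: id === undefined ? "POST" : "PUT",
    url: id === undefined ? dbUrl : dbUrl + "/" + encodeURIComponent(id),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
}

const zambia = { company: "British Council Zambia" };
const posted = describeSaveRequest("http://mick.couchone.com", "addressbook", zambia);
const named = describeSaveRequest("http://mick.couchone.com", "addressbook", zambia, "british-council-zambia");
console.log(posted);
console.log(named);
```

Any HTTP client library could then be used to send the described request, which is the sense in which CouchDB is usable from practically any language.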
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json:
    curl -X PUT http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834 \
         -d @john-smith-v2.json
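The revision check can be imitated with a toy in-memory store. This is a simulation for illustration only, under the assumption that revisions are simple counters; it is not how CouchDB is implemented, and real revision values have the longer form shown in the listings above.

```javascript
// A toy in-memory store that imitates CouchDB's revision check.
// Revisions here are just "1", "2", ... rather than real CouchDB rev strings.
function createStore() {
  const docs = {};
  return {
    put: function (id, body, rev) {
      const current = docs[id];
      // An existing document may only be replaced by a caller who
      // knows its latest revision.
      if (current && current.rev !== rev) {
        return { error: "conflict", reason: "Document update conflict." };
      }
      const nextRev = String(current ? Number(current.rev) + 1 : 1);
      docs[id] = { rev: nextRev, body: body };
      return { ok: true, id: id, rev: nextRev };
    }
  };
}

const store = createStore();
const first = store.put("john-smith", { age: 25 });             // creates revision "1"
const second = store.put("john-smith", { age: 26 }, first.rev); // accepted: rev is current
const stale = store.put("john-smith", { age: 27 }, first.rev);  // rejected: rev is out of date
console.log(first, second, stale);
```

The third call fails because another update has been accepted since revision "1" was read, which is the situation the revision value is there to detect.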
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ ^3
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af \
         --data-binary @zm.png
The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag
^3 Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
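The quoting difficulty arises only because the JSON body is typed by hand on the command line. As a minimal sketch, a program can build the same body with JSON.stringify; the helper name replicationBody is hypothetical and the URLs are the ones from the example above.

```javascript
// Builds the JSON body for a POST to /_replicate.
// JSON.stringify takes care of the quoting that is awkward on the command line.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```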
4.9 Querying a CouchDB Database using Map-Reduce
So far, we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election, and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate such as SUM with GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
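The two stages can be simulated outside CouchDB in a few lines of JavaScript. This is a sketch under stated assumptions: runMap and the local sum are hypothetical stand-ins for what the CouchDB view server provides, emit is passed in as a parameter rather than being a global, and only the Aberavon figure is real data (the other two constituencies are made up).

```javascript
// Sample documents: Aberavon is the real document shown above; the other two
// are made-up constituencies for the purposes of this sketch.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example North", con: 1500 },
  { _id: "Example South", con: 20000 }
];

// Runs a map function over every document, collecting the emitted rows
// and sorting them by key, as a view index would.
function runMap(mapFn, documents) {
  const rows = [];
  documents.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
}

// Local stand-in for the sum() helper used in the reduce function.
function sum(values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// 'Conservative Votes Total': map emits each constituency's Conservative vote,
// reduce adds the emitted values together.
const rows = runMap(function (doc, emit) { emit(null, doc.con); }, docs);
const total = sum(rows.map(function (row) { return row.value; }));
console.log({ rows: [{ key: null, value: total }] });
```

Because every row is emitted under the same key (null), the reduce stage collapses them into a single total.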
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
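The idea of an indexed range lookup can be sketched in JavaScript. Here a sorted array stands in for the B-Tree, which is a simplification; the function names are hypothetical, and apart from Aberavon the rows are made-up data.

```javascript
// Rows as the 'Conservative Votes' view would hold them: key = votes,
// value = constituency id, kept sorted by key. Only Aberavon is real data.
const index = [
  { key: 900, value: "Example East" },
  { key: 1500, value: "Example North" },
  { key: 1999, value: "Example West" },
  { key: 3062, value: "Aberavon" }
];

// Binary search for the first position whose key is >= target.
function lowerBound(rows, target) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < target) { lo = mid + 1; } else { hi = mid; }
  }
  return lo;
}

// Returns rows with low <= key <= high, reading only the matching
// stretch of the index rather than scanning every row.
function range(rows, low, high) {
  const result = [];
  for (let i = lowerBound(rows, low); i < rows.length && rows[i].key <= high; i++) {
    result.push(rows[i]);
  }
  return result;
}

const matches = range(index, 1000, 2000);
console.log(matches);
```

The lookup jumps straight to the first qualifying key and stops at the first key past the upper bound, which is why a range query on a view never needs to visit the whole collection.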
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    #host="http://127.0.0.1:5984"
    host="http://username:password@mick.couchone.com"
    database="universities"
    folder="universities"
    fileextension=".json"
    FILES="$folder/*"

    # create the database
    curl -X PUT "$host/$database"

    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s|$folder/||g")
        echo "$filename"
        # strip the file extension to get the document name
        docname=$(echo "$filename" | sed -e "s|$fileextension||g")
        echo "$docname"
        url="$host/$database/$docname"
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.
    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) {
                return '<h1>' + doc._id + '</h1>' +
                    '<p>latitude: ' + doc.latitude + '</p>' +
                    '<p>longitude: ' + doc.longitude + '</p>';
            }"
        }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters: doc and req. doc refers to the document being rendered. req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
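Because a 'show' function is ordinary JavaScript, it can be tried out locally before being placed in a design document. The sketch below calls s1 with stand-in arguments; the coordinates are made up, and the req object is a simplified placeholder for whatever CouchDB actually passes in.

```javascript
// The 's1' show function, written as ordinary JavaScript.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Stand-ins for what CouchDB would pass in. The coordinates and the
// shape of the req object are made up for this sketch.
const doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const req = { query: {} };

const html = s1(doc, req);
console.log(html);
```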
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
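The same breakdown can be expressed as a short JavaScript sketch using the standard URL class. The function parseShowUrl is hypothetical and assumes the URL has exactly the shape described in the list above.

```javascript
// Splits a 'show' URL into the parts listed above, using the standard URL class.
function parseShowUrl(showUrl) {
  const url = new URL(showUrl);
  const parts = url.pathname.split("/").filter(Boolean);
  // Expected shape: <database>/_design/<ddoc>/_show/<function>/<document id>
  return {
    host: url.origin,
    database: parts[0],
    designDocument: parts[1] + "/" + parts[2],
    showFunction: parts[4],
    documentId: decodeURIComponent(parts[5])
  };
}

const parsed = parseShowUrl("http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University");
console.log(parsed);
```

Note that %20 is the URL encoding of a space, so the document id being rendered is 'Aston University'.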
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
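The escaping problem can be seen in a short sketch. Here the function is written as ordinary JavaScript and JSON.stringify produces the escaped string; this is an illustration of what has to happen to each double quote, not the workflow used in this section. The class attribute coords is a hypothetical addition to give the HTML a double-quoted value.

```javascript
// An ordinary JavaScript function containing double quotes in its HTML.
function s1(doc, req) {
  return '<p class="coords">' + doc.latitude + ', ' + doc.longitude + '</p>';
}

// A design document stores each function as a JSON string.
var designDoc = {
  _id: '_design/d1',
  shows: { s1: s1.toString() }
};

// JSON.stringify escapes every double quote (and newline) in the source text.
var json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

In the printed output, class="coords" appears as class=\"coords\", which is the form that would otherwise have to be typed by hand.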
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
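What the view produces can be approximated outside CouchDB with a small simulation. This is a sketch under stated assumptions: the emit function and the row-collecting loop are stand-ins written for this example, the two documents are reduced to their ids, and plain string comparison is used in place of CouchDB's own key collation.

```javascript
// Stand-in for CouchDB's emit(): collects the rows produced by the map function.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function of v1, as in the design document.
var v1 = function (doc) { emit(doc._id, doc._id); };

// Two stand-in documents (ids only).
var docs = [{ _id: 'University of Aberdeen' }, { _id: 'Aston University' }];

docs.forEach(function (doc) {
  var before = rows.length;
  v1(doc);
  // Record which document produced each row, as CouchDB does in its "id" field.
  for (var i = before; i < rows.length; i++) { rows[i].id = doc._id; }
});

// View rows are returned ordered by key (simplified here to string comparison).
rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });

console.log(JSON.stringify({ total_rows: rows.length, rows: rows }, null, 2));
```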
A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
          var row;
          start({ \"headers\": { \"Content-Type\": \"text/html\" } });
          while (row = getRow()) {
            send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
          }
        }"
      },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
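The interplay of start, getRow and send can be sketched with stand-in implementations, so that the loop of a ‘list’ function can be run outside CouchDB. The wrapper renderList and the three stand-ins are written for this example only, and the link target (the s1 ‘show’ function) is an assumption that differs from the listing.

```javascript
// Hypothetical harness: runs the body of a 'list' function over an array of rows.
function renderList(rows) {
  var chunks = [];
  var headers = null;
  var next = 0;

  // Stand-ins for the functions CouchDB provides to a list function.
  function start(response) { headers = response.headers; }
  function send(chunk) { chunks.push(chunk); }
  function getRow() { return next < rows.length ? rows[next++] : null; }

  // Body modelled on l1: start once, then send one hyperlink per row.
  var row;
  start({ headers: { 'Content-Type': 'text/html' } });
  while ((row = getRow())) {
    send('<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
  return { headers: headers, body: chunks.join('') };
}

var result = renderList([{ key: 'Aston University', value: 'Aston University' }]);
console.log(result.headers);
console.log(result.body);
```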
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer’s perspective is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
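The correspondence between the folder structure and the design document can be sketched as follows. This is an illustration only: CouchApp itself is written in Python, and the function buildDesignDoc, the in-memory files object and the file contents are hypothetical stand-ins for files on disk.

```javascript
// Hypothetical sketch: fold a set of file paths into a nested design document.
function buildDesignDoc(name, files) {
  var ddoc = { _id: '_design/' + name };
  Object.keys(files).forEach(function (path) {
    // 'views/v1/map.js' -> ['views', 'v1', 'map']
    var parts = path.replace(/\.js$/, '').split('/');
    var node = ddoc;
    for (var i = 0; i < parts.length - 1; i++) {
      node[parts[i]] = node[parts[i]] || {};
      node = node[parts[i]];
    }
    // The file's text becomes the string value at the innermost key.
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

var ddoc = buildDesignDoc('bc', {
  'views/v1/map.js': 'function(doc) { emit(doc._id, doc._id); }',
  'shows/s1.js': 'function(doc, req) { return doc._id; }',
  'lists/l1.js': 'function(head, req) { }'
});
console.log(JSON.stringify(ddoc, null, 2));
```

Each folder becomes a section of the design document, and each file becomes one function stored as a string.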
Figure 6.3: Files generated by CouchApp
The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
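The looping behaviour can be sketched with a deliberately simplified renderer. This is not mustache.js and does not reproduce its API or its full rules; the function render is a hypothetical stand-in that handles only {{name}} variables and {{#section}}...{{/section}} blocks (arrays are iterated, other truthy values render the block once). The programme and country values are taken from the example document in Appendix D.

```javascript
// Hypothetical mini-renderer illustrating section iteration (not mustache.js).
function render(template, view) {
  // Expand sections first: arrays repeat the block, truthy values show it once.
  var out = template.replace(/\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    function (match, name, inner) {
      var value = view[name];
      if (Array.isArray(value)) {
        return value.map(function (item) { return render(inner, item); }).join('');
      }
      return value ? render(inner, view) : '';
    });
  // Then substitute plain variables from the current view.
  return out.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] == null ? '' : String(view[name]);
  });
}

var template = '{{#programmes}}<p>{{name}}' +
  '{{#has_countries}}{{#countries}}[{{country}}]{{/countries}}{{/has_countries}}' +
  '</p>{{/programmes}}';

var view = {
  programmes: [
    { name: 'Chevening Programme', has_countries: true,
      countries: [{ country: 'id' }, { country: 'mz' }, { country: 'tj' }] },
    { name: 'CSFP', has_countries: false, countries: [] }
  ]
};

console.log(render(template, view));
```

The inner block is rendered once per country, and is skipped entirely for a programme whose has_countries flag is false.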
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
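As a sketch of what such replication looks like at the HTTP level, the function below assembles a request for CouchDB's _replicate resource, which takes a JSON body naming a source and a target database. The helper replicationRequest is hypothetical, the choice of a local target database named universities is an assumption, and the sketch only builds the request description; it does not send anything.

```javascript
// Hypothetical helper: describes a replication request without sending it.
function replicationRequest(source, target) {
  return {
    method: 'POST',
    path: '/_replicate',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Pull the hosted database (data and design documents) into a local database.
// The local database name is an assumption for this example.
var request = replicationRequest(
  'http://mick.couchone.com/universities', 'universities');
console.log(request.method, request.path);
console.log(request.body);
```

Because design documents are ordinary documents, the same request carries the application code along with the data.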
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
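The difference in thinking can be illustrated with a count per region, which in SQL would be a single GROUP BY statement. The sketch below simulates the map-reduce equivalent outside CouchDB; the emit stand-in, the grouping loop in queryGrouped, the SQL table name and the three placeholder documents are all assumptions made for this example.

```javascript
// SQL equivalent (hypothetical table name):
//   SELECT region, COUNT(*) FROM institutions GROUP BY region

// Stand-in for CouchDB's emit().
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

// Map: one row per document, keyed by region.
var map = function (doc) {
  if (doc.region) { emit(doc.region, 1); }
};

// Reduce: sum the values; summing is also correct when re-reducing partial sums.
var reduce = function (keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

// Simplified simulation of querying the view with grouping by key.
function queryGrouped(docs) {
  emitted = [];
  docs.forEach(function (doc) { map(doc); });
  var groups = {};
  emitted.forEach(function (row) {
    (groups[row.key] = groups[row.key] || []).push(row.value);
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: reduce(null, groups[key], false) };
  });
}

// Placeholder documents for illustration only.
var grouped = queryGrouped([
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'University of Aberdeen', region: 'Scotland' },
  { _id: 'Aston University', region: 'West Midlands' }
]);
console.log(JSON.stringify(grouped));
```

The query has to be decided in advance and stored in a design document, whereas the SQL statement could be typed ad hoc.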
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
(In the listing below, line breaks within the function strings are for display only.)
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
          <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\"> html { height: 100% }
          body { height: 100%; margin: 0px; padding: 0px }
          #map_canvas { height: 100% } </style>
          <script type=\"text/javascript\"
          src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\"> function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
          var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng,
          title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({
          content: \"' + doc._id + '\" });
          google.maps.event.addListener(marker, \"click\", function() {
          infowindow.open(map, marker); });
          marker.setMap(map); } </script></head>
          <body onload=\"initialize()\">
          <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
          </body></html>' }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
        <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
        <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
        <input type="text" size="50" name="longitude" value=""></p>
      <!-- ... -->
This is further down in templates/edit.html:
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
          },
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return {
                          country : country
                        };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": "15161",
      "postcode": "KY16 9AJ",
      "latitude": "56.3412139443169",
      "longitude": "-2.79301175308608",
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
            &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
            {{/countries}}
            {{/has_countries}}
            &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url="http://username:password@mick.couchone.com/countries/$docname/flag"
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type:image/png"
      #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
    done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
  - georss.js
  - georss.html
- Bash Script to Upload Country Flag Files
for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.(3)
(3) In my case the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 'Of the Web'
The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS(1)-agnostic." [31]
Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available at [32]. In the talk, Katz describes his decision to sell his house, move his family, and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
(1) Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:
    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
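To make the point about absent fields concrete, the two documents can be written as plain JavaScript objects. This is only a sketch: the helper name hasPostalCode is hypothetical and not part of CouchDB, and the field values are those of the two examples in this section.

```javascript
// Two documents that could sit side by side in one 'addressbook' database.
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  age: 25,
  address: {
    streetAddress: "21 2nd Street",
    city: "New York",
    state: "NY",
    postalCode: "10021"
  },
  phoneNumber: [
    { type: "home", number: "212 555-1234" },
    { type: "fax", number: "646 555-4567" }
  ]
};

const britishCouncilZambia = {
  company: "British Council Zambia",
  address: {
    streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
    city: "Lusaka",
    country: "Zambia"
  }
};

// Hypothetical helper: a missing field is simply absent from the document,
// so there is no NULL placeholder to test for.
function hasPostalCode(doc) {
  return Boolean(doc.address) && "postalCode" in doc.address;
}

console.log(hasPostalCode(johnSmith));            // true
console.log(hasPostalCode(britishCouncilZambia)); // false
// The street address contains two \n characters, so it prints on three lines.
console.log(britishCouncilZambia.address.streetAddress.split("\n").length); // 3
```

Client code that reads such documents has to check for the presence of a field before using it, since nothing in the database enforces a common shape.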
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available at [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax(1) [51].
To create a database called 'addressbook'(2) on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
(1) If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
(2) Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
cURL syntax is explained in [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format together with a unique key and revision version number, so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
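The choice between POST and PUT can be summarised in a few lines of JavaScript. The sketch below only describes the request and does not send anything; the function name buildSaveRequest and the local server address are assumptions made for illustration.

```javascript
// Hypothetical helper: describe the HTTP request needed to save a document.
// Without an id we POST to the database and let CouchDB assign one;
// with an id we PUT to the URL that names the document.
function buildSaveRequest(databaseUrl, doc, docId) {
  const named = docId !== undefined;
  return {
    method: named ? "PUT" : "POST",
    url: named ? databaseUrl + "/" + encodeURIComponent(docId) : databaseUrl,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
}

// Assumed local database URL, used purely as an example.
const addressbook = "http://127.0.0.1:5984/addressbook";

const anonymous = buildSaveRequest(addressbook, { firstName: "John" });
const namedDoc = buildSaveRequest(
  addressbook,
  { company: "British Council Zambia" },
  "british-council-zambia"
);

console.log(anonymous.method, anonymous.url);
console.log(namedDoc.method, namedDoc.url);
```

An object of this shape could be handed to any HTTP library; the cURL commands in this section carry exactly the same four pieces of information (verb, URL, header and body).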
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json. Note that the rev value must be the current revision of John Smith's document:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
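The effect of the revision check can be imitated with a small in-memory store. This is a sketch of the idea only, not CouchDB's implementation: the class name TinyStore is hypothetical, and the revisions it generates are of the form "N-demo" rather than "N-" followed by a hash.

```javascript
// In-memory stand-in for CouchDB's revision check (illustration only).
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  // Save a document. An existing document can only be replaced when the
  // caller supplies the revision that is currently stored.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    // parseInt reads the leading generation number of a revision like "1-demo".
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const newRev = generation + "-demo";
    this.docs.set(id, { ...body, _id: id, _rev: newRev });
    return { ok: true, id: id, rev: newRev };
  }
}

const store = new TinyStore();
const first = store.put("john-smith", { age: 25 });             // creates revision 1
const second = store.put("john-smith", { age: 26 }, first.rev); // accepted
const stale = store.put("john-smith", { age: 27 }, first.rev);  // rejected: old revision

console.log(second.rev);  // 2-demo
console.log(stale.error); // conflict
```

The third call fails because it quotes a revision that has already been superseded, which is the situation CouchDB reports as a document update conflict.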
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ (3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
(3) Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
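The quoting difficulties largely disappear when the request body is built by a program rather than typed into a shell. The following sketch builds the same replication request in JavaScript without sending it; the function name buildReplicationRequest is hypothetical.

```javascript
// Hypothetical helper: describe the POST to _replicate without sending it.
// JSON.stringify takes care of all quoting inside the request body.
function buildReplicationRequest(localServer, source, target) {
  return {
    method: "POST",
    url: localServer + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

const replication = buildReplicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);

console.log(replication.url);
console.log(replication.body);
```

The body printed here is the same JSON object that the cURL commands pass with -d, with a source and a target database URL.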
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered on to CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation of the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }
The sum of Conservative votes (8,782,198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
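The behaviour of the two views can be imitated outside CouchDB with a few lines of JavaScript. This is a simplified sketch and not CouchDB's view engine: in CouchDB, emit and sum are provided by the view server, whereas here emit is passed in as a parameter and sum is defined locally. The function name runView is hypothetical, and two of the three sample documents are invented.

```javascript
// Sample documents: Aberavon is taken from the listing above, the other two
// are invented for illustration.
const sampleDocs = [
  { _id: "Aberavon", con: 3062, lab: 18077 },
  { _id: "Example East", con: 1500, lab: 9000 },
  { _id: "Example West", con: 12000, lab: 4000 }
];

// Minimal stand-in for a view: run map over every document, collect the
// emitted rows, sort them by key, then optionally reduce them to one row.
function runView(documents, mapFn, reduceFn) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  rows.sort((a, b) => (a.key > b.key) - (a.key < b.key));
  if (!reduceFn) return rows;
  const keys = rows.map((row) => [row.key, row.id]);
  const values = rows.map((row) => row.value);
  return [{ key: null, value: reduceFn(keys, values) }];
}

// Local equivalent of the sum() helper used in the reduce function.
const sum = (values) => values.reduce((total, v) => total + v, 0);

// 'Conservative Votes': map only.
const conservativeVotes = runView(sampleDocs, (doc, emit) => emit(doc.con, doc._id));

// 'Conservative Votes Total': map followed by reduce.
const conservativeTotal = runView(
  sampleDocs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => sum(values)
);

console.log(conservativeVotes.map((row) => row.key)); // [ 1500, 3062, 12000 ]
console.log(conservativeTotal[0].value);              // 16562
```

The map-only view produces one row per document, ordered by key, while the reduced view collapses all rows into a single total, which is the shape of the response shown above.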
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]
Figure 4.5: WHERE clause in CouchDB
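A range request of this kind amounts to selecting a slice of the sorted view rows. CouchDB's documented query parameters for the two ends of the range are the lowercase startkey and endkey, and the sketch below uses those names. The function name rangeQuery is hypothetical and the rows are invented sample data, apart from the Aberavon figure quoted earlier.

```javascript
// Rows as a view would return them, already sorted by key.
const viewRows = [
  { key: 850, value: "Example North" },
  { key: 1000, value: "Example East" },
  { key: 1500, value: "Example South" },
  { key: 2000, value: "Example West" },
  { key: 3062, value: "Aberavon" }
];

// Keep the rows whose key lies between startkey and endkey. Both ends are
// inclusive here, mirroring CouchDB's default for a range query.
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const between = rangeQuery(viewRows, 1000, 2000);
console.log(between.map((row) => row.value));
// [ 'Example East', 'Example South', 'Example West' ]
```

Because the rows are stored in key order, CouchDB can answer such a request by walking a contiguous part of the index rather than examining every document.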
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s/$folder\///g")
        echo "$filename"
        # strip the file extension to get the document name
        docname=$(echo "$filename" | sed -e "s/$fileextension//g")
        echo "$docname"
        url=$host/$database/$docname
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
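The steps the script takes to turn a file path into a document URL can be sketched in JavaScript as follows. The function name documentUrl is hypothetical, and the example assumes a file whose name contains a literal space, which is percent-encoded when the URL is built.

```javascript
// Hypothetical helper mirroring the script: strip the folder prefix, strip
// the file extension, then append the document name to the database URL.
// Assumes filepath has the form "<folder>/<name><extension>".
function documentUrl(host, database, folder, filepath, extension) {
  const filename = filepath.slice(folder.length + 1); // drop "folder/"
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)
    : filename;
  return host + "/" + database + "/" + encodeURIComponent(docname);
}

console.log(documentUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities",
  "universities/Aberystwyth University.json",
  ".json"
));
// http://127.0.0.1:5984/universities/Aberystwyth%20University
```

The document id is therefore simply the file name without its extension, which is what makes the resulting URLs meaningful.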
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) {
                return '<h1>' + doc._id + '</h1>' +
                    '<p>latitude: ' + doc.latitude + '</p>' +
                    '<p>longitude: ' + doc.longitude + '</p>'
            }"
        }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
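Since a 'show' function is ordinary JavaScript, it can be tried out away from CouchDB by calling it with a sample document. In the sketch below the coordinates are placeholder values, not project data, and the sample request object is an assumption about the minimum a function like s1 needs.

```javascript
// The s1 'show' function, written as a normal JavaScript function.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Sample document; the latitude and longitude are placeholders.
const sampleDoc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };

// Sample request object. In CouchDB the query-string parameters are
// available on req.query; s1 does not use them.
const sampleRequest = { query: {} };

console.log(s1(sampleDoc, sampleRequest));
// <h1>Aston University</h1><p>latitude: 52.49</p><p>longitude: -1.89</p>
```

Testing the function in this way is considerably quicker than uploading the design document and reloading the page after every change.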
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
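The escaping problem can be seen by letting JavaScript produce the JSON for a design document. This is a sketch under stated assumptions: the design document id _design/example and the function linkShow are invented for the illustration.

```javascript
// A 'show' function whose HTML output contains double quotes.
function linkShow(doc, req) {
  return '<a href="/universities/' + doc._id + '">' + doc._id + '</a>';
}

// A design document stores the function as a JSON string, so every double
// quote in the source has to be escaped. JSON.stringify does this for us.
const designDoc = {
  _id: "_design/example",
  shows: { link: linkShow.toString() }
};
const designJson = JSON.stringify(designDoc);

// The serialised form contains backslash-escaped quotes...
console.log(designJson.includes('\\"')); // true

// ...and parsing it gives back the original source text unchanged.
console.log(JSON.parse(designJson).shows.link === linkShow.toString()); // true
```

Generating the design document from real source files in this way avoids escaping by hand, and is one plausible reason to prefer a tool over manual authoring.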
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
    {
        "_id": "_design/d1",
        "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
        "shows": { ... },
        "lists": { ... },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id) }"
            }
        }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function.
• send is executed once for each row in the dataset.
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
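The interplay between start, getRow and send can be sketched in plain JavaScript. The three helper functions, the pending and response variables and the shortened link target below are assumptions made for this illustration only; inside CouchDB these helpers are supplied by the server.

```javascript
// Stand-ins for the helpers that CouchDB provides to list functions.
let pending = [];                              // view rows still to be read
const response = { headers: null, body: "" };  // what would be sent to the browser

function start(options) { response.headers = options.headers; }
function getRow() { return pending.length > 0 ? pending.shift() : null; }
function send(chunk) { response.body += chunk; }

// A list function in the style of l1. The link target is shortened to a
// hypothetical relative path so that the example stays readable.
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });   // called once
  while ((row = getRow())) {                             // one pass per view row
    send('<p><a href="' + encodeURIComponent(row.value) + '">' +
         row.value + '</a></p>');
  }
};

// Hypothetical rows, as they might be returned by view v1.
pending = [
  { key: "University of Aberdeen", value: "University of Aberdeen" },
  { key: "University of St Andrews", value: "University of St Andrews" }
];

l1({ total_rows: pending.length }, {});
console.log(response.headers);
console.log(response.body);
```

The loop ends when getRow returns no further row, at which point everything passed to send has been accumulated into the response body.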
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project: CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa, amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js is a file generated by CouchApp
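As an illustration of what a filled-in stub might contain, here is a sketch of a Show function in the style of s1 from Section 5. In the CouchApp file the function would stand alone as an anonymous function; it is assigned to a constant here only so that it can be run directly. The sample document and its rounded coordinates are hypothetical.

```javascript
// A 'Show' function in the style of s1: one document in, one HTML string out.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Hypothetical document, with rounded coordinates for readability.
const sampleDoc = {
  _id: "University of St Andrews",
  latitude: 56.34,
  longitude: -2.79
};

// CouchDB would pass the stored document and the HTTP request details;
// an empty object stands in for the request here.
const html = s1(sampleDoc, {});
console.log(html);
```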
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
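The idea of merging separate files into one design document can be sketched as follows. This is not CouchApp's own code (which is written in Python) but a small JavaScript illustration of the principle; the buildDesignDoc function, the in-memory files object and the file contents are all assumptions made for the example.

```javascript
// Hypothetical in-memory stand-in for the files in the bc folder:
// relative path -> file contents.
const files = {
  "_id": "_design/bc",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }"
};

// Turn each path into a nested property of the design document, so that
// views/v1/map.js ends up at ddoc.views.v1.map.
function buildDesignDoc(files) {
  const ddoc = {};
  Object.keys(files).forEach(function (path) {
    // Drop the file extension, then split the path into folder names.
    const parts = path.replace(/\.[^./]+$/, "").split("/");
    let node = ddoc;
    parts.slice(0, -1).forEach(function (name) {
      node[name] = node[name] || {};
      node = node[name];
    });
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

const ddoc = buildDesignDoc(files);
console.log(JSON.stringify(ddoc, null, 2));
```

The resulting object has the same shape as the hand-written design document d1 from Section 5, which is why each folder can be edited separately and still produce a single document on upload.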
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
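The nested iteration described above can also be written out as ordinary JavaScript loops. The sketch below does not use Mustache; it is a hand-written equivalent intended only to show the control flow. The flagsHtml function, the relative image path and the second sample programme (which has no countries) are assumptions made for the illustration.

```javascript
// Plain-JavaScript equivalent of the template's nested sections:
// loop over programmes, and within each programme loop over its countries.
function flagsHtml(institution) {
  let html = "";
  (institution.programmes || []).forEach(function (programme) {
    html += "<p>" + programme.name;
    // A programme without a countries list simply contributes no flags.
    (programme.countries || []).forEach(function (country) {
      html += '<img src="/countries/' + country + '/flag" alt="' + country +
              '" title="' + country + '" />';
    });
    html += "</p>";
  });
  return html;
}

// Sample data: the first programme follows the document shown in Appendix D,
// the second is hypothetical and has no countries.
const sample = {
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Example Programme" }
  ]
};

console.log(flagsHtml(sample));
```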
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
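The escaping burden can be seen by letting a program do it. The sketch below is an illustration only: it assumes a small Show function containing a double-quoted HTML attribute, and uses the standard JSON.stringify function to produce the text that would otherwise have to be typed by hand. The class attribute and the variable names are hypothetical.

```javascript
// A small Show function whose HTML contains double quotes.
const s1 = function (doc, req) {
  return '<p class="name">' + doc._id + '</p>';
};

// In a design document the function is stored as a string, so every
// double quote inside it has to appear as \" in the JSON text.
const designDoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
const json = JSON.stringify(designDoc);

console.log(json);

// Parsing the JSON text gives back the original function source.
const roundTripped = JSON.parse(json).shows.s1;
console.log(roundTripped === s1.toString());
```

Generating the JSON text in this way is, in essence, what a packaging tool does on the developer's behalf.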
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
      <html><head><meta name=\"viewport\"
      content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% }
      body { height: 100%; margin: 0px; padding: 0px }
      #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\"
      src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
      var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
      var myOptions = { zoom: 14, center: latlng,
      mapTypeId: google.maps.MapTypeId.ROADMAP };
      var map = new google.maps.Map(
      document.getElementById(\"map_canvas\"), myOptions);
      var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
      var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
      google.maps.event.addListener(marker, \"click\", function() {
      infowindow.open(map, marker); });
      marker.setMap(map); } </script></head>
      <body onload=\"initialize()\">
      <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div>
      </body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  //
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->

// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View: in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
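The two sed steps in the script derive a document name from each file path and use it to build the attachment URL. For comparison, the same derivation can be sketched in JavaScript. The flagUrl function and the localhost address are assumptions made for this illustration; the sketch only builds the URL and does not perform the upload.

```javascript
const path = require("path");

// Derive the CouchDB attachment URL for a flag file:
// countries/png/ie.png -> <baseUrl>/countries/ie/flag
function flagUrl(baseUrl, filepath) {
  // basename() strips the folders and, given ".png", the extension as well.
  const docname = path.posix.basename(filepath, ".png");
  return baseUrl + "/countries/" + docname + "/flag";
}

// Hypothetical local CouchDB address.
console.log(flagUrl("http://localhost:5984", "countries/png/ie.png"));
```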
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1, the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
  - georss.js
  - georss.html
- Bash Script to Upload Country Flag Files
Section 3
Introduction to CouchDB
The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].
3.1 ‘Of the Web’
The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with object-relational mapping layers and database connectivity libraries, which are required to communicate with standard DBMSs [52].
The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }

¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:

    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
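As a minimal sketch (not part of the original project code, and assuming a Node.js-style JavaScript runtime), the difference between a NULL column and an absent field can be shown by parsing abbreviated versions of the two documents above. The helper name hasPostalCode is hypothetical and is introduced only for this illustration.

```javascript
// Abbreviated versions of the two 'addressbook' documents shown above.
const johnSmith = JSON.parse(`{
  "firstName": "John",
  "address": { "city": "New York", "postalCode": "10021" }
}`);

// String.raw keeps the backslash in "\n", so JSON.parse sees a valid JSON escape.
const britishCouncilZambia = JSON.parse(String.raw`{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}`);

// Hypothetical helper: the field is either present in the document or it is not.
function hasPostalCode(doc) {
  return Object.prototype.hasOwnProperty.call(doc.address, "postalCode");
}

console.log(hasPostalCode(johnSmith));            // true
console.log(hasPostalCode(britishCouncilZambia)); // false
console.log(britishCouncilZambia.address.streetAddress.split("\n").length); // 3
```

Client code that reads such documents therefore has to test for the presence of a field rather than compare it with a NULL marker.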
In Section 4, I will demonstrate how these documents are uploaded to, and retrieved from, a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2 GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine, using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:

    curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json

cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format, together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:

    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json

This is the resulting document in CouchDB:

    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }

As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
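The choice between POST and PUT described above can be summarised in a short JavaScript sketch. This is an illustration only: buildSaveRequest is a hypothetical helper, the local server URL is an assumption, and nothing is sent over the network.

```javascript
// Hypothetical helper: describes the HTTP request needed to store a document.
// With an id we PUT to /database/id; without one we POST to /database and
// CouchDB assigns a random id itself.
function buildSaveRequest(serverUrl, database, doc, id) {
  const base = serverUrl.replace(/\/+$/, "") + "/" + encodeURIComponent(database);
  return {
    method: id === undefined ? "POST" : "PUT",
    url: id === undefined ? base : base + "/" + encodeURIComponent(id),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

const zambia = {
  company: "British Council Zambia",
  address: { city: "Lusaka", country: "Zambia" },
};

const named = buildSaveRequest("http://127.0.0.1:5984", "addressbook", zambia, "british-council-zambia");
console.log(named.method, named.url);
// PUT http://127.0.0.1:5984/addressbook/british-council-zambia

const anonymous = buildSaveRequest("http://127.0.0.1:5984", "addressbook", zambia);
console.log(anonymous.method, anonymous.url);
// POST http://127.0.0.1:5984/addressbook
```

The returned object only describes the request; it could be handed to whichever HTTP library the client language provides.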
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:

    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
As we are updating a ‘named’ document, we use the PUT verb instead of POST.
The following command will update John Smith’s record with the contents of the file john-smith-v2.json, quoting the revision value of the John Smith document shown in Section 4.4:

    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
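The revision check can be imitated with a toy in-memory model. The sketch below is an assumption-laden illustration, not CouchDB’s implementation: ToyDatabase is a hypothetical class, and the hash part of a real revision value is replaced by the fixed suffix ‘toy’.

```javascript
// Toy model of the revision check: an update must quote the current revision.
class ToyDatabase {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    // Reject updates that quote a stale revision, and creates that quote any revision.
    if ((current && current.rev !== rev) || (!current && rev !== undefined)) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    // Real revisions are "<counter>-<hash>"; only the counter is modelled here.
    const generation = current ? parseInt(current.rev, 10) + 1 : 1;
    const newRev = generation + "-toy";
    this.docs.set(id, { rev: newRev, body: body });
    return { ok: true, id: id, rev: newRev };
  }
}

const db = new ToyDatabase();
const created = db.put("john-smith", { age: 25 });
const updated = db.put("john-smith", { age: 26 }, created.rev);
const stale = db.put("john-smith", { age: 27 }, created.rev); // quotes the old revision

console.log(created.rev, updated.rev, stale.error); // 1-toy 2-toy conflict
```

A client that receives the conflict would typically re-fetch the document, obtain the latest revision value, and retry the update.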
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png

The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag

³ Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:

    curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:

    curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
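Because the replication request body is plain JSON, it can also be generated rather than typed by hand, which avoids quoting mistakes. The following sketch assumes a JavaScript runtime; replicationRequestBody and quoteForWindowsCmd are hypothetical helper names, and the URLs are the same placeholders used above.

```javascript
// Build the body for POST /_replicate; JSON.stringify takes care of quoting.
function replicationRequestBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// Hypothetical helper: wrap a JSON string in double quotes and backslash-escape
// the inner double quotes, as needed for the Windows form of the command.
function quoteForWindowsCmd(json) {
  return '"' + json.replace(/"/g, '\\"') + '"';
}

const body = replicationRequestBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);

console.log(body);
console.log(quoteForWindowsCmd(body));
```

The first line printed is the payload for the single-quoted form of the command; the second is the same payload in the double-quoted, backslash-escaped form.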
4.9 Querying a CouchDB Database using Map-Reduce
So far, we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation of the intermediate values is required.
In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:

    [http://localhost:5984/election-2005/Aberavon]

    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
    [http://localhost:5984/election-2005/_design/con]

    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n return sum(values);\n}"
            }
        }
    }

The sum of Conservative votes (8782198) is returned as the result of the following request:

    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

    {"rows":[{"key":null,"value":8782198}]}
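To make the two stages concrete, the sketch below imitates the evaluation of the two views in plain JavaScript. It is a simplified toy, not CouchDB’s view engine: runView is a hypothetical harness, emit and sum are stand-ins for the functions CouchDB provides, and apart from Aberavon’s Conservative vote the documents are invented sample data.

```javascript
// Sample documents; only Aberavon's 'con' value is taken from the listing above.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Sample A", con: 1500 },
  { _id: "Sample B", con: 12000 },
];

// Stand-ins for the helpers CouchDB makes available to view functions.
let rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }
function sum(values) { return values.reduce((a, b) => a + b, 0); }

// The view functions, with the same shape as in the design document.
const conservativeVotesMap = function (doc) { emit(doc.con, doc._id); };
const totalMap = function (doc) { emit(null, doc.con); };
const totalReduce = function (keys, values) { return sum(values); };

// Hypothetical harness: run the map over every document, then either return
// the rows sorted by key or collapse them with the reduce function.
function runView(documents, map, reduce) {
  rows = [];
  documents.forEach((doc) => map(doc));
  if (!reduce) {
    return rows.slice().sort((a, b) => a.key - b.key);
  }
  const keys = rows.map((row) => row.key);
  const values = rows.map((row) => row.value);
  return [{ key: null, value: reduce(keys, values) }];
}

console.log(runView(docs, conservativeVotesMap).map((row) => row.value));
// [ 'Sample A', 'Aberavon', 'Sample B' ]
console.log(runView(docs, totalMap, totalReduce));
// [ { key: null, value: 16562 } ]
```

The map-only view yields one row per document, ordered by the emitted key, while the reduce step collapses the emitted values into a single total.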
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
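The key-range request and the indexed lookup described above can be imitated over a sorted array of view rows. This is only a sketch: CouchDB uses a B-Tree rather than an array, keyRange and lowerBound are hypothetical helpers, both bounds are treated as inclusive, and all rows except Aberavon are invented sample data.

```javascript
// View rows as they would come out of an index: already sorted by key.
const viewRows = [
  { key: 1500, value: "Sample A" },
  { key: 1999, value: "Sample B" },
  { key: 3062, value: "Aberavon" },
  { key: 12000, value: "Sample C" },
];

// Binary search for the first row whose key is >= the requested start key,
// so the lookup does not have to scan the rows from the beginning.
function lowerBound(sortedRows, key) {
  let lo = 0;
  let hi = sortedRows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedRows[mid].key < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Hypothetical helper: return rows with startkey <= key <= endkey.
function keyRange(sortedRows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(sortedRows, startkey);
       i < sortedRows.length && sortedRows[i].key <= endkey; i++) {
    result.push(sortedRows[i]);
  }
  return result;
}

console.log(keyRange(viewRows, 1000, 2000).map((row) => row.value));
// [ 'Sample A', 'Sample B' ]
```

Only the rows inside the requested range are visited, which is the property that keeps reads cheap as the document collection grows.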
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # Upload every JSON file in a folder to CouchDB, one document per file.
    #host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*

    # create the database
    curl -X PUT "$host/$database"

    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s/$folder\///g")
        echo "$filename"
        # strip the extension to get the document name
        docname=$(echo "$filename" | sed -e "s/$fileextension//g")
        echo "$docname"
        url=$host/$database/$docname
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d "@$filepath"
        curl -X PUT "$url" -d "@$filepath"
    done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:

    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
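The mapping from file path to document URL that the script performs with sed can be sketched in JavaScript as follows. This assumes a Node.js runtime (for the standard path module); documentUrl is a hypothetical helper and the local server URL is an assumption.

```javascript
const path = require("path");

// Hypothetical helper mirroring the script's steps: strip the folder, strip
// the file extension, and append the remaining name to host/database.
function documentUrl(host, database, filepath, extension) {
  const docname = path.basename(filepath, extension);
  return host + "/" + database + "/" + docname;
}

console.log(documentUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
));
// http://127.0.0.1:5984/universities/Aberystwyth%20University
```

As in the bash script, the file name (minus its extension) becomes the document id, so the files need to be named with URL-safe characters.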
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string which is rendered as output.
    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
        }
    }
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
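Since a ‘show’ function is ordinary JavaScript, its behaviour can be checked outside CouchDB before it is placed in a design document. The sketch below assumes a plain JavaScript runtime; the coordinates are invented sample values and the request object is an empty stand-in.

```javascript
// The 's1' show function, written as ordinary JavaScript.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Invented sample document and a minimal stand-in for the request object.
const sampleDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const html = s1(sampleDoc, { query: {} });

console.log(html);
// <h1>Aston University</h1><p>latitude: 52.48</p><p>longitude: -1.89</p>
```

Testing the function in isolation like this makes it easier to spot mistakes in the generated HTML before uploading the design document.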
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
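The parts listed above can be assembled programmatically, which also takes care of encoding the space in the document id. In this sketch, showUrl is a hypothetical helper rather than a CouchDB API.

```javascript
// Hypothetical helper: assemble a '_show' URL from its parts, percent-encoding
// each path segment (so that "Aston University" becomes "Aston%20University").
function showUrl(server, database, designDoc, showName, docId) {
  return [
    server,
    encodeURIComponent(database),
    "_design",
    encodeURIComponent(designDoc),
    "_show",
    encodeURIComponent(showName),
    encodeURIComponent(docId),
  ].join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```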
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
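One way to reduce the escaping burden, sketched here under the assumption that a JavaScript runtime is available on the development machine, is to write the function as normal code and let JSON.stringify produce the escaped string. The helper toDesignDocument and the class="title" attribute are hypothetical, chosen only to show double quotes being escaped automatically; this is not the approach I used in the project.

```javascript
// Show functions written as ordinary JavaScript, double quotes and all.
const shows = {
  s1: function (doc, req) {
    return "<h1 class=\"title\">" + doc._id + "</h1>";
  },
};

// Hypothetical helper: store each function's source text as a string, which is
// the form in which a design document holds its code.
function toDesignDocument(id, showFunctions) {
  const designDoc = { _id: "_design/" + id, shows: {} };
  for (const name of Object.keys(showFunctions)) {
    designDoc.shows[name] = showFunctions[name].toString();
  }
  return designDoc;
}

// JSON.stringify escapes the embedded quotes and newlines for us.
const json = JSON.stringify(toDesignDocument("d1", shows), null, 2);
console.log(json);
```

The resulting text could be saved to a file such as d1.json and uploaded with cURL in the usual way.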
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section, and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:
    {
        "_id": "_design/d1",
        "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
        "shows": { ... },
        "lists": { ... },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:

    {
        "_id": "_design/d1",
        "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
        "shows": { ... },
        "lists": {
            "l1": "function(head, req) { var row; start({\"headers\": {\"Content-Type\": \"text/html\"}}); while (row = getRow()) { send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>'); } }"
        },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
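The interplay between start, getRow and send can be tried out with toy stand-ins for the functions that CouchDB supplies. The sketch below is not CouchDB’s implementation: the view rows are invented sample data, and the relative link to the s1 ‘show’ function is an assumption made for this illustration.

```javascript
// Invented sample rows, shaped like the output of the 'v1' view.
const viewRows = [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "Birkbeck", key: "Birkbeck", value: "Birkbeck" },
];

// Toy stand-ins for the functions CouchDB provides to a 'list' function.
let nextRow = 0;
let responseHeaders = null;
const chunks = [];
function start(response) { responseHeaders = response.headers; }
function send(chunk) { chunks.push(chunk); }
function getRow() { return nextRow < viewRows.length ? viewRows[nextRow++] : null; }

// A list function in the style of 'l1': one hyperlink per view row.
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="../../_show/s1/' + encodeURIComponent(row.value) + '">' +
      row.value + '</a></p>');
  }
}

l1({ total_rows: viewRows.length }, { query: {} });
console.log(responseHeaders["Content-Type"]); // text/html
console.log(chunks.length);                   // 2
console.log(chunks.join("\n"));
```

The loop ends when getRow returns a value that is not a row, which is what allows a list function to stream its output one row at a time.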
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor, and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach which helped with more complex design documents was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
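To give an idea of what such a feed contains, the sketch below renders a single feed entry carrying a GeoRSS-Simple point, in which latitude and longitude are separated by a space. It is an illustrative fragment only and not the code used in Sofa: geoRssEntry and escapeXml are hypothetical helpers, the coordinates are invented sample values, and a complete Atom entry would need further elements as well as a feed element that declares the georss namespace.

```javascript
// Minimal XML escaping for text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: one entry with a GeoRSS-Simple point ("lat lon").
function geoRssEntry(title, latitude, longitude) {
  return "<entry>" +
    "<title>" + escapeXml(title) + "</title>" +
    "<georss:point>" + latitude + " " + longitude + "</georss:point>" +
    "</entry>";
}

console.log(geoRssEntry("British Council Zambia", -15.42, 28.28));
// <entry><title>British Council Zambia</title><georss:point>-15.42 28.28</georss:point></entry>
```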
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa, amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
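The correspondence between the folder layout and the design document can be sketched in a few lines of JavaScript. This is not CouchApp's actual code (CouchApp is written in Python): buildDesignDoc is a hypothetical helper, and the function bodies are placeholders. The lowercase field names views, shows and lists are the ones CouchDB expects in a design document.

```javascript
// Hypothetical sketch: fold a set of file paths into a nested design document.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, source] of Object.entries(files)) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      node = node[part] = node[part] || {};
    }
    // Each function is stored in the document as a string of source code.
    node[parts[parts.length - 1]] = source;
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { return 'placeholder'; }",
});
console.log(JSON.stringify(ddoc, null, 2));
```

Each file becomes one string-valued field, which is why a design document can be edited as a set of ordinary source files.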
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document using the correct HTTP commands to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework rather than working with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
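To see what this map function produces, it can be run against a couple of sample documents outside CouchDB. The following is a simplified sketch: buildView is a hypothetical helper, emit is passed in as a parameter (inside CouchDB it is a built-in), and the plain string comparison used for sorting is only an approximation of CouchDB's own key collation.

```javascript
// The map function from map.js, named so that it can be called directly.
function mapAll(doc, emit) {
  emit(doc._id, doc);
}

// Hypothetical helper that mimics how a View is built: run the map function
// over every document, then sort the emitted rows by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, rows };
}

// Sample documents for illustration only.
const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
];
const view = buildView(docs, mapAll);
console.log(view.rows.map((row) => row.key));
```

Because the whole document is emitted as the value, a List function reading this View has every field available to it.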
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it using mustache.js to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
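As an illustration of how directly document operations map onto HTTP verbs, the sketch below builds (but does not send) the request for each basic operation. The helper couchRequest, the local server address and the revision value are assumptions made for the example; the verbs themselves and the rev parameter on DELETE follow CouchDB's document API.

```javascript
// Hypothetical helper: describe the HTTP request for a basic document operation.
function couchRequest(operation, baseUrl, db, doc) {
  const docUrl = baseUrl + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(doc._id);
  switch (operation) {
    case "read":
      return { method: "GET", url: docUrl };
    case "create":
    case "update": // an update must carry the current _rev in the body
      return {
        method: "PUT",
        url: docUrl,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(doc),
      };
    case "delete":
      return { method: "DELETE", url: docUrl + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown operation: " + operation);
  }
}

// Sample values for illustration only.
const base = "http://127.0.0.1:5984";
const sample = { _id: "University of Aberdeen", _rev: "1-abc", region: "Scotland" };
for (const op of ["read", "update", "delete"]) {
  const request = couchRequest(op, base, "universities", sample);
  console.log(request.method, request.url);
}
```

Any tool that can issue these requests, cURL included, can act as a CouchDB client.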
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
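To make the contrast concrete, the sketch below expresses what SQL would write as SELECT region, COUNT(*) FROM institutions GROUP BY region as a map function and a reduce function. It runs outside CouchDB: queryGrouped is a hypothetical driver that imitates a grouped view query, emit is passed in as a parameter, and the documents are sample data.

```javascript
// map: emit one row per document, keyed by region.
function mapByRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// reduce: add up the emitted values for a key.
function sumReduce(keys, values) {
  return values.reduce((total, value) => total + value, 0);
}

// Hypothetical driver that imitates a grouped view query.
function queryGrouped(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups.keys()]
    .sort()
    .map((key) => ({ key, value: reduceFn(null, groups.get(key)) }));
}

// Sample documents for illustration only.
const institutions = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Sample College", region: "England" },
];
const counts = queryGrouped(institutions, mapByRegion, sumReduce);
console.log(counts);
```

The query has to be decided in advance and stored in the design document, which is the practical meaning of ‘pre-compiled’ here.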
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>' }"
  }
}
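The escaping burden noted above comes from storing JavaScript source as a JSON string. The sketch below, which is not part of the original project, writes a show function equivalent to s1 as ordinary JavaScript and lets JSON.stringify do the escaping when packaging it into a design document. The document values are rounded sample figures.

```javascript
// A show function equivalent to s1, written as ordinary JavaScript.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Sample document for illustration only (coordinates rounded).
const sampleDoc = { _id: "University of St Andrews", latitude: 56.34, longitude: -2.79 };
const page = s1(sampleDoc, {});
console.log(page);

// Packaging the function into a design document: the source becomes a string,
// and JSON.stringify backslash-escapes every double quote automatically.
const designDoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
const designDocJson = JSON.stringify(designDoc);
console.log(designDocJson);
```

This is, in miniature, the service that CouchApp performs when it merges separate files into a design document.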
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  //
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
      alternate : path.absolute(path.show('post', row.id)),
      // point : row.value.loc[1] + " " + row.value.loc[0]
      point : row.value.latitude + " " + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
  // close the loop after all rows are rendered
  return "</feed>";
});
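The point value built in lists/index.js is a latitude and a longitude separated by a single space, which is the order GeoRSS expects. The sketch below isolates that step. georssPoint is a hypothetical helper, and the assumption that loc is stored as [longitude, latitude] is inferred only from the index order in the alternative line above.

```javascript
// Hypothetical helper: build the text of a GeoRSS point ("latitude longitude").
function georssPoint(value) {
  if (Array.isArray(value.loc)) {
    // Assumed storage order: loc = [longitude, latitude].
    return value.loc[1] + " " + value.loc[0];
  }
  return value.latitude + " " + value.longitude;
}

// Sample coordinates for illustration only.
console.log(georssPoint({ latitude: 56.34, longitude: -2.79 }));
console.log(georssPoint({ loc: [-2.79, 56.34] }));
```

Both calls describe the same place, so a feed reader sees the same point whichever storage form the document uses.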
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
This is further down in templates/edit.html:
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
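The shape of the ‘stash’ is easier to inspect when the per-row transformation is run on its own. The sketch below reproduces the core of that mapping as a standalone function, without CouchDB, List.withRows or Mustache. toStashEntry is a hypothetical name, only a subset of the fields is copied, and the sample institution is abbreviated.

```javascript
// Standalone version of the per-institution mapping used to build the stash.
function toStashEntry(institution) {
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: institution.programmes ? true : false,
    programmes: institution.programmes
      ? institution.programmes.map((programme) => ({
          name: programme.name,
          has_countries: programme.countries ? true : false,
          // Each code is wrapped in an object so that the template can refer
          // to it by name inside the countries section.
          countries: programme.countries
            ? programme.countries.map((country) => ({ country: country }))
            : [],
        }))
      : [],
  };
}

// Abbreviated sample document for illustration only.
const entry = toStashEntry({
  _id: "University of St Andrews",
  latitude: 56.34,
  longitude: -2.79,
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] },
  ],
});
console.log(JSON.stringify(entry, null, 2));
```

The has_programmes and has_countries flags exist so that the template can skip a whole block when a list is missing.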
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
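The only logic in the script is the derivation of the attachment URL from the file path. For readers more at home in JavaScript than sed, the sketch below performs the same derivation. flagAttachmentUrl is a hypothetical helper, and the local server address is taken from the commented example in the script.

```javascript
// Hypothetical helper: derive the attachment URL for a flag file.
function flagAttachmentUrl(baseUrl, filepath) {
  const filename = filepath.split("/").pop();     // "countries/png/ie.png" -> "ie.png"
  const docname = filename.replace(/\.png$/, ""); // "ie.png" -> "ie"
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

console.log(flagAttachmentUrl("http://127.0.0.1:5984", "countries/png/ie.png"));
```

The document name is the country code, so a template only needs that code to build the flag's address.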
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 3 INTRODUCTION TO COUCHDB 17
opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]
Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.
3.2 Some History
CouchDB was started by Damien Katz in 2004 as an ‘indexable, schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.
A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)
3.3 Document-Oriented
CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].
The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
¹ Operating System
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:
{
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
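As a rough illustration of what the flexible schema means for client code, the following JavaScript sketch holds the two address documents from this section side by side and checks for the postalCode field by its presence rather than by a NULL value. The helper name describeAddress is an assumption made for this example; it is not part of CouchDB.

```javascript
// Two documents with different shapes, as they might sit in one 'addressbook' database.
const docs = [
  {
    firstName: "John",
    lastName: "Smith",
    address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" }
  },
  {
    company: "British Council Zambia",
    address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" }
  }
];

// Hypothetical helper: summarise an address without assuming a fixed schema.
function describeAddress(doc) {
  const address = doc.address;
  return {
    name: doc.company || (doc.firstName + " " + doc.lastName),
    // A missing field is simply absent; there is no NULL placeholder to test for.
    hasPostalCode: Object.prototype.hasOwnProperty.call(address, "postalCode"),
    // "\n" is a real newline character once the JSON has been parsed.
    streetLines: address.streetAddress.split("\n").length
  };
}

const summary = docs.map(describeAddress);
console.log(summary);
```

A client reading from such a database therefore has to tolerate missing fields, which is the price paid for not having to declare them in advance.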
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
SECTION 4 USING COUCHDB 22
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command line we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format together with a unique key and a revision number, so that in CouchDB the resulting document would have _id and _rev values as follows:
{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, '?').
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
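Client code that is not written with cURL needs to assemble the same URL. The sketch below is one way of doing so in JavaScript, using the standard URL class; the function name revisionUrl is an assumption for this example, and credentials are left out of the host for brevity.

```javascript
// Hypothetical helper: build the URL of a document, optionally pinned to a revision.
function revisionUrl(host, database, docId, rev) {
  const url = new URL(host + "/" + encodeURIComponent(database) + "/" + encodeURIComponent(docId));
  if (rev !== undefined) {
    // The revision travels in the query string, i.e. after the '?'.
    url.searchParams.set("rev", rev);
  }
  return url.toString();
}

const deleteUrl = revisionUrl(
  "http://mick.couchone.com",
  "addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(deleteUrl);
```

The resulting string is what would be passed to an HTTP library together with the DELETE verb.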
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json (the rev value is the one CouchDB returned when the document was first created):
curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
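The effect of the revision check can be imitated with a few lines of JavaScript. The sketch below is an in-memory stand-in written for this explanation, not CouchDB's implementation: the class name SketchStore and the simplified revision strings (a generation number followed by '-sketch' instead of a hash) are assumptions.

```javascript
// In-memory stand-in for a database; NOT how CouchDB itself is implemented.
class SketchStore {
  constructor() {
    this.docs = new Map();
  }

  // Save a document. When the document already exists, `rev` must match
  // the stored revision, mirroring the ?rev=... check described above.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    // Revisions here are "<generation>-sketch"; CouchDB uses a hash as the suffix.
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const newRev = generation + "-sketch";
    this.docs.set(id, Object.assign({}, body, { _id: id, _rev: newRev }));
    return { ok: true, id: id, rev: newRev };
  }
}

const store = new SketchStore();
const created = store.put("john-smith", { age: 25 });              // first write
const updated = store.put("john-smith", { age: 26 }, created.rev); // latest rev: accepted
const stale = store.put("john-smith", { age: 27 }, created.rev);   // old rev: rejected
console.log(created, updated, stale);
```

The third call fails because it still carries the first revision, which is the situation a client finds itself in when someone else has updated the document in the meantime.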
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ ³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary (the command is entered on a single line):
curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
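The body of the replication request is an ordinary JSON object with a source and a target. As a small sketch, assuming the same placeholder URLs as in the commands above, it can be generated rather than typed by hand, which also takes care of the quoting needed on Windows:

```javascript
// The replication request body: where to copy from and where to copy to.
const replicationRequest = {
  source: "http://username:password@mick.couchone.com/addressbook",
  target: "http://127.0.0.1:5984/addressbook"
};

// Serialise to the JSON text that is sent with the POST to _replicate.
const body = JSON.stringify(replicationRequest);

// For the Windows command prompt: wrap in double quotes and
// backslash-escape every double quote inside the JSON.
const windowsArgument = '"' + body.replace(/"/g, '\\"') + '"';

console.log(body);
console.log(windowsArgument);
```

The first line printed is suitable for use inside single quotes on Linux or Mac; the second is the form required after -d on Windows.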
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered on to CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation of the intermediate values is required.
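The two stages just described can be imitated in plain JavaScript. The following is a minimal sketch for illustration only: the runner function runMapReduce and the sample documents are made up, and, unlike in CouchDB, the emit function is passed to the map function as a parameter rather than being provided as a built-in.

```javascript
// Run a map function (and optionally a reduce function) over an array of documents.
function runMapReduce(docs, mapFn, reduceFn) {
  const rows = [];
  const emit = function (key, value) {
    rows.push({ key: key, value: value });
  };
  // 'map' stage: visit every document and collect whatever it emits.
  docs.forEach(function (doc) {
    mapFn(doc, emit);
  });
  if (!reduceFn) {
    return rows;
  }
  // 'reduce' stage: combine the intermediate values into one result.
  return reduceFn(rows.map(function (r) { return r.key; }),
                  rows.map(function (r) { return r.value; }));
}

// Local helper for adding up an array of numbers.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Made-up sample documents.
const sampleDocs = [
  { _id: "a", votes: 10 },
  { _id: "b", votes: 25 },
  { _id: "c", votes: 7 }
];

const mapped = runMapReduce(sampleDocs, function (doc, emit) { emit(doc._id, doc.votes); });
const total = runMapReduce(
  sampleDocs,
  function (doc, emit) { emit(null, doc.votes); },
  function (keys, values) { return sum(values); }
);
console.log(mapped, total);
```

With only a map function the result is one row per document; adding the reduce function collapses those rows into a single total.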
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
    "_id": "Aberavon",
    "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
    "mp": "Hywel Francis",
    "electorate": 51079,
    "con": 3062,
    "lab": 18077,
    "lib": 4138,
    "pc": 3546,
    "oth": 1278
}
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
    "_id": "_design/con",
    "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
    "language": "javascript",
    "views": {
        "Conservative Votes": {
            "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
            "map": "function(doc) {\n  emit(null, doc.con);\n}",
            "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
    }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]
Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
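The idea of an incrementally maintained, sorted index that answers range requests can be sketched in JavaScript. This is a simplified stand-in, assuming a sorted array in place of CouchDB's B-Tree; the class name SketchView is an assumption, and apart from Aberavon the constituency documents are invented for the example.

```javascript
// Find the first position whose key is not smaller than `key` (binary search).
function lowerBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// A sorted array standing in for the B-Tree index behind a view.
class SketchView {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rows = [];
  }

  // Incremental update: only the new document is passed through the map function.
  add(doc) {
    this.mapFn(doc, (key, value) => {
      this.rows.splice(lowerBound(this.rows, key), 0, { id: doc._id, key: key, value: value });
    });
  }

  // Range lookup: jump to the start key, then read rows until the end key is passed.
  range(startKey, endKey) {
    const result = [];
    for (let i = lowerBound(this.rows, startKey);
         i < this.rows.length && this.rows[i].key <= endKey; i++) {
      result.push(this.rows[i]);
    }
    return result;
  }
}

// Same shape as the 'Conservative Votes' map function: key is the vote count.
const view = new SketchView(function (doc, emit) { emit(doc.con, doc._id); });
view.add({ _id: "Aberavon", con: 3062 });
view.add({ _id: "Hypothetical A", con: 1500 }); // invented figures
view.add({ _id: "Hypothetical B", con: 900 });
view.add({ _id: "Hypothetical C", con: 2000 });

const hits = view.range(1000, 2000).map(function (row) { return row.value; });
console.log(hits);
```

Because the rows are kept in key order, the range request never has to look at documents outside the requested interval, which is the property the paragraph above describes.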
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
SECTION 5 SERVING HTML FROM COUCHDB 32
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash

# local instance (uncomment to upload locally instead)
#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT $host/$database

for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e "s/$folder\///g")
  echo $filename
  docname=$(echo $filename | sed -e "s/$fileextension//g")
  echo $docname
  url=$host/$database/$docname
  echo $url
  # put the document into CouchDB
  echo curl -X PUT $url -d @$filepath
  curl -X PUT $url -d @$filepath
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
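The core of the script is the step that turns a file path into the URL the document is 'put' to. As a hedged sketch of just that step, the JavaScript below performs the same two removals as the sed calls (folder, then file extension); the function name uploadUrl is an assumption, and the local host address is used so that no credentials appear.

```javascript
// Hypothetical helper: derive the target URL for a JSON file, as the bash script does.
function uploadUrl(host, database, filepath, extension) {
  // Strip the folder: keep only what follows the last '/'.
  const filename = filepath.split("/").pop();
  // Strip the file extension, if present, to obtain the document name.
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)
    : filename;
  return host + "/" + database + "/" + docname;
}

const target = uploadUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);
console.log(target);
```

The file name, minus its extension, becomes the document id, which is why the documents end up at meaningful URLs.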
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.
{
    "_id": "_design/d1",
    "shows": {
        "s1": "function(doc, req) {
            return '<h1>' + doc._id + '</h1>' +
                '<p>latitude: ' + doc.latitude + '</p>' +
                '<p>longitude: ' + doc.longitude + '</p>';
        }"
    }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
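Since a 'show' function is ordinary JavaScript, it can be tried out locally before being placed in a design document. The sketch below calls s1 directly with a hand-made document and request object; the coordinates and the shape of the request object are illustrative assumptions, not values taken from the universities database.

```javascript
// The 'show' function from the design document, written as plain JavaScript.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Hand-made stand-ins for what CouchDB would pass in (illustrative values only).
const sampleDoc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };
const sampleReq = { query: {} };

const html = s1(sampleDoc, sampleReq);
console.log(html);
```

Testing the function in this way separates mistakes in the JavaScript from mistakes in the design document or the upload.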
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
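The escaping problem can be seen, and to some extent avoided, by letting a program produce the design document. In the following sketch the function mapSketch is a made-up, much shortened stand-in for a 'show' function that returns HTML containing double quotes; it is not the map1 function from Appendix A.

```javascript
// Hypothetical 'show' function whose HTML output contains double quotes.
function mapSketch(doc, req) {
  return '<div id="map" data-lat="' + doc.latitude + '" data-lng="' + doc.longitude + '"></div>';
}

// A design document stores each function as a string, so we take the source text.
const designDoc = {
  _id: "_design/d1",
  shows: { mapSketch: mapSketch.toString() }
};

// JSON.stringify inserts the backslash before every double quote for us.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

In the printed JSON every double quote inside the function appears as \", which is exactly the escaping that otherwise has to be typed by hand.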
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
    "_id": "_design/d1",
    "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
    "shows": { ... },
    "lists": { ... },
    "views": {
        "v1": {
            "map": "function(doc) { emit(doc._id, doc._id); }"
        }
    }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
    "_id": "_design/d1",
    "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
    "shows": { ... },
    "lists": {
        "l1": "function(head, req) {
            var row;
            start({ \"headers\": { \"Content-Type\": \"text/html\" } });
            while (row = getRow()) {
                send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
            }
        }"
    },
    "views": {
        "v1": {
            "map": "function(doc) { emit(doc._id, doc._id); }"
        }
    }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
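The interplay between start, getRow and send can be tried out locally with simple stand-ins. In the sketch below the harness renderList and its stubbed functions are assumptions written for this example; in CouchDB itself start, send and getRow are provided as built-ins rather than passed in as a third argument, and the link target used here is illustrative.

```javascript
// Hypothetical harness: feeds view rows to a 'list' function and collects its output.
function renderList(listFn, rows) {
  const chunks = [];
  let headers = null;
  let index = 0;
  const helpers = {
    start: function (response) { headers = response.headers; },
    send: function (chunk) { chunks.push(chunk); },
    // Returns the next row, or null once the result set is exhausted.
    getRow: function () { return index < rows.length ? rows[index++] : null; }
  };
  listFn({ total_rows: rows.length, offset: 0 }, {}, helpers);
  return { headers: headers, body: chunks.join("") };
}

// A 'list' function in the style of l1, receiving the stubs as a third argument.
const l1 = function (head, req, helpers) {
  let row;
  helpers.start({ headers: { "Content-Type": "text/html" } });
  while ((row = helpers.getRow())) {
    helpers.send('<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
};

// Rows in the shape produced by the v1 view: key and value are both the document id.
const viewRows = [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "Aberystwyth University", key: "Aberystwyth University", value: "Aberystwyth University" }
];

const page = renderList(l1, viewRows);
console.log(page.headers, page.body);
```

The loop ends when getRow returns nothing further, so the page contains exactly one paragraph per row of the view.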
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach which helped with more complex design documents was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
SECTION 6 SERVING GEORSS USING COUCHAPP 40
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
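The merging step can be pictured with a minimal sketch. This is not CouchApp's own code (CouchApp is written in Python), and the real tool does more than this. The function `buildDesignDoc`, the `files` map and the file contents shown are hypothetical, used only for illustration, on the assumption that each file path maps onto a nested key of the design document.

```javascript
// Hypothetical sketch: fold a map of relative file paths into a nested
// design document, roughly mirroring what CouchApp does before upload.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, source] of Object.entries(files)) {
    // Drop the file extension: "views/v1/map.js" -> ["views", "v1", "map"]
    const keys = path.replace(/\.[^./]+$/, "").split("/");
    let node = ddoc;
    for (const key of keys.slice(0, -1)) {
      node = node[key] = node[key] || {};
    }
    // The file's text is stored as a string under the final key.
    node[keys[keys.length - 1]] = source;
  }
  return ddoc;
}

// Illustrative file contents only; the real files are longer.
const ddoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, null); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "templates/georss.html": "<feed>(template text)</feed>"
});

console.log(JSON.stringify(ddoc, null, 2));
```

Under this assumption the template ends up at `ddoc.templates.georss`, which is how the List function in Appendix C refers to it.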
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
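To make the behaviour of this View concrete, the following sketch runs the same map logic over two in-memory documents. It is only an approximation: `runView` is a hypothetical stand-in for CouchDB's view engine, `emit` is passed in as a second argument so that the sketch is self-contained (in CouchDB it is supplied by the view server), the sample documents are abbreviated, and plain string comparison stands in for CouchDB's own key collation.

```javascript
// The map logic from views/all/map.js, with emit passed in explicitly.
function map(doc, emit) {
  emit(doc._id, doc);
}

// Hypothetical stand-in for the view engine: call the map function once
// per document, collect the emitted rows, and order them by key.
function runView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Abbreviated sample documents (most fields omitted).
const rows = runView(map, [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" }
]);

console.log(rows.map((row) => row.key));
```

Each row carries the whole document as its value, which is why the List function can later build its output from `row.value` alone.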
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
          {{#programmes}}
            &lt;p&gt;{{name}}
            {{#has_countries}}
              {{#countries}}
                &lt;img src="http://mick.couchone.com/countries/{{country}}/flag"
                  alt="{{country}}" title="{{country}}"&gt;
              {{/countries}}
            {{/has_countries}}
            &lt;/p&gt;
          {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes, and if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
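The nested iteration performed by the template can be mimicked in plain JavaScript. The sketch below does not use Mustache; `renderContent` is a hypothetical helper that produces the unescaped markup for one institution, assuming the document shape shown in Appendix D (in the real feed the markup is entity-escaped inside the content element).

```javascript
// Plain-JavaScript equivalent of the template's nested sections:
// one paragraph per programme, one flag image per country.
function renderContent(institution) {
  let html = "";
  for (const programme of institution.programmes || []) {
    html += "<p>" + programme.name;
    for (const country of programme.countries || []) {
      html += '<img src="http://mick.couchone.com/countries/' + country +
        '/flag" alt="' + country + '" title="' + country + '">';
    }
    html += "</p>";
  }
  return html;
}

const content = renderContent({
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] }
  ]
});

console.log(content);
```

An institution without programmes, or a programme without countries, simply contributes nothing, which is the role played by the has_programmes and has_countries flags in the template.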
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
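Because replication is itself triggered over HTTP, a replication job is just another request. The sketch below only builds a description of such a request and sends nothing. `replicationRequest` is a hypothetical helper; the `_replicate` endpoint and the `source` and `target` fields are CouchDB's, while the local server address and the choice of databases are assumptions made for illustration.

```javascript
// Hypothetical helper: describe the HTTP request that asks a CouchDB
// server to replicate one database (data and design documents) to another.
function replicationRequest(server, source, target) {
  return {
    method: "POST",
    url: server + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Assumed setup: push a local "universities" database to the hosted one.
const request = replicationRequest(
  "http://127.0.0.1:5984",
  "universities",
  "http://username:password@mick.couchone.com/universities"
);

console.log(request.method, request.url);
console.log(request.body);
```

The same description could be handed to cURL or to any HTTP library, since nothing in it is specific to one client.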
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
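As an illustration of the shift in thinking, consider counting institutions per region, which in SQL would be a single GROUP BY query. The sketch below expresses the same need as a map function and a reduce function. `queryGrouped` is a hypothetical in-memory stand-in for querying the view with grouping enabled, `emit` is passed in explicitly so that the sketch is self-contained, and "Example University" is an invented sample document.

```javascript
// Map: emit one row per institution, keyed by its region.
function map(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce: add up the emitted values for a key.
// (CouchDB also offers a built-in _sum reduce for this case.)
function reduce(keys, values) {
  return values.reduce((total, value) => total + value, 0);
}

// Hypothetical stand-in for a grouped view query: gather the emitted
// values per key, then reduce each group to a single number.
function queryGrouped(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) {
        groups.set(key, []);
      }
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(null, values);
  }
  return result;
}

const counts = queryGrouped([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Example University", region: "London" }
]);

console.log(counts);
```

The important difference from SQL is that the two functions must be written and stored in a design document before the question can be asked; they cannot be composed on the fly at query time.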
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
          '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html>
          <html><head><meta name=\"viewport\"
          content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\"> html { height: 100% }
          body { height: 100%; margin: 0px; padding: 0px }
          #map_canvas { height: 100% } </style>
          <script type=\"text/javascript\"
          src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\"> function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' +
          doc.longitude + '); var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng,
          title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({ content: \"' +
          doc._id + '\" }); google.maps.event.addListener(marker,
          \"click\", function() { infowindow.open(map, marker); });
          marker.setMap(map); } </script></head>
          <body onload=\"initialize()\">
          <div id=\"map_canvas\" style=\"width: 100%; height: 100%\">
          </div></body></html>' }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
        // close the loop after all rows are rendered
        return "</feed>";
      });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
This is further down in templates/edit.html:
    // apply docForm at login.
    $("#account").evently({
      loggedIn : function(e,r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
            {{#programmes}}
              &lt;p&gt;{{name}}
              {{#has_countries}}
                {{#countries}}
                  &lt;img src="http://mick.couchone.com/countries/{{country}}/flag"
                    alt="{{country}}" title="{{country}}"&gt;
                {{/countries}}
              {{/has_countries}}
              &lt;/p&gt;
            {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
      echo "$filename"
      docname=$(echo "$filename" | sed -e 's/\.png//g')
      echo "$docname"
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo "$url"
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type:image/png"
      #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type:image/png" -X PUT "$url" --data-binary "@$filepath"
      curl -H "Content-Type:image/png" -X PUT "$url" --data-binary "@$filepath"
    done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org/
[2] British Council. http://www.britishcouncil.org/
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com/
[4] GeoRSS. http://www.georss.org/
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org/
[7] KML. http://code.google.com/apis/kml/documentation/
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org/
[10] The CouchDB Project. http://couchdb.apache.org/
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web/
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat/
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign/
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 3 INTRODUCTION TO COUCHDB 18
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself. This is much more in line with what we intuitively expect from real-world experience.
The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:
    {
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
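The practical consequence for code that reads such documents is that it tests for the presence of a field rather than for a NULL value. A minimal sketch follows; the first sample document is an assumed, abbreviated contact used only for illustration, and `hasPostalCode` is a hypothetical helper.

```javascript
// Two documents of different shapes stored side by side.
const addressbook = [
  // Assumed, abbreviated contact document with a postal code.
  { firstName: "John", address: { city: "New York", postalCode: "10021" } },
  // The British Council Zambia document, which has no postal code at all.
  { company: "British Council Zambia", address: { city: "Lusaka", country: "Zambia" } }
];

// Hypothetical helper: a missing field is simply absent, not NULL.
function hasPostalCode(doc) {
  return Boolean(doc.address) && "postalCode" in doc.address;
}

const withoutPostalCode = addressbook
  .filter((doc) => !hasPostalCode(doc))
  .map((doc) => doc.company || doc.firstName);

console.log(withoutPostalCode);
```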
In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.
3.4 Erlang
Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].
Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:
“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command line we use 'cURL', which is a command-line tool for getting or sending files using URL syntax (see footnote 1) [51].
To create a database called 'addressbook' (see footnote 2) on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
Footnote 1: If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
Footnote 2: Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
All data in CouchDB is stored in JSON format, together with a unique key and revision version number, so that in CouchDB the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
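The same formatting can also be done locally. The following is a minimal JavaScript sketch, assuming a runtime with the standard JSON object (for example Node.js); the sample string is a shortened version of the document above, and the name prettyPrint is my own.

```javascript
// Unformatted JSON as it might be returned by CouchDB (shortened sample).
var raw = '{"_id":"760cd53c55a93497067f90d6242fc25e","firstName":"John","lastName":"Smith","age":25}';

// Parse the text, then serialise it again with two-space indentation.
function prettyPrint(jsonText) {
  return JSON.stringify(JSON.parse(jsonText), null, 2);
}

var formatted = prettyPrint(raw);
console.log(formatted);
```

Because JSON.parse throws on malformed input, the same function doubles as a rough validity check.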
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
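The difference between the two ways of saving a document can be summarised in code. The following JavaScript sketch only describes the request to be made and does not send anything; the helper name describeSave and the shape of the object it returns are my own assumptions, not part of CouchDB.

```javascript
// Hypothetical helper: describes the HTTP request needed to store a document.
// With no docId we POST to the database and CouchDB assigns a random id;
// with a docId we PUT to the URL at which the document should live.
function describeSave(baseUrl, database, doc, docId) {
  var path = "/" + encodeURIComponent(database);
  if (docId !== undefined) {
    path += "/" + encodeURIComponent(docId);
  }
  return {
    method: docId === undefined ? "POST" : "PUT",
    url: baseUrl + path,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
}

var zambia = { company: "British Council Zambia" };
var post = describeSave("http://mick.couchone.com", "addressbook", zambia);
var put = describeSave("http://mick.couchone.com", "addressbook", zambia, "british-council-zambia");
console.log(post.method, post.url);
console.log(put.method, put.url);
```

The returned object could be handed to whichever HTTP library is in use, which is the point made earlier: any program that can issue HTTP requests can talk to CouchDB.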
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json, quoting the revision value shown for that document in Section 4.4:
curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
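To make the revision rule for deletes and updates concrete, here is a toy in-memory store written in JavaScript. It is a sketch under stated assumptions, not CouchDB's implementation: the ToyStore name is hypothetical, and revisions are modelled as a counter followed by a fixed token, whereas CouchDB generates its own revision identifiers.

```javascript
// Toy in-memory store that mimics CouchDB's revision check.
// Assumption: revisions look like "<n>-toy"; real revision ids differ.
function ToyStore() {
  this.docs = {};
}

// A save succeeds only if the caller quotes the current revision
// (no revision is needed for a brand-new document).
ToyStore.prototype.put = function (id, body, rev) {
  var current = this.docs[id];
  if (current && current._rev !== rev) {
    return { error: "conflict", reason: "Document update conflict." };
  }
  var generation = current ? parseInt(current._rev, 10) + 1 : 1;
  var newRev = generation + "-toy";
  this.docs[id] = { _id: id, _rev: newRev, body: body };
  return { ok: true, id: id, rev: newRev };
};

// A delete is subject to the same check.
ToyStore.prototype.remove = function (id, rev) {
  var current = this.docs[id];
  if (!current) {
    return { error: "not_found" };
  }
  if (current._rev !== rev) {
    return { error: "conflict", reason: "Document update conflict." };
  }
  delete this.docs[id];
  return { ok: true, id: id };
};

var store = new ToyStore();
var first = store.put("john-smith", { age: 25 });
var second = store.put("john-smith", { age: 26 }, first.rev);
var stale = store.put("john-smith", { age: 27 }, first.rev); // old revision
console.log(first.rev, second.rev, stale.error);
```

The third call fails because it quotes a revision that has since been superseded, which is the situation the 'Document update conflict' error is designed to catch.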
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ (see footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
Footnote 3: Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
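The idea of replication can be illustrated with a deliberately simplified JavaScript sketch. It assumes that each database is a plain object keyed by document id and that a document is copied when the target lacks it or holds a lower revision number; CouchDB's real replicator is considerably more involved, and the function names here are my own.

```javascript
// Toy one-way replication between two in-memory "databases".
// Each database maps a document id to { _rev, ...fields }.
// Assumption: only the leading number of _rev is compared; this sketch
// ignores conflicts and everything else the real replicator handles.
function generation(rev) {
  return parseInt(rev, 10); // "3-abc" -> 3
}

function replicate(source, target) {
  var copied = [];
  Object.keys(source).forEach(function (id) {
    var src = source[id];
    var tgt = target[id];
    if (!tgt || generation(tgt._rev) < generation(src._rev)) {
      target[id] = JSON.parse(JSON.stringify(src)); // copy, do not share
      copied.push(id);
    }
  });
  return copied;
}

var remote = {
  "british-council-zambia": { _rev: "3-e6f", company: "British Council Zambia" },
  "john-smith": { _rev: "2-abc", lastName: "Smith" }
};
var local = { "john-smith": { _rev: "2-abc", lastName: "Smith" } };

var firstRun = replicate(remote, local);  // copies the missing document
var secondRun = replicate(remote, local); // nothing left to copy
console.log(firstRun, secondRun);
```

Running the function a second time copies nothing, which mirrors the way a repeated replication request only transfers what has changed.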
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8,782,198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
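To see what the two views compute, the map and reduce functions can be run by hand outside CouchDB. The JavaScript sketch below makes several assumptions: emit is passed in as a parameter instead of being a global supplied by CouchDB, sum is re-implemented locally, and two of the three sample constituencies (with their figures) are invented, so the total bears no relation to the real result above.

```javascript
// Sample documents: Aberavon is taken from the text; the other two
// constituencies and their figures are invented for illustration.
var docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Exampleton", con: 1500 },
  { _id: "Sampleford", con: 12000 }
];

// CouchDB provides emit and sum; here they are supplied by hand.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}
function mapVotes(doc, emit) { emit(doc.con, doc._id); }
function mapTotal(doc, emit) { emit(null, doc.con); }
function reduceTotal(keys, values) { return sum(values); }

// Run a map function over every document and sort the emitted rows by key.
function runMap(mapFn, documents) {
  var rows = [];
  documents.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows.sort(function (a, b) { return a.key - b.key; });
}

var voteRows = runMap(mapVotes, docs);
var totalRows = runMap(mapTotal, docs);
var total = reduceTotal(
  totalRows.map(function (r) { return [r.key, r.id]; }),
  totalRows.map(function (r) { return r.value; })
);
console.log(voteRows.map(function (r) { return r.key; }), total);
```

The first view yields one row per constituency, ordered by vote count; the second collapses all rows into a single total.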
It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
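The benefit of a sorted index for range queries can be sketched in JavaScript. This is an illustration only: a sorted array with binary search stands in for CouchDB's B-Tree, the function names are my own, and all keys apart from Aberavon's 3062 are invented.

```javascript
// Rows as a view index would hold them: sorted by key.
// Keys other than Aberavon's 3062 are invented.
var index = [
  { key: 950, id: "A" },
  { key: 1200, id: "B" },
  { key: 1999, id: "C" },
  { key: 3062, id: "Aberavon" }
];

// Binary search for the first position whose key is >= target.
function lowerBound(rows, target) {
  var lo = 0, hi = rows.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (rows[mid].key < target) { lo = mid + 1; } else { hi = mid; }
  }
  return lo;
}

// Return rows with low <= key <= high, stopping at the first key past high.
function range(rows, low, high) {
  var out = [];
  for (var i = lowerBound(rows, low); i < rows.length && rows[i].key <= high; i++) {
    out.push(rows[i]);
  }
  return out;
}

var hits = range(index, 1000, 2000);
console.log(hits.map(function (r) { return r.id; }));
```

Because the rows are already ordered by key, the query jumps to the start of the range and stops as soon as it passes the end, instead of examining every document.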
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# local instance (overridden by the hosted instance on the next line)
host="http://127.0.0.1:5984"
host="http://username:password@mick.couchone.com"
database="universities"
folder="universities"
fileextension=".json"
FILES="$folder/*"
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s|$folder/||g")
  echo "$filename"
  # strip the extension to obtain the document id
  docname=$(echo "$filename" | sed -e "s|$fileextension||g")
  echo "$docname"
  url="$host/$database/$docname"
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
             '<p>latitude: ' + doc.latitude + '</p>' +
             '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
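The parts of the URL listed above, and the work done by the 'show' function, can be tried out away from the server. In the JavaScript sketch below, parseShowPath is a hypothetical helper of my own, and the coordinates for Aston University are invented, since the stored values are not reproduced in this report.

```javascript
// The s1 show function from the design document, as a plain function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Split a show URL path into the parts listed above.
// Expected shape: /<db>/_design/<ddoc>/_show/<show>/<docid>
function parseShowPath(path) {
  var parts = path.split("/").filter(function (p) { return p !== ""; });
  return {
    database: parts[0],
    designDoc: parts[1] + "/" + parts[2],
    show: parts[4],
    docId: decodeURIComponent(parts[5])
  };
}

// Invented coordinates; the real document's values are not given in the text.
var aston = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
var route = parseShowPath("/universities/_design/d1/_show/s1/Aston%20University");
var html = s1(aston, { query: {} });
console.log(route, html);
```

Testing a 'show' function in this way, with a hand-made doc and req, is a quick check before the function is pasted into a design document.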
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
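One way to reduce the escaping burden, which I did not use in this section, is to write the function as ordinary JavaScript and let JSON.stringify produce the escaped string. The sketch below assumes a JavaScript runtime such as Node.js; the body of map1 here is a short stand-in and not the Appendix A listing.

```javascript
// Write the show function as ordinary JavaScript, double quotes and all.
// This body is a stand-in, not the map1 function from Appendix A.
var map1 = function (doc, req) {
  return '<div id="map" data-lat="' + doc.latitude + '"></div>';
};

// toString() gives the function's source text; JSON.stringify then
// escapes quotes and newlines so that the result is valid JSON.
var designDoc = {
  _id: "_design/d1",
  shows: { map1: map1.toString() }
};
var json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

The printed text could then be saved to a file and uploaded with cURL in the usual way. CouchApp, discussed in Section 6, automates a similar step.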
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({\"headers\": {\"Content-Type\": \"text/html\"}});
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
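The interplay of start, getRow and send can be simulated with stand-in functions. The following JavaScript sketch rests on assumptions of my own: the helpers are passed to the list function in an env object instead of being globals as they are inside CouchDB, getRow returns null when the rows run out, and the link target is shortened.

```javascript
// Stand-ins for the functions CouchDB supplies to a list function.
// Assumption: getRow() hands over one row at a time and returns null
// once the view result is exhausted.
function runList(listFn, rows) {
  var output = { headers: null, body: "" };
  var position = 0;
  var env = {
    start: function (response) { output.headers = response.headers; },
    send: function (chunk) { output.body += chunk; },
    getRow: function () { return position < rows.length ? rows[position++] : null; }
  };
  listFn(env);
  return output;
}

// A list function in the spirit of l1; the link target is shortened.
function l1(env) {
  var row;
  env.start({ headers: { "Content-Type": "text/html" } });
  while ((row = env.getRow())) {
    env.send('<p><a href="/show/' + encodeURIComponent(row.value) + '">' +
             row.value + '</a></p>');
  }
}

var page = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "Aberystwyth University", value: "Aberystwyth University" }
]);
console.log(page.headers, page.body);
```

Each row from the view becomes one paragraph of output, and the response headers are set once, before any row is processed.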
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
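The correspondence between the folder structure and the design document can be sketched in JavaScript. This is not CouchApp's own code (CouchApp is written in Python); it is an illustration under my own assumptions, with a hypothetical file listing and a hypothetical function name, of how paths could be folded into a nested document.

```javascript
// Hypothetical file listing, keyed by path relative to the bc folder.
var files = {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }"
};

// Fold each path into a nested object, dropping the .js extension from
// the final segment, so that views/v1/map.js becomes views.v1.map.
function buildDesignDoc(name, fileMap) {
  var ddoc = { _id: "_design/" + name };
  Object.keys(fileMap).forEach(function (path) {
    var segments = path.replace(/\.js$/, "").split("/");
    var node = ddoc;
    segments.slice(0, -1).forEach(function (segment) {
      node[segment] = node[segment] || {};
      node = node[segment];
    });
    node[segments[segments.length - 1]] = fileMap[path];
  });
  return ddoc;
}

var bc = buildDesignDoc("bc", files);
console.log(JSON.stringify(bc, null, 2));
```

The output has the same overall shape as the hand-written design documents of Section 5, with each function's source held as a string.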
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
  {{#has_programmes}}
  {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
  {{/programmes}}
  {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
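The 'packaging' step can be pictured as turning view rows into the nested object that the template walks through. The JavaScript sketch below is not the code from Appendix C: the function name packageInstitutions, the shape of the input rows and all the sample values are assumptions of mine, chosen to match the names used in the template (institutions, has_programmes, programmes, has_countries, countries, country).

```javascript
// Hypothetical view rows; field names are assumed from the template
// and the values are invented.
var rows = [
  { id: "Aston University", value: {
      latitude: 52.48, longitude: -1.89,
      programmes: [
        { name: "Erasmus", countries: ["fr", "de"] },
        { name: "Other", countries: [] }
      ]
  } },
  { id: "Sampleford College", value: { latitude: 51.5, longitude: -0.1 } }
];

// Build the object handed to the template, adding the boolean flags
// that the has_programmes and has_countries sections test.
function packageInstitutions(viewRows) {
  return {
    institutions: viewRows.map(function (row) {
      var programmes = (row.value.programmes || []).map(function (p) {
        var countries = (p.countries || []).map(function (c) {
          return { country: c };
        });
        return { name: p.name, has_countries: countries.length > 0, countries: countries };
      });
      return {
        id: row.id,
        latitude: row.value.latitude,
        longitude: row.value.longitude,
        has_programmes: programmes.length > 0,
        programmes: programmes
      };
    })
  };
}

var view = packageInstitutions(rows);
console.log(JSON.stringify(view, null, 2));
```

Keeping this transformation separate from the template means the template contains only markup and section tags, with no logic of its own.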
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
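The escaping burden described above can also be handed to the JSON serialiser. The following is only a minimal sketch of that idea, not how CouchApp itself packages files: the helper name buildDesignDoc and the class attribute in the sample HTML are assumptions made for illustration. The show function is written as ordinary JavaScript, converted to source text, and then escaped by JSON.stringify.

```javascript
// A show function written as ordinary JavaScript (no manual escaping).
const s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + '</h1>';
};

// Hypothetical helper: turns a map of functions into a design document
// whose "shows" values are the functions' source text.
function buildDesignDoc(id, shows) {
  const designDoc = { _id: "_design/" + id, shows: {} };
  for (const name of Object.keys(shows)) {
    designDoc.shows[name] = shows[name].toString();
  }
  return designDoc;
}

// JSON.stringify adds the backslashes in front of the double quotes.
const json = JSON.stringify(buildDesignDoc("d1", { s1: s1 }), null, 2);
console.log(json);
```

The resulting text could then be saved to a file and sent to the server with cURL in the same way as any other document.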
{
    "_id": "_design/d1",
    "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
    "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>' }"
    }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added: the latitude/longitude pair for this entry
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
      alternate : path.absolute(path.show('post', row.id)),
      // point : row.value.loc[1] + " " + row.value.loc[0]
      point : row.value.latitude + " " + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
  // close the loop after all rows are rendered
  return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View 'all', which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable 'stash', then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ? institution.programmes.map(
            function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ? programme.countries.map(function(country) {
                  return {
                    country : country
                  };
                }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
};
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
    "_id": "University of St Andrews",
    "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
    "institutionID": 15161,
    "postcode": "KY16 9AJ",
    "latitude": 56.3412139443169,
    "longitude": -2.79301175308608,
    "constituency": "North East Fife",
    "localAuthority": "Fife Council",
    "region": "Scotland",
    "programmes": [
        {
            "name": "Chevening Programme",
            "url": "http://www.chevening.com",
            "countries": ["id", "mz", "tj"]
        },
        {
            "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
            "url": "http://www.csfp-online.org",
            "countries": ["bd", "za"]
        }
    ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the .png extension to get the document name
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
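For readers more at home in JavaScript than in sed, the path manipulation in the loop above can be sketched as follows. This is an illustration only, assuming Node.js and its built-in path module; the function name flagUrl and the base URL in the usage line are hypothetical, and no upload is performed.

```javascript
const path = require("path");

// Derive the attachment URL for one flag file.
// "countries/png/ie.png" becomes "<baseUrl>/countries/ie/flag".
function flagUrl(baseUrl, filepath) {
  // basename() drops the folder part and, given ".png", the extension too
  const docname = path.basename(filepath, ".png");
  return baseUrl + "/countries/" + docname + "/flag";
}

console.log(flagUrl("http://127.0.0.1:5984", "countries/png/ie.png"));
```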
Bibliography
[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
3.5 How Ubuntu uses CouchDB
The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].
The innovative element about Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.
Client applications running on Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].
3.6 Desktop Couch, Python and Quickly
Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers of entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.
Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free-of-charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
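Since any program with an HTTP library can play the role of cURL here, the same request can be described in JavaScript. The sketch below only builds the request description, so it runs without a server; the helper name buildInsertRequest and the local base URL are assumptions for illustration. Comments show which cURL flag each field corresponds to.

```javascript
// Hypothetical helper: describes the HTTP request that inserts a document.
function buildInsertRequest(baseUrl, dbName, doc) {
  return {
    // trailing slashes on the base URL are removed before appending the database name
    url: baseUrl.replace(/\/+$/, "") + "/" + encodeURIComponent(dbName),
    method: "POST",                                   // -X POST
    headers: { "Content-Type": "application/json" },  // -H
    body: JSON.stringify(doc)                         // -d @john-smith.json
  };
}

const request = buildInsertRequest("http://127.0.0.1:5984/", "addressbook", {
  firstName: "John",
  lastName: "Smith"
});
console.log(request.method, request.url);
```

The returned object could then be handed to whichever HTTP client is available in the chosen environment.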
All data in CouchDB is stored in JSON format together with a unique key and revision version number - so that in CouchDB the resulting document would have _id and _rev values as follows:
{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, '?').
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json (the revision value is the _rev of John Smith's document shown above):
curl -X PUT \
  http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834 \
  -d @john-smith-v2.json
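The revision check that governs updates and deletes can be illustrated with a small in-memory model. This is a sketch of the idea only, not CouchDB's implementation: createStore, put and get are hypothetical names, and the revision suffix is a fixed placeholder rather than the hash that CouchDB generates.

```javascript
// In-memory stand-in for a database that enforces a revision check.
function createStore() {
  const docs = new Map();
  return {
    // rev must match the stored revision when the document already exists
    put(id, body, rev) {
      const current = docs.get(id);
      if (current && current._rev !== rev) {
        return { error: "conflict", reason: "Document update conflict." };
      }
      // revisions look like "<generation>-<suffix>"; only the generation matters here
      const generation = current ? parseInt(current._rev, 10) + 1 : 1;
      const newRev = generation + "-placeholder";
      docs.set(id, Object.assign({}, body, { _id: id, _rev: newRev }));
      return { ok: true, id: id, rev: newRev };
    },
    get(id) {
      return docs.get(id);
    }
  };
}

const store = createStore();
const created = store.put("john-smith", { age: 25 });
const updated = store.put("john-smith", { age: 26 }, created.rev);
const stale = store.put("john-smith", { age: 27 }, created.rev); // old revision
console.log(created, updated, stale);
```

The third call is rejected because it quotes the first revision after the document has already moved on to the second, which is the situation the 'Document update conflict' error guards against.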
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af \
  --data-binary @zm.png
³ Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here httpmickcouchonecomaddressbookbritish-council-zambiaflag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
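Quoting the replication request by hand is error-prone, so it may help to generate it. The following is a minimal sketch; replicationBody and quoteForWindows are hypothetical helper names, not part of CouchDB or cURL.

```javascript
// Build the JSON body for a POST to /_replicate.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// Wrap the body in double quotes and backslash-escape the inner double quotes,
// which is the form needed on the Windows command line.
function quoteForWindows(json) {
  return '"' + json.replace(/"/g, '\\"') + '"';
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);

console.log(body);
console.log(quoteForWindows(body));
```

The first line printed is the body to place inside single quotes on Linux; the second is the double-quoted form for Windows.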
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the ‘map’ stage ranges over the document collection and emits a value for each document it encounters; the ‘reduce’ stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.
The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.
The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
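To make the two stages concrete, the ‘Conservative Votes Total’ view can be imitated in plain JavaScript. This is a sketch under stated assumptions: runView is a hypothetical stand-in for CouchDB's view engine, emit is passed in as a parameter rather than being a global as it is in CouchDB, and the second document is invented so that there is something to add up.

```javascript
// Simulated view engine: run a map function over the documents, then reduce.
function runView(docs, mapFn, reduceFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key: key, value: value });
  docs.forEach((doc) => mapFn(doc, emit));
  if (!reduceFn) return rows; // map-only view: return the emitted rows
  const keys = rows.map((row) => row.key);
  const values = rows.map((row) => row.value);
  return [{ key: null, value: reduceFn(keys, values) }];
}

// Local equivalent of the sum() helper used in the reduce function above.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// Aberavon is taken from the text; "Example Town" is made up.
const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077 },
  { _id: "Example Town", con: 1500, lab: 900 },
];

const total = runView(
  docs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => sum(values)
);

console.log(JSON.stringify({ rows: total }));
```

Leaving out the reduce function returns one row per document, which corresponds to the first view, ‘Conservative Votes’.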
It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]
Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
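The idea of an incrementally maintained index can be sketched as follows. A sorted array stands in for the B-Tree purely for brevity, and the ViewIndex class, its methods and the two invented constituencies are assumptions made for this example; CouchDB's actual storage engine is considerably more involved.

```javascript
// Sorted array standing in for the view's B-Tree index (an assumption for brevity).
class ViewIndex {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rows = []; // kept sorted by key at all times
  }

  // Binary search for the first position whose key is >= the given key.
  lowerBound(key) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.rows[mid].key < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Index one new document without reprocessing the existing ones.
  add(doc) {
    this.mapFn(doc, (key, value) => {
      this.rows.splice(this.lowerBound(key), 0, { key: key, value: value });
    });
  }

  // Read a key range by seeking to the start key and walking forward.
  range(startKey, endKey) {
    const out = [];
    for (let i = this.lowerBound(startKey); i < this.rows.length; i++) {
      if (this.rows[i].key > endKey) break;
      out.push(this.rows[i]);
    }
    return out;
  }
}

const index = new ViewIndex((doc, emit) => emit(doc.con, doc._id));
index.add({ _id: "Aberavon", con: 3062 });
index.add({ _id: "Example North", con: 1200 }); // invented document
index.add({ _id: "Example South", con: 1950 }); // invented document

console.log(index.range(1000, 2000).map((row) => row.value));
```

Each add call touches only the new document, and range never inspects rows outside the requested keys, which is the sense in which every read is an indexed lookup.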
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data from the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s/$folder\///g")
  echo "$filename"
  # strip the file extension to get the document name
  docname=$(echo "$filename" | sed -e "s/$fileextension//g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
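The step that turns a file path into a document URL can also be written as a small function. This is only an illustration: documentUrl is a hypothetical name, and the file names are assumed to be URL-encoded already, as in the command above.

```javascript
// Derive the URL for a PUT from a file path such as
// "universities/Aston%20University.json".
function documentUrl(host, database, filepath) {
  const filename = filepath.split("/").pop(); // drop the folder
  const docname = filename.replace(/\.json$/, ""); // drop the extension
  return host + "/" + database + "/" + docname;
}

console.log(
  documentUrl(
    "http://mick.couchone.com",
    "universities",
    "universities/Aberystwyth%20University.json"
  )
);
```

Because the document name becomes the last segment of the URL, it also becomes the document's id in CouchDB.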
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>'
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
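As an illustration of the two parameters, the function below follows the pattern of s1 but also reads a value from the query string via req.query. It is a sketch only: the units parameter, the coordinates and the hand-built doc and req objects are invented so that the function can be run outside CouchDB.

```javascript
// A 'show' function in the style of s1 that also reads a query-string value.
// The "units" parameter is invented for this example.
const show = function (doc, req) {
  const units = (req.query && req.query.units) || "degrees";
  return (
    "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + " " + units + "</p>" +
    "<p>longitude: " + doc.longitude + " " + units + "</p>"
  );
};

// Stand-ins for what CouchDB would pass in; the coordinates are made up.
const doc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };
const req = { query: { units: "deg" } };

console.log(show(doc, req));
```

When the query string carries no units value, the function falls back to its default, so the same ‘show’ function serves both kinds of request.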
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
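The same breakdown can be expressed in code. The sketch below uses the standard URL class to split a ‘show’ URL into the parts listed above; parseShowUrl and the property names of its result are hypothetical.

```javascript
// Split a CouchDB 'show' URL into its component parts.
function parseShowUrl(url) {
  const parsed = new URL(url);
  const parts = parsed.pathname.split("/").filter((p) => p.length > 0);
  // Expected shape: <db>/_design/<ddoc>/_show/<function>/<docid>
  if (parts.length !== 6 || parts[1] !== "_design" || parts[3] !== "_show") {
    throw new Error("not a show URL: " + url);
  }
  return {
    instance: parsed.origin,
    database: parts[0],
    designDocument: parts[1] + "/" + parts[2],
    showFunction: parts[4],
    documentId: decodeURIComponent(parts[5]),
  };
}

console.log(
  parseShowUrl(
    "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
  )
);
```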
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
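One way to avoid escaping by hand, offered here as a suggestion rather than as the approach used in this project, is to write the ‘show’ function as ordinary JavaScript and let JSON.stringify produce the escaped string. The class attribute in the example is invented simply to show a double quote being escaped.

```javascript
// Write the 'show' function as ordinary JavaScript ...
const s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
};

// ... then let JSON.stringify do the escaping when building the design document.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() },
};

const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

The printed text could then be saved as d1.json and uploaded with cURL in the usual way.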
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
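The interplay of start, getRow and send can be tried out locally with a small harness. This is a sketch under stated assumptions: runList and the api object are hypothetical stand-ins (in CouchDB the three functions are globals supplied by the server), and the link target is simplified.

```javascript
// Minimal stand-ins for CouchDB's list API, for running a list function locally.
function runList(listFn, viewRows) {
  const output = [];
  let headers = {};
  let next = 0;
  const api = {
    start: (response) => { headers = response.headers; },
    send: (chunk) => { output.push(chunk); },
    getRow: () => (next < viewRows.length ? viewRows[next++] : null),
  };
  listFn(api);
  return { headers: headers, body: output.join("") };
}

// Same shape as l1, but the API is passed in rather than being global.
const l1 = function (api) {
  let row;
  api.start({ headers: { "Content-Type": "text/html" } });
  while ((row = api.getRow())) {
    api.send(
      '<p><a href="' + encodeURIComponent(row.value) + '">' + row.value + "</a></p>"
    );
  }
};

const result = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" },
]);

console.log(result.headers);
console.log(result.body);
```

The loop ends when getRow returns null, that is, when the rows produced by the view have been exhausted.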
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console ‘Futon’ to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer’s perspective is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js is a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
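The merging step can be pictured as folding file paths into nested keys of a single JSON object. The sketch below models only that idea: buildDesignDoc is a hypothetical function, the file contents are supplied as strings rather than read from disk, and the real CouchApp scripts do considerably more than this.

```javascript
// Fold a flat list of project files into a nested design document object.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, content] of Object.entries(files)) {
    // "views/v1/map.js" becomes the key path views -> v1 -> map
    const keys = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    keys.slice(0, -1).forEach((key) => {
      node[key] = node[key] || {};
      node = node[key];
    });
    node[keys[keys.length - 1]] = content;
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", {
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
});

console.log(JSON.stringify(ddoc, null, 2));
```

The result has the same shows, lists and views sections as the hand-written design documents of Section 5.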
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it using mustache.js to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
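The nested looping can be illustrated with a deliberately tiny renderer that understands only {{name}} tags and {{#section}}...{{/section}} blocks. It is an illustration of the idea and not a substitute for mustache.js: the render function is hypothetical, it does no HTML escaping, and inside a list it looks up names on the current item only. The programme and country values are invented, apart from the name Erasmus.

```javascript
// Tiny renderer for {{name}} and {{#section}}...{{/section}} only.
function render(template, view) {
  const section = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const withSections = template.replace(section, (match, name, inner) => {
    const value = view[name];
    if (Array.isArray(value)) {
      // A list: render the inner template once per item.
      return value.map((item) => render(inner, item)).join("");
    }
    // A flag: render the inner template only when the value is truthy.
    return value ? render(inner, view) : "";
  });
  return withSections.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    view[name] === undefined ? "" : String(view[name])
  );
}

const template =
  "{{#programmes}}<p>{{name}}:" +
  "{{#has_countries}}{{#countries}} [{{country}}]{{/countries}}{{/has_countries}}" +
  "</p>{{/programmes}}";

const view = {
  programmes: [
    { name: "Erasmus", has_countries: true, countries: [{ country: "fr" }, { country: "de" }] },
    { name: "Example Programme", has_countries: false, countries: [] },
  ],
};

console.log(render(template, view));
```

The has_countries flag plays the same role as in the template above: the inner block is skipped entirely for a programme that has no countries.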
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% }
      body { height: 100%; margin: 0px; padding: 0px }
      #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\"
        src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
        var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
        var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
        var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
        var marker = new google.maps.Marker({ position: latlng,
          title: \"' + doc._id + '\" });
        var infowindow = new google.maps.InfoWindow({
          content: \"' + doc._id + '\" });
        google.maps.event.addListener(marker, \"click\", function() {
          infowindow.open(map, marker); });
        marker.setMap(map); } </script></head>
      <body onload=\"initialize()\">
      <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
      </body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View 'all', which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ? institution.programmes.map(
            function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ? programme.countries.map(function(country) {
                  return {
                    country : country
                  };
                }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
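The effect of such a section can be imitated with an ordinary loop. The following is only a sketch of the idea, not how Mustache is implemented: renderCountries is a hypothetical helper, and the country codes 'de' and 'ie' are simply the flag examples used in Appendix E.

```javascript
// Sketch of what a {{#countries}} ... {{/countries}} section does, written as a
// plain loop instead of a Mustache template. renderCountries is a hypothetical helper.
function renderCountries(programme) {
  var out = "";
  (programme.countries || []).forEach(function (entry) {
    // the section body is rendered once per item, with {{country}} filled in
    out += '<img src="http://mick.couchone.com/countries/' + entry.country +
      '/flag" alt="' + entry.country + '" title="' + entry.country + '" />';
  });
  return out;
}

var programme = {
  name: "Chevening Programme",
  countries: [{ country: "de" }, { country: "ie" }]
};
var flagsHtml = renderCountries(programme);
console.log(flagsHtml);
```

A programme with no countries produces an empty string, which mirrors the way the template emits nothing when the list is empty.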
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under /countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
Couch it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].
To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:
curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so that in CouchDB the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application, using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json, quoting the revision value of the document as stored in Section 4.4:
curl -X PUT http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834 \
  -d @john-smith-v2.json
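The revision check can be illustrated with a small in-memory model. This is a sketch under stated assumptions rather than CouchDB's implementation: TinyStore is a hypothetical name, and its revisions are simple counters instead of CouchDB's real 'N-hash' values.

```javascript
// Sketch of the revision check, using an in-memory store.
// TinyStore is hypothetical and its revision strings are simplified counters,
// not CouchDB's real "N-hash" revision values.
function TinyStore() {
  this.docs = {};
}

TinyStore.prototype.put = function (id, body, rev) {
  var current = this.docs[id];
  if (current && current._rev !== rev) {
    // the caller is working from an old (or missing) revision
    return { error: "conflict", reason: "Document update conflict." };
  }
  var generation = current ? parseInt(current._rev, 10) + 1 : 1;
  var stored = Object.assign({}, body, { _id: id, _rev: String(generation) });
  this.docs[id] = stored;
  return { ok: true, id: id, rev: stored._rev };
};

var store = new TinyStore();
var first = store.put("john-smith", { age: 25 });             // creates revision "1"
var second = store.put("john-smith", { age: 26 }, first.rev); // accepted
var stale = store.put("john-smith", { age: 27 }, first.rev);  // rejected: "1" is no longer current
console.log(first, second, stale);
```

The third call fails because it quotes a revision that has already been superseded, which is the situation the conflict error is designed to catch.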
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ ³
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af \
  --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
³ Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
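One way to sidestep the quoting differences between shells is to generate the request body programmatically. The snippet below is a minimal sketch; replicationBody is a hypothetical helper name, and the resulting string could be saved to a file and passed to cURL with -d @filename.

```javascript
// Sketch: build the body for POST /_replicate with JSON.stringify rather than
// quoting it by hand. replicationBody is a hypothetical helper name.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

var body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```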
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
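The two stages can be sketched in plain JavaScript, outside CouchDB. This is an illustration only: runView, the sample documents and their values are hypothetical, emit is passed in as a parameter here (in CouchDB it is a built-in), and CouchDB itself maintains a persistent index rather than recomputing results on every call.

```javascript
// Minimal in-memory sketch of the map and reduce stages (not CouchDB's implementation).
// runView and the sample documents are hypothetical names and data for illustration.
function runView(docs, mapFn, reduceFn) {
  var rows = [];
  // 'map' stage: visit every document and collect whatever it emits
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ key: key, value: value });
    });
  });
  if (!reduceFn) {
    return rows;
  }
  // 'reduce' stage: combine the intermediate values into a single result
  var keys = rows.map(function (r) { return r.key; });
  var values = rows.map(function (r) { return r.value; });
  return reduceFn(keys, values);
}

var sampleDocs = [
  { _id: "A", con: 100 },
  { _id: "B", con: 250 },
  { _id: "C", con: 50 }
];

var mapped = runView(sampleDocs, function (doc, emit) { emit(doc.con, doc._id); });
var total = runView(
  sampleDocs,
  function (doc, emit) { emit(null, doc.con); },
  function (keys, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
);
console.log(mapped.length, total);
```

Calling runView without a reduce function returns one row per emitted value; supplying one collapses those rows into a single figure.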
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000.
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
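A range request of this kind can be pictured as a lookup in a list of rows sorted by key. The sketch below is an assumption-laden illustration, not CouchDB's B-tree code: queryRange is a hypothetical helper, and apart from the Aberavon figure the sample rows are invented.

```javascript
// Sketch of a startkey/endkey range lookup over a view's sorted rows.
// queryRange and most of the sample rows are hypothetical; CouchDB performs
// the equivalent lookup inside its own index.
function queryRange(sortedRows, startkey, endkey) {
  // binary search for the first row whose key is >= startkey
  var lo = 0, hi = sortedRows.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (sortedRows[mid].key < startkey) { lo = mid + 1; } else { hi = mid; }
  }
  var result = [];
  // walk forward until the key exceeds endkey (treated as inclusive here)
  for (var i = lo; i < sortedRows.length && sortedRows[i].key <= endkey; i++) {
    result.push(sortedRows[i]);
  }
  return result;
}

var conservativeVotes = [
  { key: 900, value: "Constituency P" },
  { key: 1200, value: "Constituency Q" },
  { key: 1850, value: "Constituency R" },
  { key: 3062, value: "Aberavon" }
];
var between = queryRange(conservativeVotes, 1000, 2000);
console.log(between.map(function (r) { return r.value; }));
```

Because the rows are already ordered by key, only the matching stretch of the list is visited; rows outside the range are never examined.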
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval. This makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time: every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'. A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT $host/$database
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e "s/$folder\///g")
  echo $filename
  docname=$(echo $filename | sed -e "s/$fileextension//g")
  echo $docname
  url=$host/$database/$docname
  echo $url
  # put the document into CouchDB
  echo curl -X PUT $url -d @$filepath
  curl -X PUT $url -d @$filepath
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
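Because a 'show' function is ordinary JavaScript, it can be tried out locally before it is placed in a design document. The following is a sketch under that assumption; the sample document and its coordinates are made-up values, and sampleReq is only a stand-in for CouchDB's request object.

```javascript
// The 's1' show function written as an ordinary JavaScript function, so that it
// can be exercised outside CouchDB. The sample document's values are made up.
var s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

var sampleDoc = { _id: "Example University", latitude: 52.5, longitude: -1.9 };
var sampleReq = { query: {} }; // stands in for CouchDB's request object
var html = s1(sampleDoc, sampleReq);
console.log(html);
```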
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
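The same breakdown can be expressed as a small parsing function. This is a sketch only: parseShowUrl is a hypothetical helper and is not part of CouchDB.

```javascript
// Sketch: split a 'show' URL into the parts listed above.
// parseShowUrl is a hypothetical helper, not part of CouchDB.
function parseShowUrl(url) {
  var match = /^(https?:\/\/[^\/]+)\/([^\/]+)\/_design\/([^\/]+)\/_show\/([^\/]+)\/([^\/?]+)/.exec(url);
  if (!match) {
    return null;
  }
  return {
    host: match[1],
    database: match[2],
    designDocument: "_design/" + match[3],
    showFunction: match[4],
    documentId: decodeURIComponent(match[5])
  };
}

var parts = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
console.log(parts);
```

Note that the document id is URL-encoded in the address, so the %20 is decoded back to a space when the id is extracted.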
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
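One possible way to reduce the escaping burden, which I did not use for the listings in this section, is to write the function as real JavaScript and let JSON.stringify produce the escaped string. The sketch below assumes a hypothetical helper, buildDesignDoc, and an illustrative 'show' function.

```javascript
// Sketch: write the show function as real JavaScript and let JSON.stringify
// handle the escaping. buildDesignDoc is a hypothetical helper name.
function buildDesignDoc(id, shows) {
  var ddoc = { _id: id, shows: {} };
  Object.keys(shows).forEach(function (name) {
    // CouchDB stores each function as a string of source code
    ddoc.shows[name] = shows[name].toString();
  });
  return JSON.stringify(ddoc, null, 2);
}

var json = buildDesignDoc("_design/d1", {
  s1: function (doc, req) {
    return "<h1 class=\"title\">" + doc._id + "</h1>";
  }
});
console.log(json); // could be saved as d1.json and uploaded with cURL
```

The design choice here is to keep the function in a form that an editor can syntax-check, and to treat the JSON file as generated output.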
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id);}"
    }
  }
}
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
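What the server does with the v1 map function can be imitated in a few lines of plain JavaScript. This is only a sketch under stated assumptions: runMap is a hypothetical stand-in for CouchDB's view engine, emit is passed in as a parameter (in CouchDB it is a global), and the sample documents are invented.

```javascript
// Minimal stand-in for CouchDB's view engine, for illustration only.
// In CouchDB, emit is a global provided by the server; here it is passed
// in explicitly so that the sketch runs on its own.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // View rows come back ordered by key; plain string comparison is
  // a simplification of CouchDB's collation rules.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// The v1 map function from the design document, with emit as a parameter.
const v1 = (doc, emit) => emit(doc._id, doc._id);

// Assumed sample documents; only _id matters to v1.
const sampleDocs = [
  { _id: "University of Aberdeen" },
  { _id: "Aston University" }
];

const viewRows = runMap(v1, sampleDocs);
console.log(JSON.stringify({ total_rows: viewRows.length, rows: viewRows }, null, 2));
```

Each call to emit contributes one row to the view, so a map function that emits once per document yields one row per document.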
A ‘view’ is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while(row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id);}"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
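The interplay of start, getRow and send can be tried outside CouchDB with a small harness. This is a sketch only: runList is a hypothetical stand-in for the server, the three built-in functions are passed in as an argument instead of being globals, and the rows are assumed sample data.

```javascript
// Stand-in for the list-function environment (illustrative only): CouchDB
// provides start, getRow and send as globals; here they are passed in.
function runList(listFn, rows) {
  const response = { headers: {}, body: "" };
  let next = 0;
  const env = {
    start: (init) => { response.headers = init.headers || {}; },
    getRow: () => (next < rows.length ? rows[next++] : null),
    send: (chunk) => { response.body += chunk; }
  };
  listFn(env);
  return response;
}

// l1 from the design document, adapted to take the environment as an argument.
function l1({ start, getRow, send }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p><a href=http://mick.couchone.com/universities/" +
         "_design/default/_show/googlemap/" + row.value + ">" +
         row.value + "</a></p>");
  }
}

const page = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" }
]);
console.log(page.headers["Content-Type"]);
console.log(page.body);
```

The loop ends when getRow returns a falsy value, which is how the list function knows that the view has no more rows.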
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console ‘Futon’ to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
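The merging step can be pictured with a short sketch. It is not CouchApp's actual implementation (which is written in Python and handles many more cases): buildDesignDoc is a hypothetical function, and the file contents are passed in as an in-memory map standing in for files on disk.

```javascript
// Illustrative only: folds a map of file paths to file contents into one
// design document object, mirroring the folder structure described above.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, source] of Object.entries(files)) {
    // "views/v1/map.js" becomes the property path views -> v1 -> map.
    const parts = path.replace(/\.(js|html)$/, "").split("/");
    const leaf = parts.pop();
    let node = ddoc;
    for (const part of parts) {
      if (!(part in node)) node[part] = {};
      node = node[part];
    }
    node[leaf] = source;
  }
  return ddoc;
}

// Assumed file contents, standing in for files on disk.
const bc = buildDesignDoc("bc", {
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "templates/georss.html": "<feed></feed>"
});
console.log(JSON.stringify(bc, null, 2));
```

Each folder becomes a nested object and each file becomes a string property, which is why a template stored as templates/georss.html can later be reached as templates.georss on the design document.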
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
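The looping behaviour of the template can be restated in plain JavaScript. This is a sketch of the logic only and does not use Mustache: renderEntry is a hypothetical function, values are inserted without escaping, and the sample institution and its coordinates are invented.

```javascript
// Plain-JavaScript restatement of the template's loops (not Mustache itself).
// Values are inserted without escaping, which is a simplification.
function renderEntry(inst) {
  const programmes = (inst.programmes || []).map((programme) => {
    // One escaped <img> per country; nothing if the programme lists none.
    const flags = (programme.countries || []).map((country) =>
      '&lt;img src="http://mick.couchone.com/countries/' + country +
      '/flag" alt="' + country + '" title="' + country + '" /&gt;'
    ).join("");
    return "&lt;p&gt;" + programme.name + flags + "&lt;/p&gt;";
  }).join("");

  return "<entry><id>" + inst._id + "</id><title>" + inst._id + "</title>" +
    '<content type="html">' + programmes + "</content>" +
    "<point>" + inst.latitude + " " + inst.longitude + "</point></entry>";
}

// Assumed sample institution with made-up coordinates.
const entryXml = renderEntry({
  _id: "Example University",
  latitude: 51.5,
  longitude: -0.25,
  programmes: [
    { name: "Programme A", countries: ["de", "ie"] },
    { name: "Programme B" }
  ]
});
console.log(entryXml);
```

A programme without a countries list still produces its paragraph, but no flag images, which corresponds to the has_countries section of the template being skipped.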
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it, even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
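To make the contrast concrete, consider counting institutions per region. In SQL this might be written as SELECT region, COUNT(*) FROM institutions GROUP BY region (the table name is an assumption). The sketch below expresses the same information need as a map function plus a reduce function. It runs in plain JavaScript: groupReduce is a hypothetical stand-in for querying a view with grouping, emit is passed in as a parameter, and the sample documents are invented.

```javascript
// Map step: emit one (region, 1) pair per document.
const mapByRegion = (doc, emit) => emit(doc.region, 1);

// Reduce step: add up the values emitted for one key. CouchDB's real
// reduce signature is function(keys, values, rereduce); this is simplified.
const sumReduce = (values) => values.reduce((total, v) => total + v, 0);

// Hypothetical stand-in for querying a view with grouping by key.
function groupReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(values);
  return result;
}

// Assumed sample documents.
const institutions = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" }
];

const counts = groupReduce(institutions, mapByRegion, sumReduce);
console.log(counts);
```

The query has to be decided in advance and stored in the design document; a different grouping (say, by local authority) would need a different map function, which is the essence of the ad-hoc querying limitation.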
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added for GeoRSS
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
// this is further down in templates/edit.html
// apply docForm at login.
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ? institution.programmes.map(
            function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ? programme.countries.map(function(country) {
                  return {
                    country : country
                  };
                }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": "15161",
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
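The string handling inside the loop can be restated in JavaScript for readers less familiar with sed. This is a sketch only: flagAttachmentUrl is a hypothetical helper, and the base URL is given without credentials.

```javascript
// Hypothetical helper mirroring the two sed substitutions in the script:
// strip the folder prefix, then strip the .png extension.
function flagAttachmentUrl(baseUrl, filepath) {
  const filename = filepath.replace("countries/png/", "");
  const docname = filename.replace(/\.png$/, "");
  return baseUrl + "/countries/" + docname + "/flag";
}

const flagUrl = flagAttachmentUrl("http://mick.couchone.com",
                                  "countries/png/de.png");
console.log(flagUrl);
```

The document name is therefore the two-letter country code taken from the file name, and the image is stored as an attachment named flag on that document.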
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
Section 4
Using CouchDB
This section contains a brief tutorial introduction to CouchDB.
4.1 Motivation
My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)
As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).
A broader aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.
The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API
4.2 Hosted CouchDB Service Providers
I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010) the service is in beta and is free of charge.
Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command line we use 'cURL', which is a command-line tool for getting or sending files using URL syntax [51] (see footnote 1).
To create a database called 'addressbook' (see footnote 2) on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json
Footnote 1: If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html
Footnote 2: Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format together with a unique key and revision version number, so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
      "_id": "760cd53c55a93497067f90d6242fc25e",
      "_rev": "1-91bce055fc8db86480400321079f0834",
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
      "_id": "british-council-zambia",
      "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
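The choice between POST and PUT can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: buildSaveRequest and the shape of the object it returns are hypothetical names of my own, not part of CouchDB or of any client library, and the base URL assumes a local instance. Nothing is sent over the network; the function only describes the request that would be made.

```javascript
// Sketch: describe the HTTP request needed to save a document.
// buildSaveRequest is a hypothetical helper, not a CouchDB API.
function buildSaveRequest(baseUrl, database, doc, id) {
  const body = JSON.stringify(doc);
  const headers = { "Content-Type": "application/json" };
  if (id === undefined) {
    // No id supplied: POST to the database and let CouchDB assign one.
    return { method: "POST", url: `${baseUrl}/${database}`, headers, body };
  }
  // Id supplied: PUT to the exact URL where the document should live.
  return {
    method: "PUT",
    url: `${baseUrl}/${database}/${encodeURIComponent(id)}`,
    headers,
    body,
  };
}

const base = "http://127.0.0.1:5984"; // assumed local instance
const anonymous = buildSaveRequest(base, "addressbook", { firstName: "John" });
const named = buildSaveRequest(
  base,
  "addressbook",
  { company: "British Council Zambia" },
  "british-council-zambia"
);
console.log(anonymous.method, anonymous.url);
console.log(named.method, named.url);
```

The request description could then be handed to whichever HTTP library is available.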
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document the revision value must be supplied, to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
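The revision check described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not CouchDB's implementation: TinyStore is a hypothetical class, and its revision strings use a counter with a fixed '-sketch' suffix where CouchDB uses a hash. The error object is modelled on the 'Document update conflict' message mentioned above.

```javascript
// In-memory sketch of revision checking (hypothetical, not CouchDB code).
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  // Save a document; an existing document may only be replaced
  // when the caller supplies its current revision.
  put(id, doc, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const newRev = `${generation}-sketch`;
    this.docs.set(id, { ...doc, _id: id, _rev: newRev });
    return { ok: true, id, rev: newRev };
  }
}

const store = new TinyStore();
const created = store.put("john-smith", { age: 25 });
const updated = store.put("john-smith", { age: 26 }, created.rev);
// A second writer still holding the first revision is rejected.
const stale = store.put("john-smith", { age: 27 }, created.rev);
console.log(created, updated, stale);
```

The rejected writer would need to fetch the document again, pick up the new revision and retry.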
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags (see footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
Footnote 3: Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
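The body of the replication request is ordinary JSON, so it may be easier to generate it than to type it by hand. The sketch below assumes a hypothetical helper, replicationBody, and also shows how the double quotes could be backslash-escaped for a Windows command line. It only builds strings and does not contact any server.

```javascript
// Sketch: build the JSON body for a POST to /_replicate.
// replicationBody is a hypothetical helper name.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);

// For a Windows prompt: wrap in double quotes and escape the inner ones.
const windowsArg = '"' + body.replace(/"/g, '\\"') + '"';

console.log(body);
console.log(windowsArg);
```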
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
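The two stages can be emulated in memory with plain JavaScript. This is a simplified sketch, not how CouchDB runs views: runView is a hypothetical helper, emit is passed in as a parameter rather than being a global, the reduce function receives bare keys, and the sample documents carry invented vote counts.

```javascript
// Minimal in-memory emulation of a map pass followed by an optional reduce.
function runView(docs, mapFn, reduceFn) {
  const rows = [];
  // Collect every key/value pair the map function emits.
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach((doc) => mapFn(doc, emit));
  if (!reduceFn) {
    return rows;
  }
  return reduceFn(rows.map((r) => r.key), rows.map((r) => r.value));
}

// Invented sample data, for illustration only.
const sampleDocs = [
  { _id: "A", con: 100 },
  { _id: "B", con: 250 },
  { _id: "C", con: 50 },
];

const mapVotes = (doc, emit) => emit(null, doc.con);
const sumValues = (keys, values) => values.reduce((a, b) => a + b, 0);

const mapped = runView(sampleDocs, mapVotes);
const total = runView(sampleDocs, mapVotes, sumValues);
console.log(mapped.length, total);
```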
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
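The effect of a key range on a view can be sketched as a filter over rows sorted by key. This is an illustration only: queryRange is a hypothetical helper, it handles numeric keys only, and it treats both ends of the range as inclusive. Apart from the Aberavon figure taken from the document above, the rows are invented.

```javascript
// Sketch: emulate a key-range query over view rows (numeric keys only).
function queryRange(rows, startkey, endkey) {
  // View rows are kept in key order, so sort a copy first.
  const sorted = rows.slice().sort((a, b) => a.key - b.key);
  return sorted.filter((row) => row.key >= startkey && row.key <= endkey);
}

const viewRows = [
  { key: 3062, value: "Aberavon" },  // from the example document
  { key: 1500, value: "Example A" }, // invented
  { key: 900, value: "Example B" },  // invented
  { key: 2000, value: "Example C" }, // invented
];

const matches = queryRange(viewRows, 1000, 2000);
console.log(matches);
```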
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time: every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    #host="http://127.0.0.1:5984"
    host="http://username:password@mick.couchone.com"
    database="universities"
    folder="universities"
    fileextension=".json"
    FILES="$folder/*"

    # create the database
    curl -X PUT "$host/$database"

    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s/$folder\///g")
        echo "$filename"
        docname=$(echo "$filename" | sed -e "s/$fileextension//g")
        echo "$docname"
        url="$host/$database/$docname"
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json
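The step that turns a file path into a document URL can be sketched in JavaScript as well. documentUrl is a hypothetical helper; it assumes, as in the command above, that the file name (minus its extension) is already URL-safe, and the host shown is an assumed local instance.

```javascript
// Sketch: derive the target document URL from a JSON file path.
function documentUrl(host, database, filepath, ext = ".json") {
  // Drop the folder part, then the file extension.
  const filename = filepath.slice(filepath.lastIndexOf("/") + 1);
  const docname = filename.endsWith(ext)
    ? filename.slice(0, -ext.length)
    : filename;
  return `${host}/${database}/${docname}`;
}

const url = documentUrl(
  "http://127.0.0.1:5984", // assumed local instance
  "universities",
  "universities/Aberystwyth%20University.json"
);
console.log(url);
```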
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) {
          return '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>';
        }"
      }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
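Because a 'show' function is ordinary JavaScript, it can be tried out locally before it is placed in a design document. In the sketch below the coordinates are placeholder values rather than real data, and the shape of the req object (a query property) is an assumption made for illustration.

```javascript
// The s1 'show' function, run outside CouchDB against a sample document.
const s1 = function (doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Placeholder coordinates, for illustration only.
const sampleDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const sampleReq = { query: {} }; // assumed minimal request object

const html = s1(sampleDoc, sampleReq);
console.log(html);
```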
Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
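One way to avoid escaping by hand is to write the function as normal JavaScript and let JSON.stringify produce the escaped string. This is a sketch of that idea rather than the approach taken in this project; the body of the function below is a simplified, hypothetical stand-in for map1, not the listing from Appendix A.

```javascript
// Sketch: generate the escaped design document instead of typing it.
// This function body is a hypothetical stand-in for the real map1.
const map1 = function (doc, req) {
  return '<div id="map" title="' + doc._id + '"></div>';
};

const designDoc = {
  _id: "_design/d1",
  shows: { map1: map1.toString() }, // store the function's source text
};

// JSON.stringify escapes every double quote inside the source text.
const json = JSON.stringify(designDoc, null, 2);
const roundTrip = JSON.parse(json);
console.log(json);
```

The resulting text could be saved to a file and uploaded with cURL in the usual way.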
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
          var row;
          start({ 'headers': { 'Content-Type': 'text/html' } });
          while (row = getRow()) {
            send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
          }
        }"
      },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
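The interplay of start, getRow and send can be emulated with a small harness. This is a sketch under stated assumptions: runList is a hypothetical helper, and the three functions are passed in as an object here, whereas inside CouchDB they are available to the 'list' function as built-ins. The list function is a shortened variant of l1 that omits the hyperlink.

```javascript
// Sketch: a tiny harness that feeds view rows to a 'list' function.
function runList(listFn, rows) {
  const output = { headers: null, chunks: [] };
  let index = 0;
  const api = {
    start: (response) => { output.headers = response.headers; },
    send: (chunk) => { output.chunks.push(chunk); },
    // Return the next row, or null when the rows are exhausted.
    getRow: () => (index < rows.length ? rows[index++] : null),
  };
  listFn(api);
  return output;
}

// Shortened variant of l1: one paragraph per row.
function simpleList(api) {
  let row;
  api.start({ headers: { "Content-Type": "text/html" } });
  while ((row = api.getRow())) {
    api.send("<p>" + row.value + "</p>");
  }
}

const result = runList(simpleList, [
  { key: "Aberystwyth University", value: "Aberystwyth University" },
  { key: "Aston University", value: "Aston University" },
]);
console.log(result.chunks.join(""));
```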
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
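The mapping from a folder structure to a design document can be sketched as follows. This illustrates the idea only and is not CouchApp's own code (CouchApp itself is written in Python): buildDesignDoc is a hypothetical helper, the files are represented by an in-memory object rather than read from disk, and lower-case folder names are assumed so that they match the keys of the design document.

```javascript
// Sketch: fold a set of file paths into a nested design document.
function buildDesignDoc(name, files) {
  const doc = { _id: "_design/" + name };
  for (const [filePath, content] of Object.entries(files)) {
    // "views/v1/map.js" becomes the key path views -> v1 -> map.
    const parts = filePath.replace(/\.js$/, "").split("/");
    let node = doc;
    parts.slice(0, -1).forEach((part) => {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[parts[parts.length - 1]] = content;
  }
  return doc;
}

const bc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "lists/l1.js": "function(head, req) { /* omitted */ }",
});
console.log(JSON.stringify(bc, null, 2));
```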
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag"
          alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
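The nested iteration that the template performs can be written out as ordinary JavaScript loops. This is only an illustrative sketch, not how Mustache works internally: flagMarkup and the example institution are hypothetical, and the markup is left unescaped here, whereas the template writes it with &lt; and &gt; because it sits inside an Atom content element.

```javascript
// Plain-JavaScript equivalent of the template's nested sections:
// institutions -> programmes -> countries.
function flagMarkup(institution) {
  const parts = [];
  // Mirrors has_programmes / programmes: nothing is emitted without programmes.
  for (const programme of institution.programmes || []) {
    parts.push("<p>" + programme.name);
    // Mirrors has_countries / countries: one <img> per country code.
    for (const country of programme.countries || []) {
      parts.push(
        '<img src="http://mick.couchone.com/countries/' + country +
        '/flag" alt="' + country + '" title="' + country + '" />'
      );
    }
    parts.push("</p>");
  }
  return parts.join("");
}

// Hypothetical institution, shaped like the documents in Appendix D.
const example = {
  _id: "Example University",
  programmes: [
    { name: "Programme A", countries: ["bd", "za"] },
    { name: "Programme B" }, // no countries: only the name is emitted
  ],
};

console.log(flagMarkup(example));
```

The template expresses the same logic declaratively, which keeps the markup readable and leaves the looping to the templating engine.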
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
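The escaping burden arises because a design document is JSON, so each function has to be stored as a string value. The sketch below shows the escaping being produced mechanically by JSON.stringify; the design document name and the markup are invented for illustration and are not the project's d1 document.

```javascript
// A 'show' function kept as source text. The markup is a made-up fragment.
const showSource =
  'function(doc, req) { return "<div id=\\"map_canvas\\">" + doc._id + "</div>"; }';

// Design documents are JSON, so the function is stored as a string value.
const designDoc = { _id: "_design/example", shows: { demo: showSource } };

// JSON.stringify inserts the backslash escapes that are tedious to type by hand.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);

// Parsing the JSON recovers the original source text unchanged.
const roundTripped = JSON.parse(json).shows.demo;
console.log(roundTripped === showSource);
```

Tools such as CouchApp perform this kind of packaging step on the developer's behalf, which is why the files can be edited in their natural form.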
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
          '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html>
          <html><head><meta name=\"viewport\"
          content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\"> html { height: 100% }
          body { height: 100%; margin: 0px; padding: 0px }
          #map_canvas { height: 100% } </style>
          <script type=\"text/javascript\"
          src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\"> function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
          var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng,
          title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
          google.maps.event.addListener(marker, \"click\", function() {
          infowindow.open(map, marker); });
          marker.setMap(map); } </script></head>
          <body onload=\"initialize()\">
          <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
          </body></html>' }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point; //
      return entry;
    }
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
      }
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->

    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : docid,
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return {
                          country : country
                        };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag"
            alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
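For readers more comfortable with JavaScript than with sed, the path-to-URL step of the script can be sketched as follows. This is an illustrative equivalent only: flagUrl is a hypothetical helper, the base URL is a local placeholder, and credentials are deliberately left out.

```javascript
const path = require("path");

// Derives the attachment URL for a flag file, as the script's two sed calls do.
function flagUrl(filepath, baseUrl) {
  // "countries/png/ie.png" -> "ie"
  const docname = path.basename(filepath, ".png");
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

console.log(flagUrl("countries/png/ie.png", "http://127.0.0.1:5984"));
```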
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1, the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 4 USING COUCHDB 22
4.3 Installing CouchDB
In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.
(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)
4.4 Inserting a Document into a CouchDB Database
As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.
To upload a document from the terminal or command-line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax [51]. (If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.)
To create a database called 'addressbook' on a remotely hosted CouchDB instance, we use the following syntax:
    curl -X PUT http://username:password@mick.couchone.com/addressbook
(username and password are placeholders for my real username and password. Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.)
To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:
    curl -X DELETE http://username:password@mick.couchone.com/addressbook
(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)
In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.
If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:
    curl -H "Content-Type: application/json" \
      -X POST http://username:password@mick.couchone.com/addressbook \
      -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
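The request that the cURL command issues can also be described as a plain data structure, which may help when moving from the command line to a program. The sketch below only builds the request and does not send it; buildCreateRequest is a hypothetical helper, and the server address and document fields are placeholders.

```javascript
// Builds (but does not send) the HTTP request that the cURL command describes.
function buildCreateRequest(server, db, doc) {
  return {
    method: "POST",                                   // -X POST
    url: server + "/" + encodeURIComponent(db),
    headers: { "Content-Type": "application/json" },  // -H
    body: JSON.stringify(doc),                        // -d @john-smith.json
  };
}

const request = buildCreateRequest("http://127.0.0.1:5984", "addressbook", {
  firstName: "John",
  lastName: "Smith",
});
console.log(request.method, request.url);
console.log(request.body);
```

The method, headers and body fields use the same names as the options of the standard fetch() function, so the object could be handed to an HTTP client with little change.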
All data in CouchDB is stored in JSON format, together with a unique key and revision (version) number, so that in CouchDB the resulting document would have _id and _rev values as follows:
    {
      "_id": "760cd53c55a93497067f90d6242fc25e",
      "_rev": "1-91bce055fc8db86480400321079f0834",
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
    curl -H "Content-Type: application/json" \
      -X PUT \
      http://username:password@mick.couchone.com/addressbook/british-council-zambia \
      -d @british-council-zambia.json
This is the resulting document in CouchDB:
    {
      "_id": "british-council-zambia",
      "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
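The way the revision is carried in the query string can be sketched in a few lines. deleteUrl is a hypothetical helper written for illustration, and the server address is a local placeholder.

```javascript
// Builds the URL for deleting a document; the revision goes in the query string.
function deleteUrl(server, db, id, rev) {
  return (
    server + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id) +
    "?rev=" + encodeURIComponent(rev)
  );
}

const url = deleteUrl(
  "http://127.0.0.1:5984",
  "addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(url);
```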
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json:
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
      -d @john-smith-v2.json
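The revision check can be pictured with a small in-memory model. This is a simplified sketch and not CouchDB's implementation: TinyStore is hypothetical, and its revisions are a counter with a fixed suffix, whereas real CouchDB revisions include a hash of the document.

```javascript
// Minimal in-memory model of a revision check on update.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  // Stores a new version only if the caller names the current revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current.rev.split("-")[0]) + 1 : 1;
    const newRev = generation + "-demo";
    this.docs.set(id, { rev: newRev, body });
    return { ok: true, id, rev: newRev };
  }
}

const store = new TinyStore();
const created = store.put("john-smith", { age: 25 });
const updated = store.put("john-smith", { age: 26 }, created.rev);
const stale = store.put("john-smith", { age: 27 }, created.rev); // old revision

console.log(created.rev, updated.rev, stale.error);
```

The third call fails because it quotes a revision that has already been superseded, which is the situation the conflict error is designed to catch.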
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/. (Many thanks to Mark James for creating these files and making them available for free use.)
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT \
      "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
      --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
      -X POST http://127.0.0.1:5984/_replicate \
      -d '{"source":"http://username:password@mick.couchone.com/addressbook",
           "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" ^
      -X POST http://127.0.0.1:5984/_replicate ^
      -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
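When the replication request is issued from a program rather than a shell, the quoting differences between platforms disappear, because the JSON body can be generated. A minimal sketch follows; replicationBody is a hypothetical helper and the URLs are the same placeholders used in the commands above.

```javascript
// Builds the JSON body for a POST to /_replicate. Letting JSON.stringify do the
// quoting avoids the shell-specific escaping needed on the command line.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// Placeholder URLs; real credentials would replace username:password.
const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```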
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project httpklena02wordpresscom
20100201couchdb I work through a small example of how to retrievesome United Kingdom parliamentary election data from CouchDB using amap-reduce function
The data relates to the 2005 UK election and was taken from http
wwwelectoralcalculuscouk
Figure 44 election-2005 database
(As you can see from the URL this is on a CouchDB instance runningon my local machine rather than on the web)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate over a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
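To make the two stages concrete, the view can be imitated in plain JavaScript outside CouchDB. This is a simplified sketch, not CouchDB's implementation: runView and the two 'Example' constituencies are invented for illustration (only the Aberavon figure comes from the document above), emit is passed in as a parameter rather than being a global, and the reduce function receives plain keys rather than the key and id pairs that CouchDB supplies.

```javascript
// Stand-in for CouchDB's built-in sum() helper.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical mini view engine: runs the map over every document,
// then optionally collapses the emitted values with the reduce.
function runView(docs, map, reduce) {
  const rows = [];
  const emit = function (key, value) { rows.push({ key: key, value: value }); };
  docs.forEach(function (doc) { map(doc, emit); });
  if (!reduce) {
    return rows;
  }
  const keys = rows.map(function (r) { return r.key; });
  const values = rows.map(function (r) { return r.value; });
  return [{ key: null, value: reduce(keys, values) }];
}

const sampleDocs = [
  { _id: "Aberavon", con: 3062 },        // figure taken from the document above
  { _id: "Example North", con: 1500 },   // made-up constituency
  { _id: "Example South", con: 20000 }   // made-up constituency
];

// Equivalent of the 'Conservative Votes Total' view.
const totalRows = runView(
  sampleDocs,
  function (doc, emit) { emit(null, doc.con); },
  function (keys, values) { return sum(values); }
);
console.log(JSON.stringify({ rows: totalRows }));
```

With these three sample documents the single reduced row carries the value 24562, in the same shape as the response shown above.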
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000.
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
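The range query works because the view rows are sorted by their emitted key, so a key range corresponds to a contiguous slice of the index. The sketch below imitates this on an in-memory array; queryRange and rangeUrl are hypothetical helpers, the sample rows are invented apart from Aberavon, and the lowercase parameter names startkey and endkey are the ones I understand the CouchDB view API to document (with the end key included by default).

```javascript
// Rows as the 'Conservative Votes' view would emit them: key = votes, value = id.
const conservativeVoteRows = [
  { key: 3062, value: "Aberavon" },        // from the document above
  { key: 1500, value: "Example North" },   // made-up
  { key: 20000, value: "Example South" },  // made-up
  { key: 1000, value: "Example East" }     // made-up
];

// Hypothetical helper: sort by key, then keep the inclusive range.
function queryRange(rows, startkey, endkey) {
  return rows
    .slice()
    .sort(function (a, b) { return a.key - b.key; })
    .filter(function (r) { return r.key >= startkey && r.key <= endkey; });
}

// Hypothetical helper: builds the request URL; keys are JSON-encoded.
function rangeUrl(viewUrl, startkey, endkey) {
  return viewUrl +
    "?startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(endkey));
}

const matchingRows = queryRange(conservativeVoteRows, 1000, 2000);
const whereUrl = rangeUrl(
  "http://localhost:5984/election-2005/_design/con/_view/" +
    encodeURIComponent("Conservative Votes"),
  1000,
  2000
);
console.log(matchingRows.map(function (r) { return r.value; }), whereUrl);
```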
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com I wrote a bash script, as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # local instance (commented out); the hosted instance below is the one used
    #host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e "s|$folder/||g")
      echo "$filename"
      # strip the file extension to obtain the document id
      docname=$(echo "$filename" | sed -e "s|$fileextension||g")
      echo "$docname"
      url=$host/$database/$docname
      echo "$url"
      # put the document into CouchDB
      echo curl -X PUT "$url" -d "@$filepath"
      curl -X PUT "$url" -d "@$filepath"
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output. (The function is stored as a string; it is wrapped over several lines here for readability.)
    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) {
          return '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>';
        }"
      }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
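Because a 'show' function is ordinary JavaScript, it can be tried out on a sample object before it is placed in a design document. The sketch below is illustrative only: sampleUniversity and its coordinates are invented, and turning the function into a string with toString() is simply one convenient way, assumed here, to produce the string form that a design document stores.

```javascript
// The s1 'show' function from the listing, as a normal JavaScript function.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Made-up document for illustration; req is not used by s1.
const sampleUniversity = { _id: "Example University", latitude: 52.5, longitude: -1.5 };
const s1Html = s1(sampleUniversity, {});
console.log(s1Html);

// A design document stores the function source as a string value.
const d1 = { _id: "_design/d1", shows: { s1: s1.toString() } };
const d1Json = JSON.stringify(d1);
console.log(d1Json);
```

Letting JSON.stringify produce the file contents means that any quotes inside the function are escaped automatically.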
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
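The URL structure described above can be expressed as a small function. This is a sketch only; showUrl is a hypothetical helper name, and the function just concatenates the five parts, percent-encoding the document id so that the space in 'Aston University' becomes %20.

```javascript
// Hypothetical helper: host / database / _design / name / _show / function / doc id
function showUrl(host, database, designName, showName, docId) {
  return [
    host.replace(/\/+$/, ""),   // hosted instance, without a trailing slash
    database,                   // the database
    "_design", designName,      // the design document
    "_show", showName,          // the 'show' function
    encodeURIComponent(docId)   // the id of the document to be rendered
  ].join("/");
}

const astonUrl = showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University");
console.log(astonUrl);
```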
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section, and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
          var row;
          start({ 'headers': { 'Content-Type': 'text/html' } });
          while (row = getRow()) {
            send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value
              + '</a></p>');
          }
        }"
      },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
Note the built-in functions start, getRow and send:
• start is executed once at the beginning of the function, and sets the response headers
• getRow fetches the next row of the 'view' result, and ends the loop when no rows remain
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
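The interplay between start, getRow and send can be imitated outside CouchDB with a small harness. This is a sketch under stated assumptions: runList is invented for illustration, the three built-ins are passed in through an api object rather than being globals as they are inside CouchDB, and the link target and sample rows are hypothetical.

```javascript
// Hypothetical harness standing in for CouchDB's list runtime.
function runList(listFn, rows) {
  const queue = rows.slice();
  const output = { headers: null, chunks: [] };
  const api = {
    start: function (response) { output.headers = response.headers; },
    getRow: function () { return queue.length > 0 ? queue.shift() : null; },
    send: function (chunk) { output.chunks.push(chunk); }
  };
  listFn(api, { total_rows: rows.length }, {});
  return output;
}

// A list function in the spirit of l1: one hyperlink per row.
const l1Sketch = function (api, head, req) {
  let row;
  api.start({ headers: { "Content-Type": "text/html" } });
  while ((row = api.getRow())) {
    api.send('<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
};

const listOutput = runList(l1Sketch, [
  { key: "Aston University", value: "Aston University" },
  { key: "Example University", value: "Example University" } // made-up row
]);
console.log(listOutput.chunks.join("\n"));
```

The harness shows the streaming character of a 'list': the headers are set once, and each row is turned into a chunk of output as soon as it is read.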
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
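The merging step can be pictured as folding a set of file paths into one nested object. The sketch below is not CouchApp's actual implementation (which is written in Python and does considerably more); buildDesignDoc is a hypothetical function that only illustrates how a path such as views/v1/map.js could correspond to the views.v1.map entry of the design document.

```javascript
// Hypothetical illustration: fold { "path/to/file.js": "source" } into a design document.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  Object.keys(files).forEach(function (path) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    parts.slice(0, -1).forEach(function (part) {
      if (!node[part]) {
        node[part] = {};
      }
      node = node[part];
    });
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

const bcDesignDoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* body omitted in this sketch */ }"
});
console.log(JSON.stringify(bcDesignDoc, null, 2));
```

Each source file stays small and editable in a normal text editor, while the merged object has the same shape as the hand-written design documents of Section 5.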
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values, and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
          {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
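The nested iteration can be written out in plain JavaScript to show what the template produces for one institution. This sketch deliberately does not use mustache.js: renderEntry is a hypothetical function, the sample institution and its programme are invented, and the HTML inside the content element is entity-escaped by hand in the same way as in the template.

```javascript
// Hypothetical plain-JavaScript equivalent of the template's nested loops.
function renderEntry(inst) {
  const programmes = inst.programmes || [];
  const content = programmes.map(function (programme) {
    // one flag image per country, only if the programme lists any countries
    const flags = (programme.countries || []).map(function (country) {
      return '&lt;img src="http://mick.couchone.com/countries/' + country +
        '/flag" alt="' + country + '" title="' + country + '"&gt;';
    }).join("");
    return "&lt;p&gt;" + programme.name + flags + "&lt;/p&gt;";
  }).join("");
  return "<entry>" +
    "<id>" + inst._id + "</id>" +
    "<title>" + inst._id + "</title>" +
    '<content type="html">' + content + "</content>" +
    "<point>" + inst.latitude + " " + inst.longitude + "</point>" +
    "</entry>";
}

// Made-up sample data for illustration.
const exampleEntry = renderEntry({
  _id: "Example University",
  latitude: 52.5,
  longitude: -1.5,
  programmes: [{ name: "Erasmus", countries: ["zm"] }]
});
const plainEntry = renderEntry({ _id: "Plain College", latitude: 51.5, longitude: -0.1 });
console.log(exampleEntry);
```

An institution without programmes still yields a valid entry with an empty content element, which is the case the has_programmes flag guards against in the template.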
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
          '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html>
          <html><head><meta name=\"viewport\"
          content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\"> html { height: 100% }
          body { height: 100%; margin: 0px; padding: 0px }
          #map_canvas { height: 100% } </style>
          <script type=\"text/javascript\"
          src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\"> function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
          var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng,
          title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
          google.maps.event.addListener(marker, \"click\", function() {
          infowindow.open(map, marker); });
          marker.setMap(map); } </script></head>
          <body onload=\"initialize()\">
          <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"> </div>
          </body></html>'; }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };
    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
This is further down in templates/edit.html:
APPENDIX B GEORSS ON SOFA 54
apply docForm at login
$(account)evently(
loggedIn function(er)
var userCtx = ruserCtx
postForm = appdocForm(formnew-post
id docid
fields [title body tags]
fields [title latitude longitude body tags]
template
type post
format markdown
author userCtxname
Appendix C
georssjs
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View 'all', which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georsshtml
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": "15161",
  "postcode": "KY16 9AJ",
  "latitude": "56.3412139443169",
  "longitude": "-2.79301175308608",
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
<id>{{id}}</id>
<title>{{id}}</title>
<content type="html">
{{#has_programmes}}
{{#programmes}}
&lt;p&gt;{{name}}
{{#has_countries}}
{{#countries}}
&lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
{{/countries}}
{{/has_countries}}
&lt;/p&gt;
{{/programmes}}
{{/has_programmes}}
</content>
<point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under /countries/ie/flag
  # curl -H "Content-Type: image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org/
[2] British Council. http://www.britishcouncil.org/
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com/
[4] GeoRSS. http://www.georss.org/
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org/
[7] KML. http://code.google.com/apis/kml/documentation/
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org/
[10] The CouchDB Project. http://couchdb.apache.org/
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web/
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat/
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign/
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 4 USING COUCHDB 23
curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json
cURL syntax is explained here [44]. Briefly:
• -H introduces the HTTP Header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
All data in CouchDB is stored in JSON format, together with a unique key and revision version number, so that in CouchDB the resulting document would have _id and _rev values as follows:
{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.
To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com/
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
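The same request can be assembled programmatically. The following is a minimal sketch only: the helper name buildDeleteRequest and the local host address are assumptions made for illustration, and the code merely describes the request without contacting a server.

```javascript
// Hypothetical helper (not part of CouchDB): describes the DELETE request
// for a document, with the revision passed in the query string.
function buildDeleteRequest(baseUrl, database, docId, rev) {
  const path = [database, docId].map((part) => encodeURIComponent(part)).join("/");
  return {
    method: "DELETE",
    url: baseUrl + "/" + path + "?rev=" + encodeURIComponent(rev)
  };
}

const deleteRequest = buildDeleteRequest(
  "http://127.0.0.1:5984",
  "addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(deleteRequest.method, deleteRequest.url);
```

Encoding each path segment separately keeps document ids containing spaces or other reserved characters intact.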
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json:
curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-da7bcd810c608d6fbcb9ce92e9ade343" \
  -d @john-smith-v2.json
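The revision check can be pictured with a toy in-memory model. This is an illustration under stated assumptions, not CouchDB's implementation: the store, its put method and the "-placeholder" revision suffix are all invented here, and only the idea (an update quoting a stale revision is rejected with a conflict) comes from the text above.

```javascript
// Toy in-memory model of revision checking; an illustration only,
// not CouchDB's actual implementation.
function createStore() {
  const docs = new Map();
  return {
    put(id, body, rev) {
      const current = docs.get(id);
      // An update must quote the revision currently held by the store.
      if (current && current.rev !== rev) {
        return { status: 409, error: "conflict" };
      }
      const generation = current ? current.generation + 1 : 1;
      // Real revisions carry a hash suffix; a fixed placeholder is used here.
      const newRev = generation + "-placeholder";
      docs.set(id, { rev: newRev, generation: generation, body: body });
      return { status: 201, rev: newRev };
    }
  };
}

const store = createStore();
const created = store.put("john-smith", { age: 25 });
const updated = store.put("john-smith", { age: 26 }, created.rev);
const stale = store.put("john-smith", { age: 27 }, created.rev);
console.log(created.status, updated.status, stale.status);
```

The third call reuses the first revision after the document has already moved on, so it is refused, which is the situation the 'Document update conflict' error reports.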
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ (see footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png
Footnote 3: Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
"target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:
curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
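Generating the request body in code sidesteps the quoting differences between shells. The sketch below assumes a hypothetical helper name, replicationBody; the "source" and "target" fields are the ones used in the cURL commands above.

```javascript
// Builds the JSON body for a POST to /_replicate.
// The helper name is hypothetical; "source" and "target" are the fields
// used in the cURL commands above.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```

The resulting string could be saved to a file and passed to cURL with -d @file, so that no quote escaping is needed on the command line.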
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
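The two stages can be imitated in a few lines of ordinary JavaScript. This is a rough sketch, not CouchDB's view engine: the runMapReduce function and the sample documents are invented for illustration, and emit is passed in as a parameter here, whereas a CouchDB map function calls a built-in emit.

```javascript
// Minimal in-memory imitation of the two stages; CouchDB's own view engine
// is more elaborate (it stores the emitted rows in an index).
function runMapReduce(docs, map, reduce) {
  const rows = [];
  // Map stage: each document may emit any number of key/value rows.
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ key: key, value: value }));
  }
  if (!reduce) {
    return rows;
  }
  // Reduce stage: combine the intermediate values into a single result.
  return reduce(rows.map((r) => r.key), rows.map((r) => r.value));
}

// Sample documents are invented for illustration.
const sampleDocs = [
  { _id: "a", votes: 10 },
  { _id: "b", votes: 32 }
];
const mapVotes = (doc, emit) => emit(doc._id, doc.votes);
const sumValues = (keys, values) => values.reduce((total, v) => total + v, 0);

const mappedRows = runMapReduce(sampleDocs, mapVotes);
const totalVotes = runMapReduce(sampleDocs, mapVotes, sumValues);
console.log(mappedRows, totalVotes);
```

Calling the function without a reduce argument returns the raw rows, which corresponds to a view that defines only a map function.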
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk/
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
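The effect of a key range on a view can be sketched as a filter over rows that are already sorted by key. The rangeQuery function below is hypothetical, both bounds are treated as inclusive, and every row except the Aberavon one (whose key, 3062, comes from the document shown earlier) is invented sample data.

```javascript
// Sketch of a key-range lookup over view rows kept sorted by key.
// Both bounds are treated as inclusive here.
function rangeQuery(sortedRows, startkey, endkey) {
  return sortedRows.filter((row) => row.key >= startkey && row.key <= endkey);
}

// Rows as a map function emitting (doc.con, doc._id) might produce them;
// all entries except Aberavon are invented for illustration.
const viewRows = [
  { key: 950, value: "Constituency A" },
  { key: 1200, value: "Constituency B" },
  { key: 1999, value: "Constituency C" },
  { key: 3062, value: "Aberavon" }
];

const matches = rangeQuery(viewRows, 1000, 2000);
console.log(matches.map((row) => row.value));
```

Because the rows are ordered by the emitted key, a real index can jump to the first matching key and stop at the last, rather than examining every row as this simple filter does.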
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
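Incremental index maintenance can be illustrated with a sorted array standing in for the B-Tree. This is a deliberate simplification chosen for brevity, not how CouchDB stores its indexes; the function name insertSorted and the two 'Example' rows are assumptions.

```javascript
// A sorted array stands in for the B-Tree here; this is a simplification
// chosen for brevity, not how CouchDB stores its indexes.
function insertSorted(index, row) {
  let low = 0;
  let high = index.length;
  // Binary search for the first position whose key is not smaller.
  while (low < high) {
    const mid = (low + high) >> 1;
    if (index[mid].key < row.key) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  index.splice(low, 0, row);
  return index;
}

const index = [];
// Only the newly added document is processed; existing entries are untouched.
insertSorted(index, { key: 3062, id: "Aberavon" });
insertSorted(index, { key: 1200, id: "Example B" });
insertSorted(index, { key: 1999, id: "Example C" });
console.log(index.map((row) => row.key));
```

Each insertion locates its position by key and leaves the other entries in place, which is the property that lets an index be updated as documents arrive instead of being rebuilt.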
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT $host/$database
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e "s/$folder\///g")
  echo $filename
  docname=$(echo $filename | sed -e "s/$fileextension//g")
  echo $docname
  url=$host/$database/$docname
  echo $url
  # put the document into CouchDB
  echo curl -X PUT $url -d @$filepath
  curl -X PUT $url -d @$filepath
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful url. (POST is used when we wish CouchDB to assign a random ID to the document.)
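The name derivation that the script performs with sed can be expressed with Node's path module. This is a sketch under assumptions: the function name docUrlFromPath is hypothetical, and the example file name follows the pattern of the single-document command shown in this section.

```javascript
const path = require("path");

// Hypothetical helper: derives the document URL from a file path by dropping
// the folder and the file extension, as the sed commands in the script do.
function docUrlFromPath(host, database, filepath, extension) {
  const docname = path.basename(filepath, extension);
  return host + "/" + database + "/" + docname;
}

const docUrl = docUrlFromPath(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);
console.log(docUrl);
```

Unlike the sed pattern, path.basename removes the extension only when it appears at the end of the file name.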
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
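Because a 'show' function is plain JavaScript, it can be tried out locally before being stored in a design document. In the sketch below the sample document and the stand-in request object are invented; CouchDB would supply both when the function is requested over HTTP.

```javascript
// The 's1' show function, written as an ordinary JavaScript function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Invented sample document and a stand-in request object; CouchDB would
// supply both when the show function is requested over HTTP.
const sampleDoc = { _id: "Example University", latitude: 52.5, longitude: -1.9 };
const sampleReq = { query: {} };

const html = s1(sampleDoc, sampleReq);
console.log(html);
```

Testing the function this way catches string-concatenation mistakes before the code has to be escaped and uploaded.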
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
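The breakdown above can be reproduced with the standard URL class. This is an illustrative sketch: the function name parseShowUrl and the names of the returned fields are assumptions, and it handles only URLs of exactly this shape.

```javascript
// Splits a show URL into the parts listed above.
// The function name and the returned field names are illustrative only.
function parseShowUrl(url) {
  const parsed = new URL(url);
  const [database, , designName, , showName, ...rest] =
    parsed.pathname.split("/").filter(Boolean);
  return {
    host: parsed.origin,
    database: database,
    designDocument: "_design/" + designName,
    showFunction: showName,
    documentId: decodeURIComponent(rest.join("/"))
  };
}

const parts = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
console.log(parts);
```

Decoding the final segment turns Aston%20University back into the document id as it is stored, with a space.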
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
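One possible way to avoid escaping by hand is to write the function as normal JavaScript and let JSON.stringify produce the escaped string. The sketch below is an assumption-laden illustration: the design document id _design/example and the function showLink are invented and do not appear in the project.

```javascript
// A show function containing double-quoted HTML attributes.
function showLink(doc, req) {
  return '<a href="/details" title="' + doc._id + '">' + doc._id + '</a>';
}

// JSON.stringify performs the escaping that would otherwise be done by hand
// when the function source is stored as a string in the design document.
const designDoc = {
  _id: "_design/example",
  shows: { link: showLink.toString() }
};
const designDocJson = JSON.stringify(designDoc);
console.log(designDocJson);
```

The printed JSON contains \" wherever the function source used a double quote, and parsing it returns the original source unchanged.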
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id)}"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({\"headers\": {\"Content-Type\": \"text/html\"}});
      while(row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id)}"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
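The interplay of start, getRow and send can be explored outside the database with a small harness. Everything in the sketch below is a stand-in: runList and its api object are hypothetical, the list function receives the built-ins through a parameter rather than as globals, and 'Bangor University' is sample data added for illustration.

```javascript
// Stand-ins for CouchDB's start, getRow and send, so that a list function
// can be exercised outside the database. The harness itself is hypothetical.
function runList(listFunction, rows) {
  const output = { headers: null, chunks: [] };
  let position = 0;
  const api = {
    start: (response) => { output.headers = response.headers; },
    getRow: () => (position < rows.length ? rows[position++] : null),
    send: (chunk) => { output.chunks.push(chunk); }
  };
  listFunction(api);
  return output;
}

// Same shape as l1, except that the built-ins arrive through a parameter.
function l1(api) {
  let row;
  api.start({ headers: { "Content-Type": "text/html" } });
  while ((row = api.getRow())) {
    api.send("<p>" + row.value + "</p>");
  }
}

const result = runList(l1, [{ value: "Aston University" }, { value: "Bangor University" }]);
console.log(result.headers, result.chunks.join(""));
```

The loop ends when getRow returns null, which is how the harness signals that the view has no more rows.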
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org/) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
39
SECTION 6 SERVING GEORSS USING COUCHAPP 40
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
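The correspondence between folders and design document members can be sketched in a few lines of JavaScript. This is only an illustration of the idea and not CouchApp's actual implementation (CouchApp itself is written in Python); the projectTree object, its file contents and the toDesignDocument helper are all invented for the example.

```javascript
// Hypothetical in-memory stand-in for the folder tree that CouchApp generates.
// Keys are folder or file names; string values are file contents.
const projectTree = {
  views: {
    v1: { "map.js": "function(doc) { emit(doc._id, null); }" }
  },
  shows: { "s1.js": "function(doc, req) { return doc._id; }" },
  lists: { "l1.js": "function(head, req) { return ''; }" }
};

// Recursively copy the tree, dropping the ".js" extension from file names,
// so that folders become nested objects and files become string members.
function toDesignDocument(name, tree) {
  const convert = (node) => {
    if (typeof node === "string") return node;
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      out[key.replace(/\.js$/, "")] = convert(value);
    }
    return out;
  };
  return Object.assign({ _id: "_design/" + name }, convert(tree));
}

const designDoc = toDesignDocument("bc", projectTree);
console.log(JSON.stringify(designDoc, null, 2));
```

The printed object has the members views.v1.map, shows.s1 and lists.l1, which is the shape of the design documents written by hand in Section 5.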
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
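To make the shape of a Show function concrete, here is a small sketch in the style of s1 which can be run outside CouchDB. The sample document and its coordinates are made up for illustration, and the function is assigned to a variable only so that it can be called directly; in a CouchApp project the file would contain just the anonymous function.

```javascript
// A show function in the style of s1: one document in, one HTML string out.
// In a CouchApp project only the anonymous function would live in shows/s1.js;
// it is given a name here so that it can be called directly.
const s1 = function (doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Sample document with made-up coordinates, for illustration only.
const sampleDoc = { _id: "Example University", latitude: 51.5, longitude: -0.1 };
const html = s1(sampleDoc, {});
console.log(html);
```

The req argument is unused here; CouchDB passes request details in it, which a more elaborate Show function could inspect.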
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
The design document is then uploaded with the couchapp push command. So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than by working with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
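The effect of this map function can be imitated outside CouchDB with a short sketch. The runView helper below is a simplified stand-in which I have invented for illustration: it passes emit as a parameter (inside CouchDB it is a global), and it sorts keys with a plain string comparison, whereas CouchDB applies its own collation rules. The sample documents are made up.

```javascript
// The map function from map.js, written against an explicit emit parameter
// so that it can run outside CouchDB (inside CouchDB, emit is a global).
function mapAll(doc, emit) {
  emit(doc._id, doc);
}

// Simplified stand-in for the view engine: run the map function over every
// document, collect the emitted rows and sort them by key.
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, rows };
}

// Made-up sample documents.
const docs = [
  { _id: "University B", region: "Wales" },
  { _id: "University A", region: "Scotland" }
];
const result = runView(docs, mapAll);
console.log(JSON.stringify(result, null, 2));
```

Each row carries the whole document as its value, which is convenient for the List function that follows, although it makes the stored view index larger than one that emits only the fields needed.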
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
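The nested iteration can be imitated in plain JavaScript, which may help readers who have not met Mustache sections before. The sketch below does not use Mustache at all; renderEntry is a hypothetical helper that only mirrors the loop structure of the template, and the institution passed to it is made up. Unlike a production feed, it makes no attempt to escape the programme names.

```javascript
// Plain-JavaScript imitation of the nested sections in georss.html.
// It does not use Mustache; it only mirrors the loop structure.
function renderEntry(institution) {
  const programmes = institution.programmes || [];
  const content = programmes.map((programme) => {
    const countries = programme.countries || [];
    const flags = countries
      .map((c) => '&lt;img src="http://mick.couchone.com/countries/' + c +
        '/flag" alt="' + c + '" title="' + c + '"&gt;')
      .join("");
    return "&lt;p&gt;" + programme.name + flags + "&lt;/p&gt;";
  }).join("");
  return "<entry><id>" + institution.id + "</id>" +
    "<title>" + institution.id + "</title>" +
    '<content type="html">' + content + "</content>" +
    "<point>" + institution.latitude + " " + institution.longitude + "</point>" +
    "</entry>";
}

// Made-up institution for illustration.
const entry = renderEntry({
  id: "Example University",
  latitude: 56.3,
  longitude: -2.8,
  programmes: [
    { name: "Programme One", countries: ["id", "mz"] },
    { name: "Programme Two" } // no countries: no flags are emitted
  ]
});
console.log(entry);
```

The outer map corresponds to the programmes section and the inner map to the countries section; a programme without countries simply contributes a paragraph with no images.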
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it, even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
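As an illustration of the difference in thinking, consider counting institutions per region. In SQL this might be written as SELECT region, COUNT(*) FROM institutions GROUP BY region (the table name is hypothetical). The sketch below expresses the same need as a map step and a reduce step. The groupedView helper is a simplified stand-in invented for this example: a real CouchDB reduce function receives keys, values and a rereduce flag, and CouchDB builds the result incrementally rather than in one pass.

```javascript
// Map step: emit one row per document, keyed by region.
function mapByRegion(doc, emit) {
  if (doc.region) emit(doc.region, 1);
}

// Reduce step: add up the values emitted for one key.
function sumReduce(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Simplified stand-in for a grouped view query.
function groupedView(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, values]) => ({ key, value: reduceFn(values) }));
}

// Made-up sample documents.
const institutions = [
  { _id: "A", region: "Scotland" },
  { _id: "B", region: "Wales" },
  { _id: "C", region: "Scotland" }
];
const counts = groupedView(institutions, mapByRegion, sumReduce);
console.log(counts);
```

The grouping that SQL states declaratively is here a consequence of the key chosen in the map step, which is the shift in thinking referred to above.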
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
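The escaping burden can be seen by letting JavaScript do mechanically what had to be done by hand. In the sketch below, the HTML fragment, the design document name and the show name demo are made up for illustration; JSON.stringify is the standard JavaScript function for producing JSON text.

```javascript
// An HTML fragment with attribute values in double quotes.
const html = '<div id="map_canvas" style="width:100%"></div>';

// A design document is JSON, so a function that returns this fragment must
// be stored as a JSON string. JSON.stringify shows the escaping this needs.
const functionSource = "function(doc, req) { return '" + html + "'; }";
const designDoc = { _id: "_design/example", shows: { demo: functionSource } };
const json = JSON.stringify(designDoc);
console.log(json);
```

Each of the four double quotes in the fragment appears in the output preceded by a backslash. A page of HTML contains many such characters, which is why writing the design document by hand is so error-prone.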
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
      <html><head><meta name=\"viewport\"
      content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\">
      html { height: 100% }
      body { height: 100%; margin: 0px; padding: 0px }
      #map_canvas { height: 100% }
      </style>
      <script type=\"text/javascript\"
      src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\">
      function initialize() {
      var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
      var myOptions = { zoom: 14, center: latlng,
      mapTypeId: google.maps.MapTypeId.ROADMAP };
      var map = new google.maps.Map(
      document.getElementById(\"map_canvas\"), myOptions);
      var marker = new google.maps.Marker({ position: latlng,
      title: \"' + doc._id + '\" });
      var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
      google.maps.event.addListener(marker, \"click\", function() {
      infowindow.open(map, marker); });
      marker.setMap(map); }
      </script></head>
      <body onload=\"initialize()\">
      <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
      </body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url="http://username:password@mick.couchone.com/countries/$docname/flag"
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under /countries/ie/flag
  # curl -H "Content-Type: image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 4 USING COUCHDB 24
Figure 4.1: JSONLint
The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook
Figure 4.2: Futon
In some scenarios, we may wish CouchDB to use meaningful document id values, rather than a meaningless string of random characters.
For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
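As a small sketch of what ‘consuming’ the JSON might look like on the client side, the code below parses the document text and turns it into the lines of a postal address. The formatAddress helper is hypothetical, and the JSON text is built locally rather than fetched over HTTP so that the example is self-contained.

```javascript
// JSON text as CouchDB would return it for the document shown above.
const body = JSON.stringify({
  _id: "british-council-zambia",
  _rev: "1-da7bcd810c608d6fbcb9ce92e9ade343",
  company: "British Council Zambia",
  address: {
    streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
    city: "Lusaka",
    country: "Zambia"
  }
});

// Turn the document into the lines of a postal address.
function formatAddress(doc) {
  return [doc.company]
    .concat(doc.address.streetAddress.split("\n"))
    .concat([doc.address.city, doc.address.country]);
}

const lines = formatAddress(JSON.parse(body));
console.log(lines.join("\n"));
```

Because the stored document is already JSON, no mapping layer is needed between the database and the client: JSON.parse yields an object whose structure is exactly that of the stored document.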
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json:
curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-da7bcd810c608d6fbcb9ce92e9ade343" \
  -d @john-smith-v2.json
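The revision check can be illustrated with a toy in-memory model. This is only a sketch of the idea, not CouchDB's implementation: the `createStore` helper, the `john-smith` id and the `N-demo` revision strings are all invented for the example (real revisions carry a hash after the counter).

```javascript
// Toy model of revision checking: a write or delete succeeds only if the
// caller supplies the revision currently held by the store.
function createStore() {
  const docs = new Map();
  const conflict = { error: "conflict", reason: "Document update conflict." };
  return {
    put(id, body, rev) {
      const current = docs.get(id);
      // An existing document can only be replaced by a caller holding its latest revision.
      if (current && current.rev !== rev) return conflict;
      const generation = current ? current.generation + 1 : 1;
      const newRev = generation + "-demo"; // hypothetical revision format
      docs.set(id, { rev: newRev, generation: generation, body: body });
      return { ok: true, id: id, rev: newRev };
    },
    remove(id, rev) {
      const current = docs.get(id);
      if (!current) return { error: "not_found" };
      if (current.rev !== rev) return conflict;
      docs.delete(id);
      return { ok: true, id: id };
    }
  };
}

const store = createStore();
const created = store.put("john-smith", { city: "London" });
const updated = store.put("john-smith", { city: "Leeds" }, created.rev);
// A second writer still holding the first revision is rejected.
const stale = store.put("john-smith", { city: "York" }, created.rev);
console.log(created.rev, updated.rev, stale.error);
```

The same rule explains why the DELETE command in the previous section needs a rev value: a client that has not seen the latest revision cannot remove the document by accident.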
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ (see footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png
3. Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
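The quoting differences between shells can be avoided by building the request body in code. The following is a minimal sketch in JavaScript; the `replicationRequest` helper is hypothetical and only assembles the request described above, without sending it.

```javascript
// Build the pieces of a replication request; JSON.stringify takes care of
// the quoting that has to be done by hand on the command line.
function replicationRequest(source, target) {
  return {
    method: "POST",
    path: "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

const request = replicationRequest(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(request.body);
```

The resulting object could be handed to whichever HTTP client is available; that step is left out because it depends on the environment.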
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a SUM aggregate with GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
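The two stages can be imitated outside CouchDB with a few lines of JavaScript. This is a sketch only: the `runView` helper is hypothetical, `emit` is passed in as a parameter rather than being a global as it is inside CouchDB, and apart from Aberavon the constituency documents are invented for illustration.

```javascript
// Run a map function over every document, collect the emitted rows,
// then optionally collapse them with a reduce function.
function runView(docs, mapFn, reduceFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key: key, value: value });
  docs.forEach((doc) => mapFn(doc, emit));
  if (!reduceFn) return rows;
  const keys = rows.map((row) => row.key);
  const values = rows.map((row) => row.value);
  return [{ key: null, value: reduceFn(keys, values) }];
}

// Stand-in for the sum() helper that CouchDB provides to reduce functions.
const sum = (values) => values.reduce((a, b) => a + b, 0);

const electionDocs = [
  { _id: "Aberavon", con: 3062 },       // figure taken from the report
  { _id: "Example North", con: 1500 },  // hypothetical
  { _id: "Example South", con: 20000 }  // hypothetical
];

const votes = runView(electionDocs, (doc, emit) => emit(doc.con, doc._id));
const total = runView(
  electionDocs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => sum(values)
);
console.log(votes.length, total[0].value);
```

With these three sample documents the reduce step collapses three emitted rows into a single row, mirroring the shape of the 'Conservative Votes Total' response.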
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
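A key range on a view behaves like a filter over rows that are already sorted by key. The sketch below imitates that for numeric keys only; `queryRange` is a hypothetical helper and all rows except Aberavon are invented.

```javascript
// Return the rows whose key lies between startkey and endkey (inclusive),
// in key order. Only numeric keys are handled in this sketch.
function queryRange(rows, startkey, endkey) {
  return rows
    .slice()
    .sort((a, b) => a.key - b.key)
    .filter((row) => row.key >= startkey && row.key <= endkey);
}

const voteRows = [
  { key: 3062, value: "Aberavon" },       // figure taken from the report
  { key: 1500, value: "Example North" },  // hypothetical
  { key: 20000, value: "Example South" }, // hypothetical
  { key: 1000, value: "Example East" }    // hypothetical
];

const matches = queryRange(voteRows, 1000, 2000).map((row) => row.value);
console.log(matches);
```

Because the rows are keyed on the Conservative vote, the range selects constituencies by vote count, which is why the key emitted by the map function has to be chosen with the intended query in mind.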
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
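The idea of an incrementally maintained index can be sketched with a sorted array standing in for the B-Tree. This is an illustration of the principle only, under the assumption that one row is emitted per document; `insertSorted` is a hypothetical helper and the sample documents other than Aberavon are invented.

```javascript
// Insert one emitted row into an index that is kept sorted by key.
// A binary search finds the position, so existing rows are never rescanned.
function insertSorted(index, row) {
  let low = 0;
  let high = index.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (index[mid].key <= row.key) low = mid + 1;
    else high = mid;
  }
  index.splice(low, 0, row);
  return index;
}

const viewIndex = [];
const mapDoc = (doc) => ({ key: doc.con, value: doc._id });

// Initial build: every existing document is mapped once.
[
  { _id: "Aberavon", con: 3062 },
  { _id: "Example North", con: 1500 },  // hypothetical
  { _id: "Example South", con: 20000 }  // hypothetical
].forEach((doc) => insertSorted(viewIndex, mapDoc(doc)));

// Later, a new document arrives: only that document is mapped and inserted.
insertSorted(viewIndex, mapDoc({ _id: "Example West", con: 2500 })); // hypothetical
console.log(viewIndex.map((row) => row.key));
```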
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application, using C# code and SQL stored procedures, to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# Upload every JSON file in a folder to CouchDB, one document per file.
#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s/$folder\///g")
  echo "$filename"
  # strip the extension to get the document name
  docname=$(echo "$filename" | sed -e "s/$fileextension//g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done
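The naming rule the script applies (file path to document URL) can be restated as a small function. This is a sketch in JavaScript for illustration; `documentUrl` is a hypothetical helper, and the local host address is the one used earlier for the off-line replica.

```javascript
// Derive the document URL from a file path: drop the folder,
// drop the extension, and append what is left to host/database.
function documentUrl(host, database, filepath, extension) {
  const filename = filepath.split("/").pop();
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)
    : filename;
  return host + "/" + database + "/" + docname;
}

const aberystwythUrl = documentUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);
console.log(aberystwythUrl);
```

Note that the file names already contain %20 in place of spaces, so the document name can be placed in the URL without further encoding.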
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>' }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
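A 'show' function is ordinary JavaScript, so it can be tried outside CouchDB by calling it with a sample document and a stand-in request object. In the sketch below the coordinates are invented and the `title` query parameter is a hypothetical addition, included only to show how req might be used.

```javascript
// A variant of s1 that lets an optional ?title=... query parameter
// override the heading; otherwise the document id is used.
const showS1 = function (doc, req) {
  const heading = (req.query && req.query.title) || doc._id;
  return "<h1>" + heading + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Hypothetical sample document; the coordinates are illustrative only.
const astonDoc = { _id: "Aston University", latitude: 52.486, longitude: -1.888 };

const plainHtml = showS1(astonDoc, { query: {} });
const titledHtml = showS1(astonDoc, { query: { title: "Aston" } });
console.log(plainHtml);
console.log(titledHtml);
```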
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
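One way to avoid escaping by hand is to write the function as normal JavaScript and let a JSON serialiser produce the design document. This is a sketch of that idea, not the approach used in this section; the `class="title"` attribute is added purely so that the function body contains double quotes.

```javascript
// Write the show function as ordinary code...
function s1Source(doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
}

// ...then store its source text in the design document and serialise it.
// JSON.stringify escapes every double quote inside the string for us.
const d1 = {
  _id: "_design/d1",
  shows: { s1: s1Source.toString() }
};
const d1Json = JSON.stringify(d1, null, 2);
console.log(d1Json);
```

The text in d1Json could then be saved to a file and uploaded with cURL in the usual way.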
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id) }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/
          _design/default/_show/googlemap/' + row.value + '\">' + row.value
          + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id) }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
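The way a 'list' function pulls rows and pushes output can be imitated with a small harness. This is a sketch under stated assumptions: inside CouchDB, start, send and getRow are provided as globals, whereas here a hypothetical `runList` helper passes stand-ins as an argument; the link target is also simplified to the s1 'show' function rather than the map page used in the listing.

```javascript
// Minimal stand-in for the list environment: rows are handed out one at a
// time by getRow, and everything passed to send is concatenated.
function runList(listFn, rows) {
  const output = { headers: {}, body: "" };
  const queue = rows.slice();
  listFn({
    start: (response) => { output.headers = response.headers; },
    send: (chunk) => { output.body += chunk; },
    getRow: () => queue.shift() // undefined once the rows run out
  });
  return output;
}

// The list logic itself: one hyperlink per row.
const listL1 = function ({ start, send, getRow }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + "</a></p>");
  }
};

const listOutput = runList(listL1, [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" }
]);
console.log(listOutput.headers, listOutput.body);
```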
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
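The correspondence between the folder layout and the design document can be sketched as follows. This is an illustration of the mapping only, not CouchApp's actual code (which is written in Python and does considerably more); `buildDesignDoc` is a hypothetical helper and the file contents are abbreviated.

```javascript
// Turn a set of "path -> file contents" pairs into a nested design document:
// each folder becomes an object, each .js file becomes a string property.
function buildDesignDoc(name, files) {
  const doc = { _id: "_design/" + name };
  for (const [path, source] of Object.entries(files)) {
    const parts = path.replace(/\.js$/, "").split("/");
    let node = doc;
    parts.slice(0, -1).forEach((part) => {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[parts[parts.length - 1]] = source;
  }
  return doc;
}

// reduce.js has been removed from views/v1, so only map.js is listed.
const bcDesignDoc = buildDesignDoc("bc", {
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }"
});
console.log(JSON.stringify(bcDesignDoc, null, 2));
```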
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
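The nested loop that the template expresses can also be written out by hand, which makes the escaping of the embedded HTML explicit. This is a sketch without Mustache, not the code used in the project: `renderEntry` and `escapeXml` are hypothetical helpers, the sample institution is invented, and the location element is written as georss:point following the GeoRSS-Simple convention, which is an assumption about the intended feed output.

```javascript
// Escape text so that it can sit inside an XML element.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one feed entry: a paragraph per programme, a flag image per country.
function renderEntry(institution) {
  const html = (institution.programmes || []).map((programme) => {
    const flags = (programme.countries || []).map((country) =>
      '<img src="http://mick.couchone.com/countries/' + country +
      '/flag" alt="' + country + '" title="' + country + '"/>').join("");
    return "<p>" + programme.name + flags + "</p>";
  }).join("");
  return "<entry>" +
    "<id>" + escapeXml(institution.id) + "</id>" +
    "<title>" + escapeXml(institution.id) + "</title>" +
    '<content type="html">' + escapeXml(html) + "</content>" +
    "<georss:point>" + institution.latitude + " " + institution.longitude + "</georss:point>" +
    "</entry>";
}

// Hypothetical sample data.
const entryXml = renderEntry({
  id: "Aston University",
  latitude: 52.486,
  longitude: -1.888,
  programmes: [{ name: "Erasmus", countries: ["zm"] }]
});
console.log(entryXml);
```

Writing the loop by hand shows what the template engine saves: the structure of the output is buried in string concatenation, whereas the template keeps it visible.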
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it, even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together, as a unit, is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
      <html><head><meta name=\"viewport\"
        content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% }
        body { height: 100%; margin: 0px; padding: 0px }
        #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\"
        src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
        var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
        var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
        var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
        var marker = new google.maps.Marker({ position: latlng,
          title: \"' + doc._id + '\" });
        var infowindow = new google.maps.InfoWindow({
          content: \"' + doc._id + '\" });
        google.maps.event.addListener(marker, \"click\", function() {
          infowindow.open(map, marker); });
        marker.setMap(map); } </script></head>
      <body onload=\"initialize()\">
        <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
      </body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
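The amended 'point' line above can be sketched in isolation as follows. This is only an illustration: georssPoint is a hypothetical helper name that does not appear in Sofa, and the coordinates are made-up values. It assumes the GeoRSS-Simple convention that a point is written as latitude followed by longitude, separated by whitespace.

```javascript
// Hypothetical helper, not part of Sofa: format a GeoRSS-Simple point,
// written as "latitude longitude" separated by a single space.
function georssPoint(value) {
  return value.latitude + " " + value.longitude;
}

// Illustrative row, shaped like the view rows consumed by lists/index.js.
const exampleRow = { value: { latitude: 56.34, longitude: -2.79 } };
console.log(georssPoint(exampleRow.value));
```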
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, this is the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
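To make the relationship between such a document and the template sections more concrete, the following is a minimal plain-JavaScript sketch of the per-institution mapping performed in georss.js. It runs outside CouchDB and does not use Mustache. The name toStashEntry is hypothetical, the sample document is a trimmed copy of the one above (with the country codes read as two-letter codes, which is an assumption), and the sketch treats an empty list like a missing one, which is a small departure from the listing in Appendix C.

```javascript
// Plain-JavaScript sketch of the mapping used to build one stash entry.
// toStashEntry is a hypothetical name; georss.js does this inline.
function toStashEntry(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so a template can refer to {{country}}.
        countries: countries.map(function (country) {
          return { country: country };
        })
      };
    })
  };
}

// Trimmed sample document (country codes are an assumed reading).
const sample = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] }
  ]
};

const entry = toStashEntry(sample);
console.log(JSON.stringify(entry, null, 2));
```

Each object in the countries array corresponds to one pass through the countries section of the template.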
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 4 USING COUCHDB 25
To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:
curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json
This is the resulting document in CouchDB:
{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}
As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application, using JavaScript or some other programming language.
Figure 4.3: Document viewed using Futon
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
curl -X DELETE http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
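The structure of the delete URL can be sketched in JavaScript as follows. This is a minimal illustration only: deleteUrl is a hypothetical helper name, and a local host is assumed so that no credentials appear in the example.

```javascript
// Hypothetical helper: build the URL for a DELETE request.
// The revision travels in the query string, after the question mark.
function deleteUrl(host, database, docId, rev) {
  return host + "/" + encodeURIComponent(database) +
    "/" + encodeURIComponent(docId) +
    "?rev=" + encodeURIComponent(rev);
}

const exampleDeleteUrl = deleteUrl(
  "http://127.0.0.1:5984",               // assumed local instance
  "addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(exampleDeleteUrl);
```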
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json:
curl -X PUT http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-da7bcd810c608d6fbcb9ce92e9ade343 \
  -d @john-smith-v2.json
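The revision check described above can be illustrated with a toy in-memory model. This is not how CouchDB is implemented: createStore is a hypothetical name, and the revision strings here are simple counters with a fixed suffix rather than the hashes CouchDB generates. The sketch only shows the rule that an update must quote the current revision.

```javascript
// Toy model of revision checking; an assumption-laden sketch, not CouchDB code.
function createStore() {
  const docs = new Map();
  return {
    put: function (id, body, rev) {
      const current = docs.get(id);
      // An existing document may only be replaced by quoting its current revision.
      if (current && current.rev !== rev) {
        return { error: "conflict", reason: "Document update conflict." };
      }
      const generation = current ? parseInt(current.rev, 10) + 1 : 1;
      const newRev = generation + "-toy"; // real revisions end in a hash
      docs.set(id, { rev: newRev, body: body });
      return { ok: true, id: id, rev: newRev };
    }
  };
}

const store = createStore();
const created = store.put("john-smith", { name: "John Smith" });
const updated = store.put("john-smith", { name: "John Smith", city: "London" }, created.rev);
const stale = store.put("john-smith", { name: "J. Smith" }, created.rev);
console.log(created, updated, stale);
```

The third call quotes the first revision after the document has already moved on, so the toy store rejects it in the same way that CouchDB rejects an update against an 'old' version.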
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags (footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
curl -H "Content-Type: image/png" -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af \
  --data-binary @zm.png
Footnote 3: Many thanks to Mark James for creating these files and making them available for free use.
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
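The quoting differences between shells arise only because the request body is typed by hand. As a minimal sketch, the same body can be produced programmatically, in which case the quoting is handled by JSON.stringify. The helper name replicationBody is hypothetical; the source and target values are the placeholders used in the commands above.

```javascript
// Hypothetical helper: build the JSON body posted to _replicate.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```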
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
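The two stages can be imitated outside CouchDB with a few lines of JavaScript. The sketch below is an in-memory emulation under stated assumptions: runView and sum are local stand-ins (in CouchDB, emit and sum are provided by the view server, and emit is a global rather than a parameter), and only the Aberavon figure comes from the data above; the other two documents are made up for illustration.

```javascript
// In-memory emulation of a map-reduce view; not CouchDB's implementation.
function runView(docs, map, reduce) {
  const rows = [];
  // In CouchDB, emit is a global function; here it is passed in explicitly.
  const emit = function (key, value) {
    rows.push({ key: key, value: value });
  };
  docs.forEach(function (doc) { map(doc, emit); });
  if (!reduce) {
    return rows;
  }
  const keys = rows.map(function (row) { return row.key; });
  const values = rows.map(function (row) { return row.value; });
  return [{ key: null, value: reduce(keys, values) }];
}

// Local stand-in for the sum helper available to CouchDB reduce functions.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example A", con: 1500 },   // hypothetical document
  { _id: "Example B", con: 20000 }   // hypothetical document
];

const mapTotal = function (doc, emit) { emit(null, doc.con); };
const reduceTotal = function (keys, values) { return sum(values); };

const total = runView(docs, mapTotal, reduceTotal);
console.log(JSON.stringify({ rows: total }));
```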
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]
Figure 4.5: WHERE clause in CouchDB
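The effect of such a range query can be sketched as a filter over view rows sorted by key. This is only an emulation: queryRange is a hypothetical name, the rows other than Aberavon are made up, and both ends of the range are treated as inclusive. Note that the CouchDB documentation spells the corresponding query parameters startkey and endkey, in lower case.

```javascript
// Emulation of a key-range query over view rows; not CouchDB code.
function queryRange(rows, startkey, endkey) {
  return rows
    .slice() // leave the caller's array untouched
    .sort(function (a, b) { return a.key - b.key; })
    .filter(function (row) {
      return row.key >= startkey && row.key <= endkey;
    });
}

// Rows as emitted by a map like emit(doc.con, doc._id).
const rows = [
  { key: 3062, value: "Aberavon" },
  { key: 1500, value: "Example A" }, // hypothetical
  { key: 1000, value: "Example B" }, // hypothetical
  { key: 2001, value: "Example C" }  // hypothetical
];

const selected = queryRange(rows, 1000, 2000);
console.log(selected);
```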
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval. This makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time: every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT $host/$database
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e "s/$folder\///g")
  echo $filename
  docname=$(echo $filename | sed -e "s/$fileextension//g")
  echo $docname
  url=$host/$database/$docname
  echo $url
  # put the document into CouchDB
  echo curl -X PUT $url -d @$filepath
  curl -X PUT $url -d @$filepath
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
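The step of deriving a document URL from a file path, which the bash script performs with sed, can be sketched in JavaScript as follows. This is an illustration under assumptions: docUrl is a hypothetical helper, a local host is used, and the file name is assumed to contain a literal space (which the sketch percent-encodes), whereas the files used in this project already carry %20 in their names.

```javascript
const path = require("path");

// Hypothetical helper: derive the document URL from a JSON file path.
function docUrl(host, database, filepath, extension) {
  // Strip the folder and the extension to obtain the document name.
  const docname = path.basename(filepath, extension);
  // Percent-encode the name so that spaces become %20.
  return host + "/" + database + "/" + encodeURIComponent(docname);
}

const url = docUrl(
  "http://127.0.0.1:5984",                      // assumed local instance
  "universities",
  "universities/Aberystwyth University.json",   // assumed file name
  ".json"
);
console.log(url);
```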
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>'
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
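Because a 'show' function is ordinary JavaScript, it can be tried out locally before being pasted into a design document. The sketch below calls s1 directly with a stand-in document and an empty request object. The coordinates are illustrative values rather than data from the universities database, and the shape of the req object is an assumption.

```javascript
// The s1 'show' function, written as a normal JavaScript function.
const s1 = function (doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Stand-in document; the coordinates are illustrative only.
const exampleDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.88 };
// Stand-in request object; s1 does not read it.
const exampleReq = { query: {} };

const html = s1(exampleDoc, exampleReq);
console.log(html);
```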
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
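One way to avoid escaping by hand, sketched below, is to write the 'show' function as ordinary JavaScript and let JSON.stringify produce the escaped string for the design document. This is not the approach taken in this section (the next section uses CouchApp instead): toDesignDoc is a hypothetical helper and the map1 body is reduced to a single element for brevity.

```javascript
// Hypothetical helper: serialise show functions into design-document JSON.
function toDesignDoc(id, shows) {
  const doc = { _id: id, shows: {} };
  Object.keys(shows).forEach(function (name) {
    // Function.prototype.toString returns the function's source text.
    doc.shows[name] = shows[name].toString();
  });
  // JSON.stringify backslash-escapes the double quotes automatically.
  return JSON.stringify(doc, null, 2);
}

const designJson = toDesignDoc("_design/d1", {
  map1: function (doc, req) {
    return '<div id="map_canvas"></div>';
  }
});
console.log(designJson);
```

The resulting text could then be saved to a file and uploaded with cURL in the usual way.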
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id) }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ headers: { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id) }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
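The interplay of start, getRow and send can be tried out locally with a small stand-in harness. The sketch below is an emulation under assumptions: in CouchDB these three functions are globals supplied to the list function, whereas here they are passed in through an env object; runList and the simplified l1 are hypothetical, and the row values are made up.

```javascript
// Stand-in harness for a list function; not CouchDB's query server.
function runList(listFn, rows) {
  const output = [];
  let headers = null;
  let index = 0;
  const env = {
    start: function (response) { headers = response.headers; },
    send: function (chunk) { output.push(chunk); },
    // Returns the next row, or null once the rows are exhausted.
    getRow: function () { return index < rows.length ? rows[index++] : null; }
  };
  listFn(env);
  return { headers: headers, body: output.join("") };
}

// Simplified version of l1 that receives the helpers explicitly.
function l1(env) {
  let row;
  env.start({ headers: { "Content-Type": "text/html" } });
  while ((row = env.getRow())) {
    env.send("<p>" + row.value + "</p>");
  }
}

const result = runList(l1, [
  { key: "A", value: "A" }, // hypothetical rows
  { key: "B", value: "B" }
]);
console.log(result);
```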
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa, amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function, s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1, containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
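The merge step described in this section can be pictured as folding a set of file paths into one nested object. The sketch below is an illustration of that idea only: buildDesignDoc is a hypothetical helper, not part of CouchApp, and the real tool does considerably more (attachments, macros, the upload itself).

```javascript
// Hypothetical helper: fold relative file paths into a nested design document.
// This imitates the idea behind CouchApp's merge step; it is not CouchApp code.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [filePath, source] of Object.entries(files)) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = filePath.replace(/\.[^./]+$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      if (!node[part]) node[part] = {};
      node = node[part];
    }
    node[parts[parts.length - 1]] = source;
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "templates/georss.html": "<feed>...</feed>",
});
console.log(JSON.stringify(ddoc, null, 2));
```

Each folder becomes a key and each file becomes a string value, which is why the folder layout generated by couchapp generate mirrors the design document so closely.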
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
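To make the effect of this map function concrete, the following sketch runs it over two sample documents in memory. It assumes a hypothetical runView helper in place of CouchDB's indexer, passes emit as an argument rather than as a global, and uses plain string comparison where CouchDB applies its own collation rules.

```javascript
// Hypothetical in-memory stand-in for CouchDB's view indexer.
function runView(mapFn, docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach((doc) => mapFn(doc, emit));
  // View rows come back ordered by key; simple string ordering is assumed here.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// The 'all' view: one row per document, keyed by _id, with the whole document as value.
const all = (doc, emit) => emit(doc._id, doc);

const viewRows = runView(all, [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
]);
console.log(viewRows.map((row) => row.key));
```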
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
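The looping behaviour described above can be demonstrated with a deliberately tiny renderer. This is a sketch under stated assumptions, not mustache.js itself: render is a hypothetical function that supports only {{name}} variables and {{#name}}...{{/name}} sections, does no HTML escaping, and does not handle a section nested inside another section of the same name.

```javascript
// Hypothetical, much-reduced imitation of Mustache section handling.
function render(template, context) {
  // Expand sections first: arrays repeat the inner block, other values gate it.
  const expanded = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (match, name, inner) => {
      const value = context[name];
      if (Array.isArray(value)) {
        return value
          .map((item) => render(inner, Object.assign({}, context, item)))
          .join("");
      }
      return value ? render(inner, context) : "";
    }
  );
  // Then substitute plain variables.
  return expanded.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    context[name] == null ? "" : String(context[name])
  );
}

const fragment =
  "{{#programmes}}<p>{{name}}" +
  "{{#has_countries}}{{#countries}}[{{country}}]{{/countries}}{{/has_countries}}" +
  "</p>{{/programmes}}";

const html = render(fragment, {
  programmes: [
    {
      name: "Chevening Programme",
      has_countries: true,
      countries: [{ country: "id" }, { country: "mz" }],
    },
    { name: "Erasmus", has_countries: false, countries: [] },
  ],
});
console.log(html);
```

The inner block is rendered once per array element, with that element's fields layered over the enclosing context, which is how {{country}} resolves inside the countries section.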
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together, as a unit, is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
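To illustrate the shift in thinking, here is how a simple aggregate that SQL would express with GROUP BY might be phrased as a map and a reduce function. This is a sketch with assumptions: queryGrouped is a hypothetical in-memory stand-in for a grouped view query, the sample documents are invented, and the keys argument of the reduce function is simplified to null.

```javascript
// Hypothetical in-memory stand-in for a grouped CouchDB view query.
function queryGrouped(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn(doc, emit));
  return [...groups.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, values]) => ({ key, value: reduceFn(null, values, false) }));
}

// Roughly: SELECT region, COUNT(*) FROM institutions GROUP BY region
const mapByRegion = (doc, emit) => emit(doc.region, 1);
const sumValues = (keys, values, rereduce) => values.reduce((a, b) => a + b, 0);

const counts = queryGrouped(
  [
    { _id: "University of St Andrews", region: "Scotland" },
    { _id: "University of Aberdeen", region: "Scotland" },
    { _id: "Cardiff University", region: "Wales" },
  ],
  mapByRegion,
  sumValues
);
console.log(counts);
```

The grouping column moves into the emitted key and the aggregate into the reduce function, so the question has to be decided before the view is built rather than at query time.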
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect, also, that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
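The escaping burden described above is mechanical, which is why a tool can take it over. The sketch below shows one way this could be done in plain JavaScript; it is my own illustration, not how CouchApp is implemented, and the class attribute in the show function is invented purely to force some double-quotes into the output.

```javascript
// A show function written as ordinary JavaScript, double-quotes included.
function s1(doc, req) {
  return "<h1 class=\"title\">" + doc._id + "</h1>";
}

// Hypothetical packaging step: store the function source as a string and let
// JSON.stringify add the backslash escapes that are so tedious to write by hand.
const d1 = { _id: "_design/d1", shows: { s1: s1.toString() } };
const json = JSON.stringify(d1);
console.log(json);

// Round trip: parsing the JSON recovers exactly the source we started with.
const recovered = JSON.parse(json).shows.s1;
const rebuilt = new Function("return " + recovered)();
console.log(rebuilt({ _id: "University of Aberdeen" }, {}));
```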
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // original line, replaced in order to declare the georss namespace:
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added: the point element carrying the location of the post
  entry.point = data.point;
  return entry;
};
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
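The point value built in lists/index.js is simply latitude, a space, then longitude, which is the order GeoRSS expects. Since the values arrive as free text from a form, a small guard may be worthwhile. The following is a sketch only: georssPoint and toCoordinate are hypothetical helpers and are not part of Sofa.

```javascript
// Hypothetical helpers (not part of Sofa): validate one coordinate.
function toCoordinate(value, limit, label) {
  const text = String(value).trim();
  const number = Number(text);
  if (text === "" || !Number.isFinite(number) || Math.abs(number) > limit) {
    throw new RangeError("Invalid " + label + ": " + value);
  }
  return number;
}

// Build the "latitude longitude" text used for the point element.
function georssPoint(latitude, longitude) {
  return (
    toCoordinate(latitude, 90, "latitude") +
    " " +
    toCoordinate(longitude, 180, "longitude")
  );
}

console.log(georssPoint("56.34", "-2.79"));
```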
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
This is further down in templates/edit.html:
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
};
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  # strip the extension to get the document name
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
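The two sed steps in the script amount to deriving a document name from a file path and then building the attachment URL. For comparison, the same derivation is sketched below in JavaScript; flagAttachmentUrl is a hypothetical helper of my own, and the local server address is taken from the commented-out example in the script.

```javascript
const path = require("path");

// Hypothetical helper mirroring the two sed steps in the bash script.
function flagAttachmentUrl(baseUrl, filepath) {
  // "countries/png/ie.png" -> "ie"
  const docname = path.basename(filepath, ".png");
  return baseUrl + "/" + encodeURIComponent(docname) + "/flag";
}

const flagUrl = flagAttachmentUrl(
  "http://127.0.0.1:5984/countries",
  "countries/png/ie.png"
);
console.log(flagUrl);
```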
Bibliography
[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 4 USING COUCHDB 26
4.5 Deleting a Document
When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, '?').
By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.
The following command will delete the existing British Council Zambia document from the database:
    curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
4.6 Updating a Document
Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.
As we are updating a 'named' document, we use the PUT verb instead of POST.
The following command will update John Smith's record with the contents of the file john-smith-v2.json (the @ prefix tells cURL to read the request body from that file):
    curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-da7bcd810c608d6fbcb9ce92e9ade343" -d @john-smith-v2.json
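The revision rule that applies to deletes and updates can be illustrated with a small in-memory sketch. This is not CouchDB code: the TinyStore class, its method names and the way it invents revision strings are assumptions made purely for illustration. Only the rule it enforces (a write must quote the current revision, otherwise a conflict is reported) is taken from the text above.

```javascript
// Hypothetical in-memory store that mimics CouchDB's revision rule.
class TinyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // Create a document, or update it when the caller supplies the current rev.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    // Revision strings here are "<generation>-<random hex>", an assumption
    // that only imitates the look of a CouchDB revision.
    const generation = current ? parseInt(current.rev, 10) + 1 : 1;
    const newRev = generation + "-" + Math.random().toString(16).slice(2, 10);
    this.docs.set(id, { rev: newRev, body: body });
    return { ok: true, id: id, rev: newRev };
  }

  // Delete only when the supplied rev matches the stored one.
  // (This sketch simply drops the entry from the map.)
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) {
      return { error: "not_found" };
    }
    if (current.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    this.docs.delete(id);
    return { ok: true, id: id };
  }
}

const store = new TinyStore();
const first = store.put("british-council-zambia", { name: "British Council Zambia" });
const second = store.put("british-council-zambia", { name: "British Council, Zambia" }, first.rev);
const stale = store.remove("british-council-zambia", first.rev);  // quotes an old revision
const fresh = store.remove("british-council-zambia", second.rev); // quotes the latest revision
console.log(stale.error, fresh.ok);
```

The delete that quotes the first revision is refused because the document has moved on to a second revision; the delete that quotes the latest revision succeeds.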
4.7 Adding Attachments
CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/ (see footnote 3).
The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:
    curl -H "Content-Type: image/png" -X PUT "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" --data-binary @zm.png
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
Footnote 3: Many thanks to Mark James for creating these files and making them available for free use.
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:
    curl -X PUT http://127.0.0.1:5984/addressbook
To replicate the addressbook database to the local machine, we use the following command:
    curl -H "Content-Type: application/json" \
      -X POST http://127.0.0.1:5984/_replicate \
      -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'
Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:
    curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"
As a result, the replica of the database is now available on the local machine.
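As an aside, the JSON body sent to _replicate, and the escaped form needed on Windows, can be generated rather than typed by hand. The sketch below is only an illustration: the helper names replicationBody and windowsQuoted are hypothetical, and no request is actually sent.

```javascript
// Build the JSON body for a POST to /_replicate.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// Wrap a string in double quotes, backslash-escaping the double quotes inside it,
// which is the form described above for cURL on Microsoft Windows.
function windowsQuoted(text) {
  return '"' + text.replace(/"/g, '\\"') + '"';
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
const windowsArgument = windowsQuoted(body);

console.log(body);
console.log(windowsArgument);
```

The first line printed is the body that would follow -d inside single quotes on Linux; the second is the same body in the double-quoted, escaped form.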
4.9 Querying a CouchDB Database using Map-Reduce
So far we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }
CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate over a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
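To see what the two views compute, the map and reduce functions can be run locally over a handful of documents. The sketch below is a simplified imitation of a view server, not CouchDB itself: the emit, sum and runView helpers are local stand-ins written for this example, and apart from Aberavon (whose 'con' figure is quoted above) the documents are made-up placeholders rather than real election results.

```javascript
// Sample documents: only the Aberavon figure comes from the text above;
// the other two are placeholders.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example East", con: 1500 },
  { _id: "Example West", con: 12000 }
];

// Local stand-ins for helpers that CouchDB supplies to view functions.
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The view functions, written as in the design document 'con'.
const conservativeVotesMap = function (doc) { emit(doc.con, doc._id); };
const conservativeVotesTotalMap = function (doc) { emit(null, doc.con); };
const conservativeVotesTotalReduce = function (keys, values) { return sum(values); };

// Run the map over every document, sort the emitted rows by key as an index
// would, and optionally collapse them with the reduce function.
function runView(mapFn, reduceFn) {
  rows = [];
  docs.forEach(function (doc) { mapFn(doc); });
  rows.sort(function (a, b) { return a.key - b.key; });
  if (!reduceFn) {
    return rows;
  }
  const keys = rows.map(function (row) { return row.key; });
  const values = rows.map(function (row) { return row.value; });
  return [{ key: null, value: reduceFn(keys, values) }];
}

const votesByConstituency = runView(conservativeVotesMap);
const total = runView(conservativeVotesTotalMap, conservativeVotesTotalReduce);
console.log(votesByConstituency);
console.log(total);
```

With these three sample documents the first view yields rows ordered by vote count, and the second yields a single row whose value is the sum of the three 'con' fields.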
It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000 (startkey and endkey are the view query parameters that bound the range of keys returned):
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
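The range request above amounts to a lookup in an index that is already sorted by key. The following sketch imitates that with a sorted array and a binary search. It is an illustration only: CouchDB uses a B-Tree rather than an array, the function names are hypothetical, and the rows are placeholder data rather than real constituencies.

```javascript
// Rows as a view index would hold them: sorted by key.
// Names and vote counts are placeholders, not real results.
const indexRows = [
  { key: 850, value: "Example North" },
  { key: 1200, value: "Example South" },
  { key: 1750, value: "Example East" },
  { key: 2400, value: "Example West" }
];

// Binary search for the first position whose key is >= target.
function lowerBound(sortedRows, target) {
  let lo = 0;
  let hi = sortedRows.length;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (sortedRows[mid].key < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Return every row with startkey <= key <= endkey (both ends inclusive).
function rangeQuery(sortedRows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(sortedRows, startkey);
       i < sortedRows.length && sortedRows[i].key <= endkey;
       i++) {
    result.push(sortedRows[i]);
  }
  return result;
}

const matches = rangeQuery(indexRows, 1000, 2000);
console.log(matches.map(function (row) { return row.value; }));
```

Because the rows are kept in key order, the search jumps to the first matching key and stops at the first key beyond the end of the range, without visiting the rest of the index.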
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # Upload every JSON file in a folder to CouchDB, one document per file.
    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*
    # create the database
    curl -X PUT "$host/$database"
    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(echo "$filepath" | sed -e "s|$folder/||g")
        echo "$filename"
        # strip the extension to get the document name
        docname=$(echo "$filename" | sed -e "s|$fileextension||g")
        echo "$docname"
        url=$host/$database/$docname
        echo "$url"
        # put the document into CouchDB (@ makes cURL read the body from the file)
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
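The step in the script that turns a file path into a document URL can also be sketched as a small function. This is an illustrative equivalent of the two sed substitutions, not part of the project: the name documentUrl is hypothetical, and it assumes (as the file names in this section suggest) that the file name is already in the percent-encoded form wanted in the URL.

```javascript
const path = require("path");

// Derive the CouchDB document URL for one exported file:
// drop the folder and the extension, then append the name to host/database.
function documentUrl(host, database, filePath, fileExtension) {
  const docName = path.posix.basename(filePath, fileExtension);
  return host + "/" + database + "/" + docName;
}

const url = documentUrl(
  "http://127.0.0.1:5984",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);
console.log(url);
```

Each file therefore lands at a URL named after the institution, which is what makes the PUT 'meaningful' in the sense described above.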
To put a single document to the web, the command used would be as follows:
    curl -X PUT "http://username:password@mick.couchone.com/universities/Aberystwyth%20University" -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
      }
    }
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
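A 'show' function is an ordinary JavaScript function, so it can be tried out locally before it is placed in a design document. The sketch below calls s1 directly with a hand-made document and request object. The coordinates are placeholders, and s1WithHeading is a hypothetical variant added here only to illustrate reading a value from the query string via req.query.

```javascript
// The 'show' function s1 from the design document, as a plain function.
const s1 = function (doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Hypothetical variant: uses an optional ?heading=... value from the query string.
const s1WithHeading = function (doc, req) {
  const heading = (req.query && req.query.heading) || doc._id;
  return "<h1>" + heading + "</h1>";
};

// Placeholder document; the coordinates are not the real ones.
const sampleDoc = { _id: "Aston University", latitude: 0, longitude: 0 };

const html = s1(sampleDoc, { query: {} });
const custom = s1WithHeading(sampleDoc, { query: { heading: "Erasmus partner" } });
console.log(html);
console.log(custom);
```

Testing the function this way avoids an upload cycle for every change, although the final check still has to be made against CouchDB itself.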
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT "http://username:password@mick.couchone.com/universities/_design/d1" -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
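The escaping burden is easy to demonstrate. In the sketch below, a one-line function containing a fragment of HTML is stored as a string in an object shaped like a design document, and JSON.stringify shows what has to be written when the document is authored by hand. The fragment and the variable names are illustrative assumptions, not the project's actual listing.

```javascript
// Source text of a tiny 'show' function whose HTML contains double quotes.
const showSource =
  "function (doc, req) { return '<div id=\"map_canvas\" style=\"width:100%\"></div>'; }";

// In a design document the function is stored as a JSON string value.
const designDoc = { _id: "_design/d1", shows: { map1: showSource } };
const json = JSON.stringify(designDoc);

// Count the backslash-escaped double quotes in the serialised document.
const escapedQuotes = (json.match(/\\"/g) || []).length;

console.log(json);
console.log(escapedQuotes);
```

Every double quote inside the HTML appears as \" in the serialised document. Letting a program perform the serialisation removes the need to type those escapes manually.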
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
          var row;
          start({ 'headers': { 'Content-Type': 'text/html' } });
          while (row = getRow()) {
            send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
          }
        }"
      },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
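The way a 'list' function consumes the rows of a 'view' can be imitated locally. In the sketch below, start, send and getRow are simple stand-ins written for this example (CouchDB supplies the real ones when it runs a list function), and the two rows are hand-made samples of what the view v1 emits.

```javascript
// Local stand-ins for the helpers available inside a list function.
let responseHeaders = null;
let chunks = [];
let pendingRows = [];
function start(response) { responseHeaders = response.headers; }
function send(chunk) { chunks.push(chunk); }
function getRow() { return pendingRows.length ? pendingRows.shift() : null; }

// The list function l1, written as in the design document.
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + "</a></p>");
  }
};

// Feed the list function the rows that the view v1 would produce.
function runList(listFn, viewRows) {
  responseHeaders = null;
  chunks = [];
  pendingRows = viewRows.slice();
  listFn({ total_rows: viewRows.length }, {});
  return { headers: responseHeaders, body: chunks.join("") };
}

const result = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "University of Aberdeen", key: "University of Aberdeen", value: "University of Aberdeen" }
]);
console.log(result.headers);
console.log(result.body);
```

The headers are set once by start, and the body is the concatenation of one paragraph per row, in the order the rows were supplied.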
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is, in effect, a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
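The merging step can be pictured with a short sketch. It is a deliberately simplified illustration, written in JavaScript rather than Python, of how a set of files might be folded into one design document. The function buildDesignDoc and the in-memory file list are assumptions made for this example, and the real CouchApp tool does considerably more than this.

```javascript
// Fold a set of files (relative path -> file contents) into a single
// design document object whose structure mirrors the folder layout.
function buildDesignDoc(name, files) {
  const designDoc = { _id: "_design/" + name };
  Object.keys(files).forEach(function (filePath) {
    // "views/v1/map.js" becomes the key path views -> v1 -> map
    const parts = filePath.replace(/\.js$/, "").split("/");
    let node = designDoc;
    parts.slice(0, -1).forEach(function (part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[parts[parts.length - 1]] = files[filePath];
  });
  return designDoc;
}

// In-memory stand-ins for the files in the bc folder.
const bc = buildDesignDoc("bc", {
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }"
});
console.log(JSON.stringify(bc, null, 2));
```

The resulting object has the same shows, lists and views sections as the hand-written design documents in Section 5, with each function's source stored as a string.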
So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
With this in place, the couchapp push command builds the design document and uploads it to the configured database.
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted).
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View.
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
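The looping behaviour of the template can be demonstrated with a very small renderer. The sketch below is not mustache.js: the render function is a simplified stand-in written for this example. It handles only plain variables and sections, performs no HTML escaping, and expects list items to be objects. The template is a cut-down version of the one above, and the data is a placeholder.

```javascript
// Minimal Mustache-like renderer: {{#name}}...{{/name}} sections and {{name}} variables.
function render(template, view) {
  // Sections first. An array repeats the inner template once per item;
  // any other value shows the inner template only when it is truthy.
  const withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    function (match, name, inner) {
      const value = view[name];
      if (Array.isArray(value)) {
        return value.map(function (item) {
          return render(inner, Object.assign({}, view, item));
        }).join("");
      }
      return value ? render(inner, view) : "";
    }
  );
  // Then plain variables; missing values render as an empty string.
  return withSections.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] === undefined ? "" : String(view[name]);
  });
}

const template =
  "{{#institutions}}<entry><title>{{id}}</title>" +
  "{{#has_programmes}}{{#programmes}}[{{name}}]{{/programmes}}{{/has_programmes}}" +
  "<point>{{latitude}} {{longitude}}</point></entry>{{/institutions}}";

// Placeholder data in the shape the template expects.
const view = {
  institutions: [
    { id: "Example University", latitude: 1, longitude: 2,
      has_programmes: true, programmes: [{ name: "Erasmus" }] },
    { id: "Another University", latitude: 3, longitude: 4,
      has_programmes: false, programmes: [] }
  ]
};

const feed = render(template, view);
console.log(feed);
```

The first institution produces an entry containing its programme; the second, whose has_programmes flag is false, produces an entry with no programme markup at all, which is the conditional behaviour the has_ flags provide in the real template.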
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/ as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it, even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
          '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
          <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\"> html { height: 100% }
          body { height: 100%; margin: 0px; padding: 0px }
          #map_canvas { height: 100% } </style>
          <script type=\"text/javascript\"
            src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\"> function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
          var myOptions = { zoom: 14, center: latlng,
            mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
            document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng,
            title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
          google.maps.event.addListener(marker, \"click\", function() {
            infowindow.open(map, marker); });
          marker.setMap(map); } </script></head>
          <body onload=\"initialize()\">
          <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
          </body></html>'; }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};
At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());

    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
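The iteration behaviour can be illustrated with a deliberately small stand-in for Mustache. This is a sketch only, not the mustache.js implementation: the function name renderTemplate is hypothetical, it supports just {{name}} variables and {{#name}}...{{/name}} sections, and unlike Mustache it does no HTML escaping.

```javascript
// Minimal stand-in for Mustache sections (illustration only, not mustache.js).
function renderTemplate(template, view) {
  // Sections: repeat the inner template once per array item,
  // or render it once when the value is simply truthy.
  const withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    function (match, name, inner) {
      const value = view[name];
      if (Array.isArray(value)) {
        return value.map(function (item) {
          return renderTemplate(inner, item);
        }).join('');
      }
      return value ? renderTemplate(inner, view) : '';
    }
  );
  // Variables: substitute the value, or an empty string if it is missing.
  return withSections.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] == null ? '' : String(view[name]);
  });
}

const rendered = renderTemplate(
  '{{name}}: {{#has_countries}}{{#countries}}[{{country}}]{{/countries}}{{/has_countries}}',
  {
    name: 'Example Programme',
    has_countries: true,
    countries: [{ country: 'bd' }, { country: 'za' }]
  }
);
console.log(rendered); // Example Programme: [bd][za]
```

The has_countries flag guards the whole block, while the countries array drives the repetition, which is the same division of labour used in the template in this appendix.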
Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag - the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the .png extension to get the document name
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 4 USING COUCHDB 27
The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
4.8 Replication
One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.
Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.
If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to exclusively use double quotes, backslash-escaping where necessary:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
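The quoting differences between shells arise only because the JSON body is typed by hand. As a minimal sketch (the helper name replicationRequest is hypothetical, and nothing is actually sent over the network), the same request can be described in JavaScript, letting JSON.stringify take care of the quoting:

```javascript
// Describe a POST to /_replicate; source and target are database URLs.
// This only builds the request description; sending it is left to an HTTP client.
function replicationRequest(source, target) {
  return {
    method: 'POST',
    path: '/_replicate',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ source: source, target: target })
  };
}

const replication = replicationRequest(
  'http://username:password@mick.couchone.com/addressbook',
  'http://127.0.0.1:5984/addressbook'
);
console.log(replication.body);
```

The body printed here is the same JSON object passed to cURL with -d in the commands above.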
4.9 Querying a CouchDB Database using Map-Reduce
So far, we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.
Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).
The map-reduce paradigm was popularised by Google's 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation over the intermediate values is required.
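The two stages can be sketched over an in-memory array. This is an illustration of the paradigm only, not of CouchDB's internals: the documents are invented, the helper names mapStage and reduceStage are hypothetical, and emit is passed in as an argument here, whereas CouchDB makes it available to map functions as a global.

```javascript
// Invented documents standing in for a small collection.
const sampleDocs = [
  { _id: 'A', votes: 100 },
  { _id: 'B', votes: 250 }
];

// Map stage: call mapFn once per document and collect whatever it emits.
function mapStage(documents, mapFn) {
  const rows = [];
  documents.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ key: key, value: value });
    });
  });
  return rows;
}

// Reduce stage: combine the emitted values into a single result.
function reduceStage(rows, reduceFn) {
  const keys = rows.map(function (row) { return row.key; });
  const values = rows.map(function (row) { return row.value; });
  return reduceFn(keys, values);
}

const emitted = mapStage(sampleDocs, function (doc, emit) {
  emit(null, doc.votes);
});
const totalVotes = reduceStage(emitted, function (keys, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
});
console.log(totalVotes); // 350
```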
In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election, and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine, rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).

[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000.

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
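A range query of this kind amounts to selecting a slice of the view's rows, which are ordered by key. The sketch below imitates that over an in-memory array; the rows are invented, the helper name queryRange is hypothetical, and both bounds are treated as inclusive, which is CouchDB's default behaviour for startkey and endkey.

```javascript
// Rows as a view holds them: ordered by key (here, a vote count).
// Constituency names and figures are invented for illustration.
const voteRows = [
  { key: 850, value: 'Constituency A' },
  { key: 1200, value: 'Constituency B' },
  { key: 1999, value: 'Constituency C' },
  { key: 3062, value: 'Constituency D' }
];

// Keep the rows whose key lies between startkey and endkey, inclusive.
function queryRange(rows, startkey, endkey) {
  return rows.filter(function (row) {
    return row.key >= startkey && row.key <= endkey;
  });
}

const inRange = queryRange(voteRows, 1000, 2000);
console.log(inRange.map(function (row) { return row.value; }));
```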
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
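The idea of an incrementally maintained index can be sketched with a sorted array standing in for the B-Tree. This is a simplification under stated assumptions, not CouchDB's storage engine: the function names are hypothetical and the documents are invented. Each new document costs one binary-search insertion, and existing rows are never rescanned.

```javascript
// A sorted array stands in for the B-Tree index of a view.
const sortedIndex = [];

// Insert one emitted row at its sorted position, found by binary search.
function insertRow(rows, row) {
  let low = 0;
  let high = rows.length;
  while (low < high) {
    const middle = (low + high) >>> 1;
    if (rows[middle].key < row.key) {
      low = middle + 1;
    } else {
      high = middle;
    }
  }
  rows.splice(low, 0, row);
}

// Called once for each new document: only that document is processed.
function addDocument(rows, doc) {
  insertRow(rows, { key: doc.con, value: doc._id });
}

addDocument(sortedIndex, { _id: 'B', con: 1200 });
addDocument(sortedIndex, { _id: 'A', con: 850 });
addDocument(sortedIndex, { _id: 'C', con: 3062 });
console.log(sortedIndex.map(function (row) { return row.key; })); // [ 850, 1200, 3062 ]
```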
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code, rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT $host/$database

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s/$folder\///g")
  echo "$filename"
  docname=$(echo "$filename" | sed -e "s/$fileextension//g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d "@$filepath"
  curl -X PUT "$url" -d "@$filepath"
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful url. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
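The derivation of a document URL from a file path can be sketched as a small function. This is an illustration rather than part of the upload script: the name documentUrl is hypothetical, and it assumes a file name containing a plain space, which is percent-encoded when the URL is built.

```javascript
// Strip the folder prefix and the file extension, then build the document URL.
function documentUrl(host, database, filepath, extension) {
  const filename = filepath.slice(filepath.lastIndexOf('/') + 1);
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)
    : filename;
  // Percent-encode the id so that spaces and reserved characters are URL-safe.
  return host + '/' + database + '/' + encodeURIComponent(docname);
}

const uploadUrl = documentUrl(
  'http://127.0.0.1:5984',
  'universities',
  'universities/Aberystwyth University.json',
  '.json'
);
console.log(uploadUrl); // http://127.0.0.1:5984/universities/Aberystwyth%20University
```

Because the id is chosen by the client, the request is idempotent: repeating the PUT targets the same document rather than creating a new one.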
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
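As a hedged illustration of the req parameter, the sketch below extends the shape of s1 to read a value from the query string via req.query. The units parameter, the function name and the coordinates are all hypothetical; the call at the end is a local harness, since CouchDB would normally supply doc and req itself.

```javascript
// A 'show' function in the same shape as s1, reading an optional
// query-string parameter (the 'units' parameter is hypothetical).
function showWithQuery(doc, req) {
  const units = (req.query && req.query.units) || 'degrees';
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + ' ' + units + '</p>' +
    '<p>longitude: ' + doc.longitude + ' ' + units + '</p>';
}

// Local harness with invented values.
const shownHtml = showWithQuery(
  { _id: 'Example University', latitude: 52.5, longitude: -1.9 },
  { query: { units: 'deg' } }
);
console.log(shownHtml);
```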
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
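The same breakdown can be expressed as a small parsing function. This is a sketch that assumes the URL has exactly the shape described above; the name parseShowUrl is hypothetical.

```javascript
// Split a 'show' URL into the parts listed above.
// Expected path shape: /<database>/_design/<ddoc>/_show/<function>/<docid>
function parseShowUrl(url) {
  const parsed = new URL(url);
  const segments = parsed.pathname.split('/').filter(Boolean);
  return {
    host: parsed.origin,
    database: segments[0],
    designDocument: segments[1] + '/' + segments[2],
    showFunction: segments[4],
    docId: decodeURIComponent(segments[5])
  };
}

const showParts = parseShowUrl(
  'http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University'
);
console.log(showParts);
```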
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
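One way to avoid escaping by hand, offered here as a sketch rather than as the approach taken in this project, is to write the function as ordinary JavaScript and let JSON.stringify produce the escaped string. The class attribute is an invented detail, included only so that the output contains double quotes.

```javascript
// Write the 'show' function as ordinary JavaScript...
function s1(doc, req) {
  return '<h1 class="title">' + doc._id + '</h1>';
}

// ...then serialise its source text into the design document.
// JSON.stringify escapes the double quotes and line breaks automatically.
const generatedDesignDocument = {
  _id: '_design/d1',
  shows: { s1: s1.toString() }
};
const designDocumentJson = JSON.stringify(generatedDesignDocument, null, 2);
console.log(designDocumentJson);
```

The resulting text can be saved to a file and uploaded with cURL in the usual way.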
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
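The interplay of start, getRow and send can be tried outside CouchDB with a small harness. This is a sketch under stated assumptions: runList and its stand-in helpers are hypothetical and only imitate the functions that CouchDB provides to list functions, and the list function here is a simplified version of l1 that emits plain paragraphs.

```javascript
// A list function in the same shape as l1 (simplified output).
function simpleList(head, req) {
  var row;
  start({ headers: { 'Content-Type': 'text/html' } });
  while ((row = getRow())) {
    send('<p>' + row.value + '</p>');
  }
}

// Hypothetical harness: installs stand-ins for start, send and getRow
// as globals, runs the list function, and returns what it produced.
function runList(listFn, rows) {
  const queue = rows.slice();
  const result = { headers: null, chunks: [] };
  globalThis.start = function (options) { result.headers = options.headers; };
  globalThis.send = function (chunk) { result.chunks.push(chunk); };
  globalThis.getRow = function () { return queue.shift() || null; };
  listFn({ total_rows: rows.length }, {});
  return result;
}

const listed = runList(simpleList, [
  { key: 'Aston University', value: 'Aston University' },
  { key: 'Cardiff University', value: 'Cardiff University' }
]);
console.log(listed.chunks.join('\n'));
```

The headers are set once, and one chunk is sent per row returned by the view.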
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor, and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable, syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
SECTION 6 SERVING GEORSS USING COUCHAPP 41
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
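The correspondence between the folder layout and the design document can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not CouchApp's own code (which is written in Python): buildDesignDoc is a hypothetical helper, and the function bodies given for v1, s1 and l1 are placeholders invented for the example.

```javascript
// Hypothetical helper: fold a flat map of file paths into a nested
// design-document object, mirroring the folder layout described above.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, source] of Object.entries(files)) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      node[part] = node[part] || {};
      node = node[part];
    }
    // The file's text becomes a string value in the document.
    node[parts[parts.length - 1]] = source;
  }
  return ddoc;
}

// Placeholder function sources, assumed for illustration only.
const ddoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, null); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { return ''; }"
});

console.log(JSON.stringify(ddoc, null, 2));
```

Each file ends up as a string property whose position in the document matches its position in the folder tree, which is why the two structures can be kept in step.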
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
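As a rough sketch of what a filled-in Show function might look like, the example below follows the shape of s1 as listed in Appendix A: it receives a document and a request object and returns an HTML string. Outside CouchDB the function can simply be called directly. The sample document and its coordinates are invented for illustration.

```javascript
// A Show function in the style of s1: (doc, req) -> HTML string.
const s1 = function (doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Hypothetical sample document; CouchDB would normally supply the stored one.
const sampleDoc = { _id: "University of Aberdeen", latitude: 57.165, longitude: -2.1 };

// The second argument stands in for the HTTP request object, unused here.
const html = s1(sampleDoc, {});
console.log(html);
```

In the CouchApp project the file s1.js would contain only the anonymous function itself; the assignment to a constant is there so that the sketch can run on its own.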
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
With this in place, running couchapp push from the bc folder sends the design document to the server.
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
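To make the behaviour of this View concrete, the sketch below runs the same map logic outside CouchDB. The runView function is a hypothetical, much simplified stand-in for the view engine: it passes emit in as a parameter, collects the emitted rows and sorts them by key with plain JavaScript comparison, whereas CouchDB applies its own collation rules. The sample documents are cut down for illustration.

```javascript
// The map function from map.js, with emit passed in so it can run standalone.
function mapAll(doc, emit) {
  emit(doc._id, doc);
}

// Simplified stand-in for the view engine: apply the map function to every
// document and return the emitted rows ordered by key.
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, rows };
}

// Cut-down sample documents, assumed for illustration.
const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" }
];

const viewResult = runView(docs, mapAll);
console.log(JSON.stringify(viewResult, null, 2));
```

Because the whole document is emitted as the value, every row carries everything a List function needs, at the cost of a larger index.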
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
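The looping behaviour can be illustrated with a deliberately small stand-in for Mustache's section syntax. The render function below is a hypothetical sketch written for this illustration, not mustache.js itself: it understands only {{#name}}...{{/name}} sections and {{name}} variables, it does not support a section nested inside another section of the same name, and unlike real Mustache it does not HTML-escape values. The programme and country values are sample data.

```javascript
// Minimal section renderer (illustrative only, not the mustache.js API).
function render(template, context) {
  // Expand {{#name}}...{{/name}}: once per array item, once for a truthy
  // non-array value, and not at all for a falsy value.
  const withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (match, name, inner) => {
      const value = context[name];
      if (Array.isArray(value)) {
        return value
          .map((item) => render(inner, { ...context, ...item }))
          .join("");
      }
      return value ? render(inner, context) : "";
    }
  );
  // Substitute plain {{name}} variables from the current context.
  return withSections.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in context ? String(context[name]) : ""
  );
}

const template =
  "{{#programmes}}<p>{{name}}:{{#countries}} {{country}}{{/countries}}</p>{{/programmes}}";

const output = render(template, {
  programmes: [
    { name: "Chevening", countries: [{ country: "id" }, { country: "mz" }] },
    { name: "CSFP", countries: [] }
  ]
});

console.log(output);
```

The inner section runs once per country, and a programme with an empty country list simply produces no output for that section.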
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
            '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
            <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
            <style type=\"text/css\"> html { height: 100% }
            body { height: 100%; margin: 0px; padding: 0px }
            #map_canvas { height: 100% } </style>
            <script type=\"text/javascript\"
            src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
            <script type=\"text/javascript\"> function initialize() {
            var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
            var myOptions = { zoom: 14, center: latlng,
            mapTypeId: google.maps.MapTypeId.ROADMAP };
            var map = new google.maps.Map(
            document.getElementById(\"map_canvas\"), myOptions);
            var marker = new google.maps.Marker({ position: latlng,
            title: \"' + doc._id + '\" });
            var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
            google.maps.event.addListener(marker, \"click\", function() {
            infowindow.open(map, marker); });
            marker.setMap(map); } </script></head>
            <body onload=\"initialize()\">
            <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"> </div>
            </body></html>'; }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point; // added for GeoRSS
      return entry;
    }
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
        // close the loop after all rows are rendered
        return "</feed>";
      });
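The value assigned to point above is a latitude followed by a longitude, separated by a single space, which is the form GeoRSS-Simple uses for a point. A small helper along these lines could build and sanity-check that string before it is sent to the feed. The georssPoint function is hypothetical and is not part of Sofa or of my amendments; the coordinates are those of the example document in Appendix D.

```javascript
// Hypothetical helper: build the "latitude longitude" text of a point,
// returning null when either value is missing or out of range.
function georssPoint(latitude, longitude) {
  // parseFloat yields NaN for empty strings and null, so blank form
  // fields are rejected rather than silently treated as zero.
  const lat = parseFloat(latitude);
  const lon = parseFloat(longitude);
  if (!Number.isFinite(lat) || !Number.isFinite(lon)) return null;
  if (lat < -90 || lat > 90 || lon < -180 || lon > 180) return null;
  return lat + " " + lon;
}

const point = georssPoint("56.3412139443169", "-2.79301175308608");
console.log(point);
```

Returning null for bad input would let the calling code skip the point element for posts that were saved without coordinates.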
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->

    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
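The core of this List function is the reshaping of each institution into a template-friendly model. The sketch below isolates that step so that it can be run without CouchDB. The name toTemplateModel is hypothetical, only a few of the fields are carried over, and it is a variant rather than a copy: an empty array is treated the same as a missing one, so has_programmes and has_countries are false in both cases.

```javascript
// Hypothetical standalone version of the per-row reshaping step.
function toTemplateModel(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map((programme) => {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so the template can refer to {{country}}.
        countries: countries.map((country) => ({ country }))
      };
    })
  };
}

const model = toTemplateModel({
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] }
  ]
});

console.log(JSON.stringify(model, null, 2));
```

The boolean flags exist because a Mustache template has no way to test whether a list is empty on its own; the data has to say so explicitly.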
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is an .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag - the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
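The two sed steps in the script derive a document name from each file path and then build the attachment URL from it. The same derivation can be sketched in JavaScript, which makes it easy to check against the Germany example above. The flagAttachmentUrl function is a hypothetical name used only for this illustration, and the sketch performs no upload.

```javascript
// Hypothetical helper mirroring the script's path handling.
function flagAttachmentUrl(baseUrl, filepath) {
  // "countries/png/de.png" -> "de.png"
  const filename = filepath.split("/").pop();
  // "de.png" -> "de"; only a trailing ".png" is removed.
  const docname = filename.replace(/\.png$/, "");
  // One document per country code, with the image attached as "flag".
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

const flagUrl = flagAttachmentUrl("http://mick.couchone.com", "countries/png/de.png");
console.log(flagUrl);
```

Anchoring the pattern to the end of the name is slightly stricter than the script's own substitution, which would also remove ".png" if it appeared elsewhere in a file name.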
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation/
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web/
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat/
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign/
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
SECTION 4 USING COUCHDB 28
commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits a value for each document it encounters; the 'reduce' stage is used when a calculation of the intermediate values is required.
In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb/) I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.
The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk
Figure 4.4: election-2005 database
(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)
Here is an example of a document stored in the database, with the results from a single constituency:
[http://localhost:5984/election-2005/Aberavon]
    {
      "_id": "Aberavon",
      "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
      "mp": "Hywel Francis",
      "electorate": 51079,
      "con": 3062,
      "lab": 18077,
      "lib": 4138,
      "pc": 3546,
      "oth": 1278
    }
CouchDB can hold two types of document: the most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate query, such as SUM with GROUP BY, in SQL).
[http://localhost:5984/election-2005/_design/con]
    {
      "_id": "_design/con",
      "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
      "language": "javascript",
      "views": {
        "Conservative Votes": {
          "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
          "map": "function(doc) {\n  emit(null, doc.con);\n}",
          "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
      }
    }
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
    {"rows":[{"key":null,"value":8782198}]}
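The way the map and reduce stages combine to give this single row can be sketched outside CouchDB. The runMapReduce function below is a hypothetical, simplified stand-in for the view engine: it handles only the case where every row is reduced to one value, and it ignores incremental indexing and re-reduce. The Aberavon figure is taken from the document shown earlier; the second constituency is invented so that there is something to add up.

```javascript
// Stand-in for CouchDB's built-in sum() helper available to reduce functions.
const sum = (values) => values.reduce((total, value) => total + value, 0);

// Simplified engine: run the map function over all documents, then reduce
// everything that was emitted into a single row.
function runMapReduce(docs, mapFn, reduceFn) {
  const keys = [];
  const values = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      keys.push([key, doc._id]); // reduce receives [key, document id] pairs
      values.push(value);
    });
  }
  return { rows: [{ key: null, value: reduceFn(keys, values) }] };
}

const electionDocs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example East", con: 10000 } // hypothetical constituency
];

const total = runMapReduce(
  electionDocs,
  (doc, emit) => emit(null, doc.con),  // the 'Conservative Votes Total' map
  (keys, values) => sum(values)        // the matching reduce
);

console.log(JSON.stringify(total));
```

The result has the same shape as the response from CouchDB: one row, with a null key and the summed value.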
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
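The effect of a key range on the 'Conservative Votes' view can be pictured as a filter over rows that are already sorted by key. The sketch below is an illustration only: queryRange is a hypothetical function, the parameter names follow the lower-case startkey and endkey spelling that CouchDB documents for view queries, and apart from Aberavon the rows are invented. CouchDB itself does not filter a list in this way; it seeks directly into the B-Tree index.

```javascript
// Hypothetical illustration: keep the rows whose key lies in
// [startkey, endkey], both ends inclusive, in ascending key order.
function queryRange(rows, startkey, endkey) {
  return rows
    .filter((row) => row.key >= startkey && row.key <= endkey)
    .sort((a, b) => a.key - b.key);
}

// Rows as emitted by emit(doc.con, doc._id); all but Aberavon are invented.
const voteRows = [
  { key: 3062, value: "Aberavon" },
  { key: 1850, value: "Example North" },
  { key: 1000, value: "Example South" },
  { key: 950, value: "Example West" }
];

const matches = queryRange(voteRows, 1000, 2000);
console.log(matches.map((row) => row.value));
```

Because the vote count is the key, the range has to be expressed on that one value; a condition on a different field would need a different view.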
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast, indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
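The relationship between a map function, its sorted index and a key-range request can be imitated in a few lines of plain JavaScript. This is only a sketch under stated assumptions: the documents and vote counts are invented, emit is passed in as a parameter (in CouchDB it is a global), and an in-memory sorted array stands in for the B-Tree.

```javascript
// In-memory imitation of a CouchDB view. Not CouchDB's implementation;
// the sample vote counts below are invented for illustration.
var docs = [
  { _id: "A", con: 1500 },
  { _id: "B", con: 900 },
  { _id: "C", con: 2000 },
  { _id: "D", con: 2500 }
];

// Same body as the 'Conservative Votes' map function, with emit passed in.
function mapConservativeVotes(doc, emit) {
  emit(doc.con, doc._id);
}

// CouchDB supplies sum() to reduce functions; defined here so the sketch runs alone.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Build the "index": run the map function over every document,
// then sort the emitted rows by key (numeric keys only in this sketch).
function buildView(allDocs, mapFn) {
  var rows = [];
  allDocs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  rows.sort(function (a, b) { return a.key - b.key; });
  return rows;
}

// Key-range lookup, inclusive at both ends.
function queryRange(rows, startkey, endkey) {
  return rows.filter(function (r) { return r.key >= startkey && r.key <= endkey; });
}

var index = buildView(docs, mapConservativeVotes);
var between = queryRange(index, 1000, 2000).map(function (r) { return r.value; });
var total = sum(docs.map(function (d) { return d.con; }));
console.log(between, total);
```

The range query never inspects the documents themselves, only the pre-sorted rows, which is the point made above about indexed lookups.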
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# local instance (commented out); the hosted instance below is the one used
#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=json
FILES=$folder/*
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s|$folder/||g")
  echo "$filename"
  # strip the file extension to get the document name
  docname=$(echo "$filename" | sed -e "s/\.$fileextension//g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB (@ tells cURL to read the body from the file)
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
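To make the role of req more concrete, here is a small sketch of a ‘show’ function that reads a query-string value and is exercised locally with mock objects. The parameter name units, the fallback label and the coordinates are all invented for illustration; only the use of req.query to hold query-string values reflects CouchDB itself.

```javascript
// Hypothetical variant of s1: "units" is an invented query parameter,
// e.g. .../_show/s1/Aston%20University?units=deg
var show = function (doc, req) {
  // req.query holds the parsed query string; fall back to a default label
  var label = (req.query && req.query.units) || "degrees";
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + ' ' + label + '</p>' +
         '<p>longitude: ' + doc.longitude + ' ' + label + '</p>';
};

// Mock document and request object; the coordinates are placeholders.
var mockDoc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };
var mockReq = { query: { units: "deg" } };

var html = show(mockDoc, mockReq);
console.log(html);
```

Calling the function directly like this is a convenient way to check the returned string before the function is pasted into a design document.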
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
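The same breakdown can be expressed as a short parsing function. This is an illustrative sketch only; parseShowUrl is a hypothetical helper written for this example and is not part of CouchDB.

```javascript
// Split a 'show' URL into the parts listed above.
// parseShowUrl is a hypothetical helper, not a CouchDB function.
function parseShowUrl(url) {
  var pattern = /^(https?:\/\/[^\/]+)\/([^\/]+)\/(_design\/[^\/]+)\/_show\/([^\/]+)\/(.+)$/;
  var m = pattern.exec(url);
  if (!m) {
    return null; // not a show URL of the expected shape
  }
  return {
    host: m[1],                      // the CouchDB instance
    database: m[2],                  // the database
    designDoc: m[3],                 // the design document id
    show: m[4],                      // the name of the 'show' function
    docId: decodeURIComponent(m[5])  // the id of the document to render
  };
}

var parts = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University");
console.log(parts);
```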
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
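One way to sidestep the manual escaping, which was not the approach taken in this section, is to write the function as ordinary JavaScript and let JSON.stringify produce the escaped string. The sketch below assumes a JavaScript runtime such as Node.js on the desktop; the class attribute is invented to give the string some double quotes to escape.

```javascript
// A 'show' function written as ordinary JavaScript, with unescaped double quotes.
var s1 = function (doc, req) {
  return '<p class="name">' + doc._id + '</p>';
};

// A design document stores function source as a string, so the function is
// converted with toString(); JSON.stringify then escapes the double quotes.
var designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() }
};

var json = JSON.stringify(designDoc, null, 2);
console.log(json); // this text could be saved as d1.json and uploaded with cURL
```

The escaping still happens, but it is done mechanically rather than by hand.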
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
          row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
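The calling pattern of start, getRow and send can be tried out away from CouchDB with a small mock runtime. This is a sketch, not CouchDB's own code: runList is a hypothetical harness, the rows are invented, and the ‘list’ function is a cut-down version of l1 that omits the hyperlink.

```javascript
// Mock runtime that records what a 'list' function does. runList is a
// hypothetical test harness, not part of CouchDB.
function runList(listFn, rows) {
  var out = { headers: null, body: "", startCalls: 0, sendCalls: 0 };
  var i = 0;
  // List functions call these helpers unqualified, so install them as globals.
  globalThis.start = function (resp) { out.headers = resp.headers; out.startCalls++; };
  globalThis.getRow = function () { return i < rows.length ? rows[i++] : null; };
  globalThis.send = function (chunk) { out.body += chunk; out.sendCalls++; };
  listFn({ total_rows: rows.length }, {});
  return out;
}

// Simplified version of l1: one paragraph per row, without the hyperlink.
var l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p>' + row.value + '</p>');
  }
};

var result = runList(l1, [
  { key: "Doc A", value: "Doc A" },
  { key: "Doc B", value: "Doc B" }
]);
console.log(result);
```

With two rows, the harness records one call to start and two calls to send, matching the description above.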
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console ‘Futon’ to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer’s perspective is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
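The correspondence between this folder layout and the finished design document can be sketched in a few lines. This is a simplified, hypothetical imitation of the merge step: buildDesignDoc is invented for this example, file contents are supplied in memory rather than read from disk, and the real CouchApp tool does considerably more.

```javascript
// Hypothetical, simplified imitation of merging separate files into one
// design document. Keys are file paths, values are the file contents.
function buildDesignDoc(name, files) {
  var ddoc = { _id: "_design/" + name };
  Object.keys(files).forEach(function (path) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    var parts = path.replace(/\.js$/, "").split("/");
    var node = ddoc;
    // Walk (and create) the nested objects for every folder in the path.
    parts.slice(0, -1).forEach(function (folder) {
      node[folder] = node[folder] || {};
      node = node[folder];
    });
    // The file name (minus .js) becomes the property holding the source text.
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

var ddoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "lists/l1.js": "function(head, req) { }"
});
console.log(JSON.stringify(ddoc, null, 2));
```

Each folder becomes a nested object and each file becomes a string property, which is why the folder structure mirrors the sections of the design documents shown in Section 5.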
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1, containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted).
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% }
      body { height: 100%; margin: 0px; padding: 0px }
      #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\"
        src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
        var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
        var myOptions = { zoom: 14, center: latlng,
          mapTypeId: google.maps.MapTypeId.ROADMAP };
        var map = new google.maps.Map(
          document.getElementById(\"map_canvas\"), myOptions);
        var marker = new google.maps.Marker({ position: latlng,
          title: \"' + doc._id + '\" });
        var infowindow = new google.maps.InfoWindow({
          content: \"' + doc._id + '\" });
        google.maps.event.addListener(marker, \"click\", function() {
          infowindow.open(map, marker); });
        marker.setMap(map); } </script></head>
      <body onload=\"initialize()\">
      <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div>
      </body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
            xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
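The shaping step at the heart of this function can be tried on its own, without CouchDB or Mustache. The sketch below is a cut-down illustration, not the function above: toStashEntry is a hypothetical name, the sample institution is invented, and the has_ flags test for a non-empty array, which is a slight variation on the truthiness test used in georss.js.

```javascript
// Shape one institution document into the structure a template can iterate over.
// toStashEntry is a hypothetical helper written for this example.
function toStashEntry(institution) {
  var programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      var countries = programme.countries || [];
      return {
        name: programme.name,
        has_countries: countries.length > 0,
        // Each code is wrapped in an object so the template can refer to it by name.
        countries: countries.map(function (code) { return { country: code }; })
      };
    })
  };
}

// Invented sample document: one programme with countries, one without.
var entry = toStashEntry({
  _id: "Example University",
  latitude: 1,
  longitude: 2,
  programmes: [
    { name: "Programme A", countries: ["de", "ie"] },
    { name: "Programme B" }
  ]
});
console.log(JSON.stringify(entry, null, 2));
```

Wrapping each country code as { country: code } is what lets the template in Appendix D name the value inside its countries section.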
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code.
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash

# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
    echo $filepath
    # get the file name from the file path
    filename=$(echo $filepath | sed -e 's/countries\/png\///g')
    echo $filename
    docname=$(echo $filename | sed -e 's/\.png$//')
    echo $docname
    url=http://username:password@mick.couchone.com/countries/$docname/flag
    echo $url
    # put the attachment into CouchDB
    # this command creates the 'ie' record and puts the png as an attachment
    # under countries/ie/flag:
    # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
    echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open_Database_Connectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.
The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.
The CouchDB design document 'con' has been defined with two 'views' of the data.
The first view, 'Conservative Votes', simply ranges over the document collection in a map function.
The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to a GROUP BY in SQL).
[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}
The sum of Conservative votes (8782198) is returned as the result of the following request:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
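As a rough illustration of how a map function and a reduce function combine, the following sketch runs the same two functions over a small in-memory array. The runView helper, the explicit emit argument and the sample vote counts are all assumptions made for this example; CouchDB supplies emit and sum itself and stores the result in an index.

```javascript
// Minimal in-memory stand-in for a CouchDB view (illustrative only).
// The documents and vote counts below are made-up sample data.
function runView(docs, mapFn, reduceFn) {
  var rows = [];
  // CouchDB provides emit() globally; here it is passed in explicitly.
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ key: key, value: value });
    });
  });
  if (!reduceFn) {
    return rows;
  }
  var keys = rows.map(function (r) { return r.key; });
  var values = rows.map(function (r) { return r.value; });
  return [{ key: null, value: reduceFn(keys, values) }];
}

// Stand-in for CouchDB's built-in sum() helper.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var sampleDocs = [
  { _id: "Constituency A", con: 1200 },
  { _id: "Constituency B", con: 3000 },
  { _id: "Constituency C", con: 1800 }
];

var total = runView(
  sampleDocs,
  function (doc, emit) { emit(null, doc.con); },
  function (keys, values) { return sum(values); }
);
console.log(JSON.stringify(total)); // [{"key":null,"value":6000}]
```

The result has the same shape as the rows array returned by the 'Conservative Votes Total' view.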
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:
[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
Figure 4.5: WHERE clause in CouchDB
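The effect of a key range on a view can be sketched as a filter over rows ordered by key. The queryRange function and the sample rows are hypothetical; in CouchDB the selection is done against the view's index rather than by filtering an array, and both bounds are inclusive by default.

```javascript
// Sketch of what a start/end key range selects from a view's sorted rows.
// Rows are made-up data shaped like the 'Conservative Votes' view output.
function queryRange(rows, startkey, endkey) {
  return rows
    .slice()
    .sort(function (a, b) { return a.key - b.key; }) // view rows are ordered by key
    .filter(function (row) { return row.key >= startkey && row.key <= endkey; });
}

var rows = [
  { key: 3000, value: "Constituency B" },
  { key: 1200, value: "Constituency A" },
  { key: 1800, value: "Constituency C" },
  { key: 950, value: "Constituency D" }
];

var matches = queryRange(rows, 1000, 2000);
console.log(matches.map(function (r) { return r.value; }));
// [ 'Constituency A', 'Constituency C' ]
```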
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time - every read is an indexed lookup.
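The idea of an incrementally updated index can be sketched with a sorted array standing in for the B-Tree. This is a simplification made for brevity (CouchDB's index is an on-disk B-Tree), and insertSorted is a hypothetical name.

```javascript
// Toy sorted index: adding a document inserts one entry at the right
// position instead of rebuilding the whole index.
function insertSorted(index, entry) {
  var lo = 0, hi = index.length;
  while (lo < hi) {                 // binary search for the insert position
    var mid = (lo + hi) >> 1;
    if (index[mid].key < entry.key) { lo = mid + 1; } else { hi = mid; }
  }
  index.splice(lo, 0, entry);       // existing entries are left untouched
  return index;
}

var index = [];
[{ _id: "A", con: 1200 }, { _id: "B", con: 3000 }].forEach(function (doc) {
  insertSorted(index, { key: doc.con, id: doc._id });
});

// A new document arrives later: only its own entry is added.
insertSorted(index, { key: 1800, id: "C" });
console.log(index.map(function (e) { return e.key; })); // [ 1200, 1800, 3000 ]
```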
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash

#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT $host/$database

for filepath in $FILES
do
    echo $filepath
    # get the file name from the file path
    filename=$(echo $filepath | sed -e "s/$folder\///g")
    echo $filename
    docname=$(echo $filename | sed -e "s/$fileextension//g")
    echo $docname
    url=$host/$database/$docname
    echo $url
    # put the document into CouchDB
    echo curl -X PUT $url -d @$filepath
    curl -X PUT $url -d @$filepath
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
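The naming logic of the upload script (strip the file extension and use what remains as the document id in the URL) can be sketched as follows. The putTargets helper and the notes.txt file name are hypothetical, and no network request is made.

```javascript
// Hypothetical helper mirroring the bash loop: derive a document id from each
// file name and build the PUT target for it.
function putTargets(host, database, fileNames, extension) {
  return fileNames
    .filter(function (name) { return name.slice(-extension.length) === extension; })
    .map(function (name) {
      var docname = name.slice(0, -extension.length);
      return { method: "PUT", url: host + "/" + database + "/" + docname, file: name };
    });
}

var targets = putTargets(
  "http://127.0.0.1:5984",
  "universities",
  ["Aberystwyth%20University.json", "Aston%20University.json", "notes.txt"],
  ".json"
);
console.log(targets[0].url);
// http://127.0.0.1:5984/universities/Aberystwyth%20University
```

Choosing the id on the client side is what makes PUT the appropriate verb here, since the full URL of the document is known before it is sent.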
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
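To make the roles of doc and req concrete, here is a variant of s1 written as ordinary JavaScript and called directly. The heading query-string parameter and the sample document values are invented for this sketch; the only assumption about CouchDB is that query-string parameters arrive in req.query.

```javascript
// Variant of the s1 'show' function, callable outside CouchDB.
var s1 = function (doc, req) {
  // 'heading' is a hypothetical query-string parameter, e.g. ?heading=h2
  var tag = (req.query && req.query.heading) || 'h1';
  return '<' + tag + '>' + doc._id + '</' + tag + '>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
};

// Placeholder values, not real data from the universities database.
var sampleDoc = { _id: "Example University", latitude: 52.5, longitude: -1.5 };

console.log(s1(sampleDoc, { query: {} }));
console.log(s1(sampleDoc, { query: { heading: "h2" } }));
```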
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
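The same breakdown can be expressed as a small parsing function. parseShowPath and the field names it returns are illustrative only and are not part of CouchDB.

```javascript
// Split a 'show' URL path into the parts listed above.
function parseShowPath(path) {
  var parts = path.split("/").filter(function (p) { return p.length > 0; });
  // expected shape: <db>/_design/<ddoc>/_show/<show>/<docid>
  if (parts.length !== 6 || parts[1] !== "_design" || parts[3] !== "_show") {
    return null;
  }
  return {
    database: parts[0],
    designDocument: "_design/" + parts[2],
    showFunction: parts[4],
    docId: decodeURIComponent(parts[5])
  };
}

var parsed = parseShowPath("/universities/_design/d1/_show/s1/Aston%20University");
console.log(parsed.docId); // Aston University
```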
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
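The escaping problem is easy to see when a function's source text is serialised into a design document. In this sketch the show function and its class attribute are invented; JSON.stringify stands in for the hand-editing that was done in the project.

```javascript
// A show function whose HTML contains double quotes.
var show = function (doc, req) {
  return "<p class=\"title\">" + doc._id + "</p>";
};

// Store the function's source text as a string inside a design document.
var designDoc = { _id: "_design/d1", shows: { s1: show.toString() } };
var json = JSON.stringify(designDoc);

// Every quote inside the function is escaped again for JSON, so a single
// \" in the JavaScript source becomes \\\" in the stored document.
console.log(json);
```

Writing the JSON form by hand means producing those extra layers of backslashes manually, which is where most of the effort went.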
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section, and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": {...},
  "lists": {...},
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {...},
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
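The calling pattern of start, getRow and send can be imitated outside CouchDB with simple stand-ins. The runList harness below is hypothetical test scaffolding, and the built-ins are passed in through an env object rather than being global as they are inside CouchDB.

```javascript
// Stand-ins for the functions CouchDB provides to a 'list' function.
function runList(listFn, rows) {
  var output = { headers: null, body: "" };
  var i = 0;
  var env = {
    start: function (resp) { output.headers = resp.headers; },
    send: function (chunk) { output.body += chunk; },
    getRow: function () { return i < rows.length ? rows[i++] : null; }
  };
  listFn(env);
  return output;
}

// Same shape as l1, with a simplified HTML fragment per row.
function l1(env) {
  var row;
  env.start({ headers: { "Content-Type": "text/html" } }); // once, before any rows
  while ((row = env.getRow())) {                            // null ends the loop
    env.send("<p>" + row.value + "</p>");                   // once per row
  }
}

var result = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "University of St Andrews", value: "University of St Andrews" }
]);
console.log(result.body);
// <p>Aston University</p><p>University of St Andrews</p>
```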
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1, and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
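The merging step can be pictured as folding a tree of files into one nested object. The sketch below is only an illustration of that idea: CouchApp itself is written in Python, and buildDesignDoc, the file contents and the extension handling are assumptions, not its actual code.

```javascript
// Fold a CouchApp-style file tree into a single design document object.
function buildDesignDoc(id, files) {
  var ddoc = { _id: id };
  Object.keys(files).forEach(function (path) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    var parts = path.replace(/\.(js|html)$/, "").split("/");
    var node = ddoc;
    parts.slice(0, -1).forEach(function (part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[parts[parts.length - 1]] = files[path]; // file content becomes a string value
  });
  return ddoc;
}

var ddoc = buildDesignDoc("_design/bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "templates/georss.html": "<feed>...</feed>"
});
console.log(Object.keys(ddoc)); // [ '_id', 'views', 'shows', 'lists', 'templates' ]
console.log(ddoc.views.v1.map);
```

Seen this way, each folder on disk corresponds to one section of the design document, which is why the folder names match the reserved section names.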
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
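The looping behaviour of template sections can be sketched with a very small renderer. This is not mustache.js: it handles only sections and plain variables, looks names up in the current context only, and does no escaping. The template, the stash and the 'Example Programme' entry are made up for the example.

```javascript
// Tiny subset of Mustache (sections and variables only), for illustration.
function render(template, context) {
  // {{#name}}...{{/name}}: repeat the body per array item, or render once if truthy
  var withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    function (match, name, body) {
      var value = context[name];
      if (Array.isArray(value)) {
        return value.map(function (item) { return render(body, item); }).join("");
      }
      return value ? render(body, context) : "";
    }
  );
  // {{name}}: substitute a value from the current context
  return withSections.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return context[name] === undefined ? "" : String(context[name]);
  });
}

var template =
  "{{#programmes}}<p>{{name}}" +
  "{{#has_countries}}{{#countries}}[{{country}}]{{/countries}}{{/has_countries}}" +
  "</p>{{/programmes}}";

var stash = {
  programmes: [
    { name: "Chevening Programme", has_countries: true,
      countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }] },
    { name: "Example Programme", has_countries: false, countries: [] }
  ]
};

console.log(render(template, stash));
// <p>Chevening Programme[id][mz][tj]</p><p>Example Programme</p>
```

The boolean has_countries section acts as a guard around the array section, which mirrors the pairing of has_programmes with programmes and has_countries with countries in the template above.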
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together, as a unit, is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-defined map-reduce function whose results are stored as an index. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
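To make the contrast with SQL concrete, the following is a minimal sketch in plain JavaScript of how a query such as SELECT region, COUNT(*) ... GROUP BY region might be expressed as a map function and a reduce function. The sample documents and the runView helper are assumptions made for illustration only; in CouchDB itself, emit is provided by the server, and the grouping is done against a stored index rather than in memory.

```javascript
// Hypothetical sample documents (not taken from the project database).
const docs = [
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" }
];

// Map function in the CouchDB style: emit one key/value pair per document.
// Here emit is passed in as a parameter; in CouchDB it is a built-in.
function mapRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce function: sum the emitted values for one key.
function reduceSum(keys, values) {
  return values.reduce((total, v) => total + v, 0);
}

// runView is an illustrative stand-in for the server: it applies the map
// function to every document, groups the emitted rows by key, then reduces.
function runView(documents, map, reduce) {
  const groups = new Map();
  for (const doc of documents) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups.keys()].sort().map((key) => ({
    key,
    value: reduce([key], groups.get(key))
  }));
}

const rows = runView(docs, mapRegion, reduceSum);
console.log(rows);
```

The point of the sketch is the division of labour: the map function decides what the key is, and the reduce function decides how values sharing a key are combined.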
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at a more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>' }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };
    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point; //
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
Separating the data model from the template in this way provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
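The stash-building step in the listing above can be tried outside CouchDB. The following is a minimal sketch, assuming a plain Node.js environment with no CouchDB or CouchApp libraries; toStashEntry and the sample document are hypothetical names and data introduced here. Unlike the listing, it sets the has_* flags from the array length, so an empty array is treated the same as a missing one.

```javascript
// Builds the template model ("stash" entry) for one institution document.
// Missing arrays become [] and get a false has_* flag, so that a template
// section guarded by the flag is skipped.
function toStashEntry(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map((programme) => {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so the template can refer to {{country}}.
        countries: countries.map((country) => ({ country }))
      };
    })
  };
}

// Hypothetical input, shaped like the documents described in Appendix D.
const entry = toStashEntry({
  _id: "Example University",
  latitude: 56.34,
  longitude: -2.79,
  programmes: [{ name: "Programme A", url: "http://example.org", countries: ["bd", "za"] }]
});
const empty = toStashEntry({ _id: "No Programmes", latitude: 0, longitude: 0 });
console.log(JSON.stringify(entry, null, 2));
```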
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to ./countries/png folder
    # (beneath the current folder)
    FILES=./countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/.\/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 4 USING COUCHDB 30
    {"rows":[{"key":null,"value":8782198}]}
It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000.
    http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000
Figure 4.5: WHERE clause in CouchDB
This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.
The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is incrementally updated.
As a result, the entire system is optimised for fast indexed retrieval - this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time - every read is an indexed lookup.
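The idea that a key range is answered from a sorted index, rather than by scanning every document, can be sketched as follows. This is an illustration only: a sorted JavaScript array and a binary search stand in for CouchDB’s B-Tree, and the constituency names and vote counts are invented for the example.

```javascript
// Hypothetical rows as a view might emit them:
// key = vote count, value = constituency name.
const emitted = [
  { key: 2500, value: "Constituency C" },
  { key: 1200, value: "Constituency A" },
  { key: 800, value: "Constituency D" },
  { key: 1900, value: "Constituency B" }
];

// The index is kept sorted by key; the sort is paid for when the index is
// built or updated, not on every query.
const index = [...emitted].sort((a, b) => a.key - b.key);

// Binary search for the first position whose key is >= target.
function lowerBound(sortedRows, target) {
  let lo = 0;
  let hi = sortedRows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedRows[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Inclusive range lookup, analogous to a startkey/endkey request.
function range(sortedRows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(sortedRows, startkey); i < sortedRows.length; i++) {
    if (sortedRows[i].key > endkey) break; // past the range: stop reading
    result.push(sortedRows[i]);
  }
  return result;
}

const between = range(index, 1000, 2000);
console.log(between.map((row) => row.value));
```

Only the rows inside the requested range are touched, which is why the range query stays cheap as the collection grows.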
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script, as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
    #!/bin/bash
    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=./universities/
    fileextension=.json
    FILES=$folder*
    # create the database
    curl -X PUT $host/$database
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e "s|$folder||g")
      echo $filename
      # strip the extension to get the document name
      docname=$(echo $filename | sed -e "s|$fileextension||g")
      echo $docname
      url=$host/$database/$docname
      echo $url
      # put the document into CouchDB
      echo curl -X PUT $url -d @$filepath
      curl -X PUT "$url" -d @"$filepath"
    done
Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
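The choice between PUT and POST can be sketched as a small helper that decides the HTTP method and URL for storing a document. This is an illustrative sketch only: storeRequest is a hypothetical function, the local host address is an assumption, and no request is actually sent.

```javascript
// Returns the HTTP method and URL for storing a document.
// With an id: PUT to /<database>/<id>, with the id percent-encoded for the path.
// Without an id: POST to /<database> and let the server assign one.
function storeRequest(host, database, docId) {
  const base = host + "/" + encodeURIComponent(database);
  if (docId === undefined) {
    return { method: "POST", url: base };
  }
  return { method: "PUT", url: base + "/" + encodeURIComponent(docId) };
}

const named = storeRequest(
  "http://127.0.0.1:5984",
  "universities",
  "Aberystwyth University"
);
const anonymous = storeRequest("http://127.0.0.1:5984", "universities");
console.log(named, anonymous);
```

The encoded space (%20) in the resulting URL is the same encoding that appears in the document URLs used throughout this section.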
To put a single document to the web, the command used would be as follows:
    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.
    {
      "_id": "_design/d1",
      "shows": {
        "s1": "function(doc, req) {
          return '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>' }"
      }
    }
Note that shows is a reserved word in a CouchDB design document.
The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
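As a sketch of how the req parameter might be used, the following show function reads a query-string value from req.query. The function name s2, the units parameter and the sample values are hypothetical and are not part of the design document described here; the function is called locally with stand-in objects rather than by CouchDB.

```javascript
// A show function in the same shape as s1, extended to read a query-string
// parameter. "units" is a hypothetical parameter name chosen for this sketch.
function s2(doc, req) {
  const label = req.query.units === "short" ? "lat" : "latitude";
  return "<h1>" + doc._id + "</h1>" +
         "<p>" + label + ": " + doc.latitude + "</p>";
}

// Local stand-ins for what CouchDB would pass in (illustrative values).
const sampleDoc = { _id: "Aston University", latitude: 52.4869 };
const sampleReq = { query: { units: "short" } };

const html = s2(sampleDoc, sampleReq);
console.log(html);
```

Because a show function is just a function of doc and req, it can be exercised in this way before being pasted into a design document.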
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
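The URL structure listed above can be checked mechanically. The sketch below, which assumes the standard URL class available in Node.js and in browsers, splits a show URL into the parts just described; parseShowUrl is a hypothetical helper written for this illustration.

```javascript
// Splits a show URL of the form
//   http://host/<db>/_design/<ddoc>/_show/<show>/<docid>
// into its named parts.
function parseShowUrl(showUrl) {
  const url = new URL(showUrl);
  const parts = url.pathname.split("/").filter((p) => p.length > 0);
  if (parts.length !== 6 || parts[1] !== "_design" || parts[3] !== "_show") {
    throw new Error("not a show URL: " + showUrl);
  }
  return {
    host: url.origin,
    database: parts[0],
    designDocument: "_design/" + parts[2],
    showFunction: parts[4],
    documentId: decodeURIComponent(parts[5]) // %20 becomes a space
  };
}

const parsed = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
console.log(parsed);
```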
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
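One way to avoid escaping by hand, offered here as a sketch rather than as the approach taken in this project, is to write the show function as ordinary JavaScript and let JSON.stringify produce the escaped string. The class="title" attribute is a hypothetical detail added only so that the output contains double quotes.

```javascript
// Write the show function as ordinary JavaScript, double quotes and all.
function s1(doc, req) {
  return '<h1 class="title">' + doc._id + '</h1>';
}

// Function.prototype.toString gives the source text; JSON.stringify then
// escapes every double quote inside it when the document is serialised.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() }
};
const body = JSON.stringify(designDoc);
console.log(body);

// Round trip: parsing the JSON recovers the original source text unchanged.
const roundTripped = JSON.parse(body);
const rebuilt = new Function("return (" + roundTripped.shows.s1 + ")")();
console.log(rebuilt({ _id: "Aston University" }));
```

The serialised body is what would be sent to CouchDB with an HTTP PUT; tools such as CouchApp, discussed in Section 6, automate a similar packaging step.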
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection.
    {
      "_id": "_design/d1",
      "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
      "shows": { ... },
      "lists": { ... },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": { ... },
      "lists": {
        "l1": "function(head, req) {
          var row;
          start({ 'headers': { 'Content-Type': 'text/html' } });
          while (row = getRow()) {
            send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
          }
        }"
      },
      "views": {
        "v1": {
          "map": "function(doc) { emit(doc._id, doc._id); }"
        }
      }
    }
Note the built-in functions start and send:
• start is executed once, at the beginning of the function, to set the response headers
• send is executed once for each row in the dataset, inside the getRow loop
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
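The interplay of start, getRow and send can be sketched outside CouchDB by supplying stand-ins for the three built-in functions. Everything here is an assumption for illustration: runList is a hypothetical harness, the helpers are passed in as parameters (in CouchDB they are simply available inside the list function), and the rows are invented.

```javascript
// Runs a list-style function locally by supplying stand-ins for the three
// functions CouchDB provides inside a list: start, send and getRow.
function runList(listFn, rows) {
  const queue = rows.slice();
  const output = { headers: {}, body: "" };
  const start = (response) => { output.headers = response.headers; };
  const send = (chunk) => { output.body += chunk; };
  const getRow = () => queue.shift(); // undefined once the rows run out
  const tail = listFn(start, send, getRow);
  if (tail) output.body += tail; // a returned string is appended last
  return output;
}

// Same control flow as l1: set headers once, then one send per row.
function l1(start, send, getRow) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p>" + row.value + "</p>");
  }
}

const rendered = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" }
]);
console.log(rendered.body);
```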
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
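As a rough illustration of what a GeoRSS entry carries, the sketch below builds a single Atom entry with a GeoRSS-Simple point, which is written as latitude then longitude separated by a space. The function names and sample values are hypothetical, and the enclosing feed element, which would need to declare the georss namespace (http://www.georss.org/georss), is omitted.

```javascript
// Escapes the characters that are special in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Builds one Atom entry carrying a GeoRSS-Simple point.
function geoEntry(title, latitude, longitude) {
  return "<entry>" +
    "<title>" + escapeXml(title) + "</title>" +
    "<georss:point>" + latitude + " " + longitude + "</georss:point>" +
    "</entry>";
}

// Illustrative values only.
const entryXml = geoEntry("Arts & Sciences Campus", 56.34, -2.79);
console.log(entryXml);
```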
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp from an application developer’s perspective is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
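The essential idea, separate files folded into one nested design document, can be sketched as follows. This is only an approximation written for illustration and is not CouchApp’s actual implementation: assemble is a hypothetical function, the files are given as an in-memory map rather than read from disk, and only .js files are handled.

```javascript
// Assembles a nested design-document object from a flat map of
// relative file paths to file contents.
function assemble(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, content] of Object.entries(files)) {
    // "views/all/map.js" becomes ["views", "all", "map"]
    const segments = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const segment of segments.slice(0, -1)) {
      // Create intermediate objects on demand, then descend into them.
      node = node[segment] = node[segment] || {};
    }
    node[segments[segments.length - 1]] = content;
  }
  return ddoc;
}

const ddoc = assemble("bc", {
  "views/all/map.js": "function(doc) { emit(doc._id, doc); }",
  "shows/s1.js": "function(doc, req) { return doc._id; }"
});
console.log(JSON.stringify(ddoc, null, 2));
```

Each file can then be edited with ordinary tools, and the escaping of function source into JSON strings happens only at packaging time.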
62 Sofa - a Blogging Application
The reference application for CouchApp is Sofa a blogging application Thesource code for Sofa is available here httpgithubcomjchrissofa
39
SECTION 6 SERVING GEORSS USING COUCHAPP 40
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
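The correspondence between folders and design document fields can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not CouchApp's actual implementation (which is written in Python and handles many more cases): the `files` object is a hypothetical in-memory stand-in for the project folder, and `buildDesignDoc` is a name invented for this example.

```javascript
// Hypothetical stand-in for the files in the bc folder.
// Keys are relative paths; values are the file contents.
const files = {
  "_id": "_design/bc",
  "views/v1/map.js": "function(doc) { emit(doc._id, null); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "templates/georss.html": "<feed>...</feed>"
};

// Assumed simplification: each folder becomes a nested object and each
// file becomes a string field named after the file without its extension,
// so "views/v1/map.js" ends up at ddoc.views.v1.map.
function buildDesignDoc(fileMap) {
  const ddoc = {};
  for (const [path, content] of Object.entries(fileMap)) {
    const parts = path.split("/");
    const leaf = parts.pop().replace(/\.[^.]+$/, ""); // drop the extension
    let node = ddoc;
    for (const part of parts) {
      node = node[part] = node[part] || {}; // create folders on demand
    }
    node[leaf] = content;
  }
  return ddoc;
}

const ddoc = buildDesignDoc(files);
console.log(JSON.stringify(ddoc, null, 2));
```

Under these assumptions, a template stored as templates/georss.html becomes reachable from server-side code as the `templates.georss` field of the design document.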
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
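To make the idea of a Show function concrete, the sketch below runs a function of the same shape as s1 outside CouchDB, against a sample document. The document values are illustrative only, and the empty object passed as `req` is an assumption made for this example; inside CouchDB the server supplies both arguments.

```javascript
// A show function in the style of s1: one document in, one HTML string out.
const s1 = function (doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Illustrative sample document (not fetched from a real database).
const sampleDoc = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608
};

// CouchDB would call the function itself; here we call it by hand.
const html = s1(sampleDoc, {});
console.log(html);
```

Testing the function in this way, before it is packaged into the design document, is one practical advantage of keeping each function in its own .js file.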
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file only (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
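What this View produces can be imitated in plain JavaScript. The sketch below is not how CouchDB builds its index; it only shows the rows that a map function of this shape would emit. The `emit` and `runView` functions and the sample documents are stand-ins written for this example, and a simple string comparison stands in for CouchDB's own key collation.

```javascript
// Stand-in for the emit() function that CouchDB provides to map functions.
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function of the View all.
const mapAll = function (doc) {
  emit(doc._id, doc);
};

// Hypothetical helper: apply a map function to every document and
// return the emitted rows ordered by key, as a View would.
function runView(mapFn, documents) {
  rows = [];
  documents.forEach(function (d) { mapFn(d); });
  return rows.slice().sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
}

// Illustrative sample documents.
const sampleDocs = [
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aberystwyth University", region: "Wales" }
];

const viewRows = runView(mapAll, sampleDocs);
viewRows.forEach(function (row) { console.log(row.key); });
```

Each row carries the whole document as its value, which is what allows a List function to render every field without a second lookup.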
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
          {{/countries}}
          {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
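The nested iteration performed by the template can also be written out by hand, which may help readers unfamiliar with Mustache sections. The sketch below is a plain JavaScript equivalent of the two loops, not the template engine itself. The function name `renderContent`, the `flagBase` parameter and the sample data are all hypothetical.

```javascript
// Hypothetical hand-written equivalent of the programme and country loops.
// Markup is entity-escaped because it sits inside <content type="html">.
function renderContent(institution, flagBase) {
  let out = "";
  (institution.programmes || []).forEach(function (programme) {
    out += "&lt;p&gt;" + programme.name;
    // Flags are only emitted when the programme lists countries.
    (programme.countries || []).forEach(function (country) {
      out += "&lt;img src=\"" + flagBase + "/" + country + "/flag\"" +
        " alt=\"" + country + "\" title=\"" + country + "\"/&gt;";
    });
    out += "&lt;/p&gt;";
  });
  return out;
}

// Illustrative data: one programme with countries, one without.
const sampleInstitution = {
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz"] },
    { name: "Erasmus" }
  ]
};

console.log(renderContent(sampleInstitution, "http://example.org/countries"));
```

The template version expresses the same logic declaratively, which keeps the markup readable and avoids the string concatenation seen here.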
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
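CouchDB addresses the output of a List function by a path of the form /database/_design/designdoc/_list/listname/viewname. As a minimal sketch of that convention, the hypothetical helper `listUrl` below assembles such a URL from its parts; the function name and its parameters are inventions for this example.

```javascript
// Hypothetical helper that assembles the URL of a List applied to a View.
function listUrl(host, db, ddocName, listName, viewName, query) {
  const path = [db, "_design", ddocName, "_list", listName, viewName]
    .map(encodeURIComponent)
    .join("/");
  // Optional query parameters such as descending or limit.
  const qs = query ? "?" + new URLSearchParams(query).toString() : "";
  return host + "/" + path + qs;
}

console.log(listUrl("http://mick.couchone.com", "universities", "bc", "georss", "all"));
```

The same helper covers the Sofa feed if the optional query object is supplied, for example `{ descending: "true", limit: "10", format: "atom" }`.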
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it, even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
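As an illustration of the shift in thinking, consider counting institutions per region, which in SQL would be something like SELECT region, COUNT(*) FROM institutions GROUP BY region. The sketch below expresses the same information need as a map function and a reduce function, and runs them over sample data in memory. The `emit`, `sum` and `queryGrouped` functions are simplified stand-ins written for this example (CouchDB supplies its own `emit` and `sum`, and evaluates views incrementally), and the documents are illustrative.

```javascript
// Stand-ins for helpers that CouchDB provides to view functions.
let emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Map: one row per document, keyed by region (the GROUP BY column).
const mapByRegion = function (doc) {
  if (doc.region) { emit(doc.region, 1); }
};

// Reduce: add up the values for each key (the COUNT(*) part).
const countReduce = function (keys, values, rereduce) {
  return sum(values);
};

// Hypothetical driver imitating a grouped view query.
function queryGrouped(documents) {
  emitted = [];
  documents.forEach(function (d) { mapByRegion(d); });
  const groups = {};
  emitted.forEach(function (row) {
    (groups[row.key] = groups[row.key] || []).push(row.value);
  });
  const result = {};
  Object.keys(groups).sort().forEach(function (key) {
    result[key] = countReduce(null, groups[key], false);
  });
  return result;
}

const regionCounts = queryGrouped([
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "Aberystwyth University", region: "Wales" }
]);
console.log(regionCounts);
```

The important difference from SQL is that the query has to be decided in advance and stored in a design document, rather than typed at a prompt when the question arises.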
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
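The escaping burden described above can also be handed to a JSON serialiser. The sketch below is one possible approach and not the method used in this project: it writes a show function as ordinary JavaScript, then uses the standard `JSON.stringify` to embed its source text in a design document, which escapes the double-quotes automatically. The function body and the variable names are illustrative.

```javascript
// An ordinary function containing unescaped double-quotes.
const showFn = function (doc, req) {
  return '<div id="' + doc._id + '">' + doc._id + '</div>';
};

// Design documents store functions as strings, so we take the source text.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: showFn.toString() }
};

// JSON.stringify adds the backslash escapes that are tedious to type by hand.
const designJson = JSON.stringify(designDoc, null, 2);
console.log(designJson);
```

The resulting text could then be sent to CouchDB with an HTTP PUT, in the same way as any other document.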
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"> </div></body></html>' }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // original line, replaced in order to declare the georss namespace:
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };
    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      // added: the location of the entry
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
      }
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
          },
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View: in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return {
                          country : country
                        };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
            {{/countries}}
            {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      # strip the extension to get the document name, e.g. ie.png -> ie
      docname=$(echo $filename | sed -e 's/\.png$//')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
Section 5
Serving HTML from CouchDB
This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.
There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.
As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.
A design document in CouchDB is a document which stores application code, rather than data.
5.1 Bulk upload of JSON documents
To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.
The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].
I wrote a .Net application, using C# code and SQL stored procedures, to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
# local instance (uncomment to use it instead of the hosted one)
#host="http://127.0.0.1:5984/"
host="http://username:password@mick.couchone.com/"
database="universities"
folder="universities/"
fileextension=".json"
# create the database
curl -X PUT "$host$database"
# loop over every JSON file in the folder
for filepath in "$folder"*"$fileextension"
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # strip the extension to obtain the document id
  docname="${filename%"$fileextension"}"
  echo "$docname"
  url="$host$database/$docname"
  echo "$url"
  # put the document into CouchDB (the @ prefix makes curl send the file's contents)
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
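The way the script derives a document URL from a file path can be sketched in JavaScript. This is only an illustration under stated assumptions: `documentUrl` is a hypothetical helper (it is not part of CouchDB or cURL), and the file names are assumed to be already percent-encoded, as in the single-document command above.

```javascript
// Sketch: derive the CouchDB document URL from an exported JSON file path.
// `documentUrl` is a hypothetical helper written for this illustration.
function documentUrl(host, database, filepath, extension) {
  // Take the last path segment, e.g. "Aberystwyth%20University.json".
  const filename = filepath.split("/").pop();
  // Strip the extension to obtain the document ID.
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)
    : filename;
  // A PUT to host/database/id stores the document under that ID.
  return host.replace(/\/$/, "") + "/" + database + "/" + docname;
}

const url = documentUrl(
  "http://127.0.0.1:5984/",
  "universities",
  "universities/Aberystwyth%20University.json",
  ".json"
);
console.log(url);
```

Because the ID is taken from the file name, re-running the upload targets the same URL each time, which is the reason for using PUT rather than POST here.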
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>'
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
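To see what s1 returns without a server, the function can be called directly. The sketch below is a local approximation only: the sample document and its coordinates are made up for illustration, and the req object is reduced to an empty query property.

```javascript
// The show function from the listing, written as a plain JavaScript function.
const s1 = function (doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
};

// Hypothetical sample inputs; the coordinates are illustrative only.
const doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const req = { query: {} }; // query-string parameters would appear here

const html = s1(doc, req);
console.log(html);
```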
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
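The breakdown above can be expressed as a small parser. This is a sketch for illustration; `parseShowUrl` is a hypothetical helper, and it assumes the URL has exactly the database/_design/name/_show/function/id shape described in the list.

```javascript
// Sketch: split a CouchDB 'show' URL into the parts listed above.
// `parseShowUrl` is a hypothetical helper, not a CouchDB API.
function parseShowUrl(url) {
  const { origin, pathname } = new URL(url);
  const [database, design, ddocName, handler, functionName, ...rest] =
    pathname.split("/").filter(Boolean);
  return {
    host: origin,
    database,
    designDocument: design + "/" + ddocName,
    handler,
    functionName,
    // Document ids are percent-encoded in the URL.
    docId: decodeURIComponent(rest.join("/")),
  };
}

const parts = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
console.log(parts);
```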
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
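The escaping burden arises because the function is stored as a JSON string. One way to avoid escaping by hand, sketched below, is to write the function as ordinary JavaScript and let JSON.stringify produce the escaped form. This is an illustrative alternative rather than the workflow used in this section, and the id _design/example is a made-up name.

```javascript
// Sketch: generate the escaped design-document JSON from a real function,
// instead of backslash-escaping every double quote by hand.
const show = function (doc, req) {
  return '<div id="map_canvas" style="width:100%"></div>';
};

const designDoc = {
  _id: "_design/example", // hypothetical id, for illustration only
  shows: { s1: show.toString() },
};

// JSON.stringify escapes the inner double quotes (and line breaks).
const json = JSON.stringify(designDoc);
console.log(json);
```

The resulting string could be saved to a file and uploaded with cURL in the same way as d1.json.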
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, which outputs the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
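What the server does with the map function can be approximated locally. The sketch below is a simplification under stated assumptions: `queryView` is a hypothetical helper, the two sample documents carry only an _id, and keys are ordered by plain string comparison rather than CouchDB's own collation rules.

```javascript
// Sketch: run the v1 map function over some documents and collect its output.
// `queryView` is a hypothetical stand-in for CouchDB's view engine.
function queryView(docs) {
  const rows = [];
  let currentId = null;
  // Inside CouchDB, emit() is provided to map functions by the server.
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const map = function (doc) { emit(doc._id, doc._id); };

  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  // View rows are returned ordered by key (simplified ordering here).
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, rows };
}

const result = queryView([
  { _id: "University of Aberdeen" },
  { _id: "Aston University" },
]);
console.log(JSON.stringify(result, null, 2));
```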
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, which emits the document ids
• l1 is a very simple List, which shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
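The interplay of start, getRow and send can be tried out locally with simple stand-ins. In the sketch below these three functions are mock implementations written for illustration (in CouchDB they are supplied by the server), the rows are sample data, and the link target is simplified to a relative path to the s1 'show' function.

```javascript
// Mock versions of the functions CouchDB supplies to list functions.
const rows = [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" },
];
const output = [];
let responseHeaders = null;
function start(resp) { responseHeaders = resp.headers; }
function send(chunk) { output.push(chunk); }
function getRow() { return rows.shift(); } // undefined once rows run out

// A list function in the style of l1.
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + "</a></p>");
  }
};

l1({}, {});
const html = output.join("");
console.log(html);
```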
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project. CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
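The correspondence between the folder structure and the design document can be sketched as follows. This is a simplified illustration of the idea only, not CouchApp's actual implementation (which is written in Python and does considerably more); `buildDesignDoc` and the file contents are hypothetical.

```javascript
// Sketch: fold a set of files into the nested structure of a design document.
// `buildDesignDoc` is a hypothetical helper, not part of CouchApp.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, source] of Object.entries(files)) {
    // "views/v1/map.js" becomes ddoc.views.v1.map
    const segments = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const segment of segments.slice(0, -1)) {
      node[segment] = node[segment] || {};
      node = node[segment];
    }
    node[segments[segments.length - 1]] = source;
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
});
console.log(JSON.stringify(ddoc, null, 2));
```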
Figure 6.3: Files generated by CouchApp
The screenshot illustrates the files generated by CouchApp. Some points to note, including additions made to the default files, are as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1 which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It is important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and to output HTML or XML in exactly the way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/ as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
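The looping behaviour described above can be illustrated with a deliberately tiny renderer. This is a simplified stand-in written for this illustration and is not the real mustache.js: it supports only sections and plain variables, performs no HTML escaping, and the programme names in the sample data are partly made up.

```javascript
// Simplified stand-in for Mustache sections; NOT the real mustache.js.
function render(template, view) {
  // 1. Expand sections of the form {{#name}}...{{/name}}.
  const expanded = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (match, name, inner) => {
      const value = view[name];
      if (Array.isArray(value)) {
        // Render the inner block once per item; items inherit parent values.
        return value
          .map((item) => render(inner, Object.assign({}, view, item)))
          .join("");
      }
      // Booleans such as has_countries switch the block on or off.
      return value ? render(inner, view) : "";
    }
  );
  // 2. Substitute plain variables of the form {{name}}.
  return expanded.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    view[name] === undefined ? "" : String(view[name])
  );
}

const template =
  "{{#programmes}}<p>{{name}}{{#has_countries}}{{#countries}}" +
  "[{{country}}]{{/countries}}{{/has_countries}}</p>{{/programmes}}";

const out = render(template, {
  programmes: [
    { name: "Chevening Programme", has_countries: true,
      countries: [{ country: "id" }, { country: "mz" }] },
    { name: "Other", has_countries: false, countries: [] }, // made-up entry
  ],
});
console.log(out);
```

The has_countries flag plays the same role here as in the template above: it lets the template skip the whole block when a programme has no countries.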
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSQL' databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
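Replication in CouchDB is itself triggered over HTTP, by posting a JSON document naming a source and a target to the server's _replicate resource. The sketch below only builds such a request and sends nothing over the network; `replicationRequest` is a hypothetical helper, and the server address and credentials are placeholders.

```javascript
// Sketch: build (but do not send) a replication request for CouchDB.
// `replicationRequest` is a hypothetical helper for illustration.
function replicationRequest(server, source, target) {
  return {
    method: "POST",
    url: server.replace(/\/$/, "") + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source, target }),
  };
}

const request = replicationRequest(
  "http://127.0.0.1:5984",
  "universities", // local database
  "http://username:password@mick.couchone.com/universities" // remote copy
);
console.log(request);
```

Because design documents are ordinary documents in the database, a request of this kind would copy application code along with the data.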
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs. In addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-reduce is a different way of expressing an information need. It is not necessarily more difficult than SQL, but the learning curve is significant.
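To make the contrast with SQL concrete, the sketch below expresses a simple grouped count as map and reduce functions and evaluates it locally. It is an illustration under stated assumptions: the sample documents are made up, the grouping loop is a simplified stand-in for what CouchDB does when a view is queried by group, and the SQL in the comment assumes a hypothetical institutions table.

```javascript
// SQL equivalent (hypothetical table):
//   SELECT region, COUNT(*) FROM institutions GROUP BY region;

// Made-up sample documents, for illustration only.
const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" },
];

const emitted = [];
const emit = (key, value) => emitted.push({ key, value });

// The query, expressed as a map function and a reduce function.
const map = function (doc) { emit(doc.region, 1); };
const reduce = function (keys, values, rereduce) {
  return values.reduce((total, n) => total + n, 0);
};

// Simplified stand-in for the view engine: map, group by key, reduce.
docs.forEach((doc) => map(doc));
const groups = {};
for (const { key, value } of emitted) {
  (groups[key] = groups[key] || []).push(value);
}
const result = Object.keys(groups).sort().map((key) => ({
  key,
  value: reduce(null, groups[key], false),
}));
console.log(result);
```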
7.3 Conclusion
The 'NoSQL' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at a more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSQL' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSQL' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
        '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
        <html><head><meta name=\"viewport\"
        content=\"initial-scale=1.0, user-scalable=no\" />
        <style type=\"text/css\"> html { height: 100% }
        body { height: 100%; margin: 0px; padding: 0px }
        #map_canvas { height: 100% } </style>
        <script type=\"text/javascript\"
        src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
        <script type=\"text/javascript\"> function initialize() {
        var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
        var myOptions = { zoom: 14, center: latlng,
        mapTypeId: google.maps.MapTypeId.ROADMAP };
        var map = new google.maps.Map(
        document.getElementById(\"map_canvas\"), myOptions);
        var marker = new google.maps.Marker({ position: latlng,
        title: \"' + doc._id + '\" });
        var infowindow = new google.maps.InfoWindow({
        content: \"' + doc._id + '\" });
        google.maps.event.addListener(marker, \"click\", function() {
        infowindow.open(map, marker); });
        marker.setMap(map); } </script></head>
        <body onload=\"initialize()\">
        <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div>
        </body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  //
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});
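The atom.js listing relies on E4X XML literals, which are available in the JavaScript engine used by CouchDB at the time but not in most other JavaScript environments. For readers who want to experiment elsewhere, the entry-building step can be approximated with plain strings. The sketch below is an assumption-laden illustration and not Sofa's code: `geoEntry` and `escapeXml` are hypothetical helpers, and the position is written using the namespaced georss:point element from the GeoRSS-Simple encoding, which differs slightly from the listing above.

```javascript
// Sketch: build an Atom entry with a GeoRSS point using plain strings.
// `escapeXml` and `geoEntry` are hypothetical helpers, not part of Sofa.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function geoEntry(data) {
  return "<entry>" +
    "<id>" + escapeXml(data.entry_id) + "</id>" +
    "<title>" + escapeXml(data.title) + "</title>" +
    // GeoRSS-Simple: latitude, a space, then longitude.
    "<georss:point>" + data.latitude + " " + data.longitude + "</georss:point>" +
    "</entry>";
}

const entryXml = geoEntry({
  entry_id: "example-post", // made-up values, for illustration only
  title: "Fish & Chips",
  latitude: 56.34,
  longitude: -2.79,
});
console.log(entryXml);
```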
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
This is further down in templates/edit.html:
// apply docForm at login
$("#account").evently({
  loggedIn : function(e,r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
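The core of this listing, turning one institution document into the model handed to the template, can be tried out as a stand-alone function. The sketch below is a variant written for illustration rather than the code above: `toViewModel` is a hypothetical name, it copies only a few of the fields, and it sets the has_ flags from the array length, whereas the listing tests only whether the property exists. The sample document is abbreviated from the example in Appendix D.

```javascript
// Sketch: build the template model for a single institution document.
// `toViewModel` is a hypothetical, simplified variant of the row function.
function toViewModel(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map((programme) => {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so the template can use {{country}}.
        countries: countries.map((country) => ({ country })),
      };
    }),
  };
}

const model = toViewModel({
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com",
      countries: ["id", "mz", "tj"] },
  ],
});
console.log(JSON.stringify(model, null, 2));
```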
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand in hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}
Listed below is the code for georss.html. Even though it is an .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's|countries/png/||g')
  echo "$filename"
  # strip the extension to obtain the document name
  docname=$(echo "$filename" | sed -e 's|\.png$||')
  echo "$docname"
  url="http://username:password@mick.couchone.com/countries/$docname/flag"
  echo "$url"
  # put the attachment into CouchDB
  # for example, this command creates the 'ie' record and puts
  # the png as an attachment under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 5 SERVING HTML FROM COUCHDB 32
Figure 5.1: Folder containing JSON files
To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.
The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)
#!/bin/bash
#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT "$host/$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s/$folder\///g")
  echo "$filename"
  # the document id is the file name without its extension
  docname=$(echo "$filename" | sed -e "s/$fileextension//g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d "@$filepath"
  curl -X PUT "$url" -d "@$filepath"
done
Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
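The choice between PUT and POST can be sketched in JavaScript as follows. This is only an illustration: the helper name saveRequest, the local host address and the placeholder document body are my assumptions, and the function merely describes the request rather than sending it.

```javascript
// Sketch: choose the HTTP verb and URL for saving a document.
// saveRequest is a hypothetical helper; it builds a description of the
// request instead of performing it, so it runs without a CouchDB server.
function saveRequest(host, db, doc, id) {
  const base = host.replace(/\/+$/, "") + "/" + encodeURIComponent(db);
  if (id === undefined) {
    // No id supplied: POST to the database and let CouchDB assign one.
    return { method: "POST", url: base, body: JSON.stringify(doc) };
  }
  // Id supplied: PUT the document at its own meaningful URL.
  return {
    method: "PUT",
    url: base + "/" + encodeURIComponent(id),
    body: JSON.stringify(doc)
  };
}

const put = saveRequest("http://127.0.0.1:5984", "universities",
  { note: "placeholder" }, "Aberystwyth University");
console.log(put.method, put.url);
// PUT http://127.0.0.1:5984/universities/Aberystwyth%20University
```

Note how the space in the document id is percent-encoded, which is why the URLs in this section contain %20.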
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}
Note that shows is a reserved word in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
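As a rough sketch of how req might be used, the function below follows the pattern of s1 but also reads a value from the query string via req.query. The function name s2, the units parameter and the mock document are hypothetical and are not part of the project's design document.

```javascript
// A 'show' function in the style of s1 that also reads the query string.
// The `units` query parameter is a made-up example.
const s2 = function (doc, req) {
  const units = (req.query && req.query.units) || "degrees";
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + " " + units + "</p>" +
    "<p>longitude: " + doc.longitude + " " + units + "</p>";
};

// Outside CouchDB the function can be exercised with mock objects
// standing in for the stored document and the request.
const html = s2(
  { _id: "Example University", latitude: 1.5, longitude: -2.5 },
  { query: { units: "deg" } }
);
console.log(html);
```

Inside CouchDB the same function would be stored as a string under shows in the design document and called by the server with the real document and request.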
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
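The same reading of the URL can be expressed as a small function. This is a sketch only: parseShowUrl is a hypothetical helper written for illustration, and it assumes the URL has exactly the shape described in the list above.

```javascript
// Sketch: split a 'show' URL into the parts listed above.
function parseShowUrl(url) {
  const u = new URL(url);
  const parts = u.pathname.split("/").filter(Boolean).map(decodeURIComponent);
  // Expected shape: database / _design / name / _show / function / document id
  if (parts.length < 6 || parts[1] !== "_design" || parts[3] !== "_show") {
    throw new Error("not a show URL: " + url);
  }
  return {
    host: u.origin,
    database: parts[0],
    designDoc: "_design/" + parts[2],
    showFunction: parts[4],
    docId: parts.slice(5).join("/")
  };
}

const parsed = parseShowUrl(
  "http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University"
);
console.log(parsed);
```

The document id comes back decoded, so Aston%20University is reported as "Aston University".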
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
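The escaping burden can be seen in a short sketch. Here the function is written as ordinary JavaScript and JSON.stringify is left to produce the escaped form that a design document requires; the design document id _design/example and the markup are illustrative assumptions, and this is not the approach taken in the project, where the strings were written by hand.

```javascript
// Sketch: a show function written normally, then serialised into the
// string form that a design document stores.
const show = function (doc, req) {
  return '<div id="map_canvas" style="width:100%">' + doc._id + "</div>";
};

const designDoc = {
  _id: "_design/example",          // hypothetical design document id
  shows: { s: show.toString() }    // functions are stored as strings
};

// JSON.stringify escapes every double quote (and newline) in the source.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

In the printed JSON every double quote inside the function appears as \", which is exactly what must be typed by hand when the design document is authored directly.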
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
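To make the behaviour of v1 concrete, the sketch below runs the same map function outside CouchDB. The emit function and the three documents are stand-ins I have assumed for illustration; CouchDB supplies the real emit and orders rows with its own collation rules, which a plain string sort only approximates.

```javascript
// Stand-in for the emit() function that CouchDB supplies to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The 'view' function v1 from the design document.
const v1 = function (doc) {
  emit(doc._id, doc._id);
};

// Hypothetical documents; only the _id field matters for v1.
const docs = [
  { _id: "University of Aberdeen" },
  { _id: "Aston University" },
  { _id: "Aberystwyth University" }
];

docs.forEach(function (doc) { v1(doc); });

// View rows come back ordered by key; a simple string comparison is
// close enough for these keys.
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});
console.log(rows.map(function (r) { return r.key; }));
```

Each document produces one row, and the rows are returned in key order regardless of the order in which the documents were stored.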
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '>' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
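The interplay of start, getRow and send can be sketched outside CouchDB by supplying stand-ins for the three helpers. The stand-in implementations, the two sample rows and the simplified markup are assumptions for illustration; in CouchDB the helpers are provided by the server and the rows come from the view.

```javascript
// Stand-ins for the helpers CouchDB makes available inside a list function.
const output = { headers: null, body: "" };
const pending = [
  { key: "A", value: "A" },
  { key: "B", value: "B" }
];
function start(resp) { output.headers = resp.headers; }
function send(chunk) { output.body += chunk; }
function getRow() { return pending.length ? pending.shift() : null; }

// A list function with the same structure as l1 (markup simplified).
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p>" + row.value + "</p>");
  }
};

l1({}, {});
console.log(output.headers, output.body);
```

The headers are set once, and the body grows by one fragment per row until getRow reports that the result set is exhausted.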
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
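The merging step can be pictured with a simplified sketch. This is not CouchApp's actual implementation (which is written in Python and also handles attachments, templates and macros); buildDesignDoc is a hypothetical function, and the file contents are passed in as an in-memory object so that the example runs without a file system.

```javascript
// Sketch: fold a set of files into one design document, using each
// folder name as a key and each file name (minus .js) as a sub-key.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, content] of Object.entries(files)) {
    const segments = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    // Walk (and create) the nested objects for each folder in the path.
    for (const seg of segments.slice(0, -1)) {
      node = node[seg] = node[seg] || {};
    }
    node[segments[segments.length - 1]] = content;
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", {
  "shows/s1.js": "function(doc, req) { return doc._id; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }"
});
console.log(JSON.stringify(ddoc, null, 2));
```

The resulting object has the same shows, lists and views sections as the hand-written design documents in Section 5, with views/v1/map.js ending up at views.v1.map.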
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
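A sketch of the data that such a template consumes may help here. The key names (institutions, has_programmes, programmes, name, has_countries, countries, country, id, latitude, longitude) are those used in the template; the function toTemplateData and the shape of the stored documents are my assumptions for illustration, and the real packaging code is the georss.js file in Appendix C.

```javascript
// Sketch: turn rows from the 'all' view into the object the template expects.
// Each row's value is assumed to be a whole document, as emitted by the view.
function toTemplateData(rows) {
  return {
    institutions: rows.map(function (row) {
      const doc = row.value;
      const programmes = (doc.programmes || []).map(function (p) {
        const countries = p.countries || [];
        return {
          name: p.name,
          has_countries: countries.length > 0,  // guards the countries section
          countries: countries
        };
      });
      return {
        id: doc._id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        has_programmes: programmes.length > 0,  // guards the programmes section
        programmes: programmes
      };
    })
  };
}

// Hypothetical rows, for illustration only.
const data = toTemplateData([
  { value: { _id: "Example University", latitude: 1, longitude: 2,
             programmes: [{ name: "Erasmus", countries: [{ country: "de" }] },
                          { name: "Other" }] } },
  { value: { _id: "No Programmes", latitude: 3, longitude: 4 } }
]);
console.log(JSON.stringify(data, null, 2));
```

The boolean has_ flags let the template skip a section entirely when a list is empty, so an institution without programmes yields an entry with empty content rather than stray markup.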
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project, http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
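Because replication is itself driven over HTTP, a replication job can be described as an ordinary request: a JSON body naming a source and a target, posted to the _replicate resource of a CouchDB server. The sketch below only builds that description; the helper name replicationRequest and the local address are illustrative assumptions, and nothing is sent over the network.

```javascript
// Sketch: describe the HTTP request that asks a CouchDB server to
// replicate one database into another.
function replicationRequest(server, source, target) {
  return {
    method: "POST",
    url: server.replace(/\/+$/, "") + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Pull the hosted universities database (data and design documents alike)
// down to a local database of the same name.
const replicate = replicationRequest(
  "http://127.0.0.1:5984",
  "http://mick.couchone.com/universities",
  "universities"
);
console.log(replicate.method, replicate.url, replicate.body);
```

Since design documents are stored like any other document, the same request copies the application code along with the data.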
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
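The difference in thinking can be illustrated with a grouped count. In SQL this might be written ad hoc as SELECT country, COUNT(*) ... GROUP BY country; in CouchDB the same question has to be fixed in advance as a map function and a reduce function. The sketch below simulates that pair in plain JavaScript: the country field, the sample documents and the queryGrouped driver are assumptions for illustration, and emit is passed in as a parameter only so that the example runs outside CouchDB.

```javascript
// Map: one row per document, keyed by the value we want to group on.
const mapFn = function (doc, emit) {
  if (doc.country) emit(doc.country, 1);
};

// Reduce: combine the values for one key (same shape as a CouchDB reduce).
const reduceFn = function (keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

// A tiny stand-in for the view engine: run map, group by key, then reduce.
function queryGrouped(docs) {
  const groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  const result = {};
  Object.keys(groups).sort().forEach(function (key) {
    result[key] = reduceFn(null, groups[key], false);
  });
  return result;
}

const counts = queryGrouped([
  { _id: "a", country: "de" },
  { _id: "b", country: "fr" },
  { _id: "c", country: "de" }
]);
console.log(counts); // { de: 2, fr: 1 }
```

A different grouping, say by programme rather than by country, would need a new view to be defined and indexed, whereas in SQL it would be a one-line change to the query.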
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% } body { height: 100%;
      margin: 0px; padding: 0px } #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\"
      src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
      var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
      var myOptions = { zoom: 14, center: latlng,
      mapTypeId: google.maps.MapTypeId.ROADMAP };
      var map = new google.maps.Map(
      document.getElementById(\"map_canvas\"), myOptions);
      var marker = new google.maps.Marker({ position: latlng,
      title: \"' + doc._id + '\" });
      var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
      google.maps.event.addListener(marker, \"click\", function() {
      infowindow.open(map, marker); });
      marker.setMap(map); } </script></head>
      <body onload=\"initialize()\"> <div id=\"map_canvas\"
      style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added for GeoRSS
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case this is the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
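The shaping step in the listing above can be tried outside CouchDB. The following is a minimal sketch, assuming plain JavaScript under Node.js. The name toStashEntry is hypothetical (the original passes an anonymous function to List.withRows), only a few of the fields are carried over, and the sample document is abbreviated from the example in Appendix D.

```javascript
// Hypothetical helper: reshapes one institution document into the object
// that the Mustache template expects. It mirrors the anonymous function
// passed to List.withRows in georss.js, but runs without CouchDB.
function toStashEntry(institution) {
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    // An explicit flag lets the template switch a whole block on or off.
    has_programmes: institution.programmes ? true : false,
    programmes: institution.programmes
      ? institution.programmes.map(function (programme) {
          return {
            name: programme.name,
            url: programme.url,
            has_countries: programme.countries ? true : false,
            // Wrap each country code in an object so the template can
            // refer to it by name.
            countries: programme.countries
              ? programme.countries.map(function (country) {
                  return { country: country };
                })
              : []
          };
        })
      : []
  };
}

// Abbreviated sample document, based on the example in Appendix D.
var sample = {
  _id: 'University of St Andrews',
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: 'Chevening Programme', url: 'http://www.chevening.com',
      countries: ['id', 'mz', 'tj'] }
  ]
};

var entry = toStashEntry(sample);
console.log(JSON.stringify(entry, null, 2));
```

A document with no programmes field yields has_programmes set to false and an empty programmes array, so the template has nothing to iterate over for that institution.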
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to CouchDB documents.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url="http://username:password@mick.couchone.com/countries/$docname/flag"
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type:image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
done
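For readers more at home in JavaScript than in sed, the name handling inside the loop can be sketched as follows. This is only an illustration under Node.js and not part of the original script: flagAttachmentUrl is a hypothetical helper, and the base URL is the local CouchDB address that appears in the script's comment.

```javascript
var path = require('path');

// Hypothetical helper: derives the attachment URL for one flag file,
// following the same steps as the two sed commands in the bash script.
function flagAttachmentUrl(filepath, baseUrl) {
  // 'countries/png/ie.png' -> 'ie' (drop the folder and the .png extension)
  var docname = path.basename(filepath, '.png');
  // Each flag is stored as the attachment 'flag' on a document named
  // after the country code.
  return baseUrl + '/countries/' + encodeURIComponent(docname) + '/flag';
}

console.log(flagAttachmentUrl('countries/png/ie.png', 'http://127.0.0.1:5984'));
```

The sketch only builds the URL; the upload itself is still the HTTP PUT with a Content-Type of image/png that the curl command performs.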
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1, the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 5 SERVING HTML FROM COUCHDB 33
at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).
To put a single document to the web, the command used would be as follows:
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
Figure 5.2: Documents in CouchDB
The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
5.2 A CouchDB Design Document
As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].
For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.
The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.
{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
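The behaviour of s1 can be checked without a CouchDB server. The following is a minimal sketch, assuming Node.js; the sample document and its coordinate values are illustrative assumptions rather than data from the universities database, and an empty object stands in for the request that CouchDB would normally supply.

```javascript
// Stand-alone copy of the s1 'show' function. Inside CouchDB the server
// calls it with the stored document and the request object.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical sample document; the coordinates are illustrative only.
var sampleDoc = { _id: 'Aston University', latitude: 52.48, longitude: -1.89 };

// s1 does not use req, so an empty object is enough here.
var html = s1(sampleDoc, {});
console.log(html);
```

Because the function is an ordinary JavaScript function that returns a string, it can be developed and tested like this before being pasted into the design document.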
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
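The parts of the URL listed above can be assembled programmatically. This is a small sketch, assuming Node.js; showUrl is a hypothetical helper name, not part of CouchDB.

```javascript
// Hypothetical helper: builds the URL of a 'show' function from its parts.
function showUrl(host, db, designDoc, showName, docId) {
  return [
    host,
    db,
    '_design', designDoc,
    '_show', showName,
    // Document ids may contain spaces, so they are percent-encoded.
    encodeURIComponent(docId)
  ].join('/');
}

var url = showUrl('http://mick.couchone.com', 'universities', 'd1', 's1',
                  'Aston University');
console.log(url);
```

The call reproduces the URL used above for Aston University, with the space in the document id encoded as %20.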
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection.
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
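To see what the map function in v1 produces, it can be run against a few documents outside CouchDB. The following is a rough sketch, assuming Node.js: the emit function and the rows array are local stand-ins for what the CouchDB view server provides, the sample documents are hypothetical, and the sort is only a simplified imitation of CouchDB's key ordering.

```javascript
// Stand-in for the emit() function that CouchDB supplies to map functions.
// Here it simply collects the emitted rows in an array.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from the view v1.
function v1(doc) {
  emit(doc._id, doc._id);
}

// Hypothetical sample documents; v1 only looks at _id.
var docs = [
  { _id: 'University of Aberdeen' },
  { _id: 'Aston University' }
];

docs.forEach(function (doc) { v1(doc); });

// CouchDB returns view rows ordered by key. A plain string comparison is
// enough to imitate that for these simple keys.
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});

console.log(JSON.stringify(rows));
```

Each document contributes one row, and the rows come back ordered by the emitted key rather than in insertion order.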
A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ \"headers\": { \"Content-Type\": \"text/html\" } }); while(row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
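The interplay of start, getRow and send can be imitated in a few lines. This is a sketch only, assuming Node.js: the three functions below are local stand-ins for the ones CouchDB provides to 'list' functions, the rows are hypothetical, and the relative link to the map1 'show' function is an assumption made for the example rather than the link used in the listing above.

```javascript
// Hypothetical rows, shaped like the output of the view v1.
var viewRows = [
  { key: 'Aston University', value: 'Aston University' },
  { key: 'University of Aberdeen', value: 'University of Aberdeen' }
];

// Local stand-ins for the functions CouchDB supplies to list functions.
var responseHeaders = null;
var output = [];
var cursor = 0;
function start(response) { responseHeaders = response.headers; }
function send(chunk) { output.push(chunk); }
function getRow() {
  return cursor < viewRows.length ? viewRows[cursor++] : null;
}

// A list function in the style of l1: one hyperlink per row.
function l1(head, req) {
  var row;
  start({ headers: { 'Content-Type': 'text/html' } });
  while ((row = getRow())) {
    send('<p><a href="/universities/_design/d1/_show/map1/' +
         encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
}

l1({}, {});
console.log(output.join('\n'));
```

The headers are set once, and one chunk of HTML is sent per row until getRow runs out of rows.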
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
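The nested loops in the template can be spelled out in plain JavaScript. This is not how Mustache is implemented; it is only a hand-written sketch, assuming Node.js, of the output the content section is intended to produce for one entry. The function name renderContent and the sample entry are hypothetical.

```javascript
// Hand-written equivalent of the <content> part of the template for one
// entry. The markup is entity-escaped because it sits inside an Atom
// <content type="html"> element.
function renderContent(entry) {
  var out = '';
  if (entry.has_programmes) {
    entry.programmes.forEach(function (programme) {
      out += '&lt;p&gt;' + programme.name;
      if (programme.has_countries) {
        // One flag image per country code in the programme.
        programme.countries.forEach(function (c) {
          out += ' &lt;img src="http://mick.couchone.com/countries/' +
                 c.country + '/flag" alt="' + c.country +
                 '" title="' + c.country + '"/&gt;';
        });
      }
      out += '&lt;/p&gt;';
    });
  }
  return out;
}

// Hypothetical entry in the shape produced by georss.js.
var sampleEntry = {
  has_programmes: true,
  programmes: [
    { name: 'Chevening Programme', has_countries: true,
      countries: [{ country: 'id' }] }
  ]
};

console.log(renderContent(sampleEntry));
```

An entry with has_programmes set to false produces an empty string, which corresponds to the template skipping the whole programmes block.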
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
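As a rough illustration of the point about HTTP verbs, the sketch below maps basic document operations onto the requests a client would send. The helper name couchRequest, the host couch.example.com and the coordinate values are made up for this example; only the verb-and-URL pattern (GET, PUT and DELETE against /database/document-id) is taken from CouchDB's document API. It builds request descriptions and does not touch the network.

```javascript
// Hypothetical helper: describe the HTTP request for a document operation.
var BASE = "http://couch.example.com"; // assumed host, not a real server

function couchRequest(op, db, id, doc) {
  var url = BASE + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id);
  switch (op) {
    case "read":
      return { method: "GET", url: url };
    case "save": // creates the document, or updates it when doc._rev is present
      return { method: "PUT", url: url, body: JSON.stringify(doc) };
    case "delete": // deletion needs the current revision of the document
      return { method: "DELETE", url: url + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown operation: " + op);
  }
}

// Illustrative document; the coordinates are approximate placeholders.
var req = couchRequest("save", "universities", "Aston University",
                       { latitude: 52.48, longitude: -1.89 });
console.log(req.method, req.url);
console.log(req.body);
```

The same request could equally be issued with cURL or any HTTP library, which is the practical meaning of 'standards-based' here.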
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
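To make the contrast concrete, here is a small sketch of how a query that SQL would phrase roughly as SELECT region, COUNT(*) ... GROUP BY region can be written as a map function and a reduce function. The sample documents and the runView helper are assumptions made for this illustration; runView is only a stand-in for CouchDB's view engine, and emit is passed in as a parameter so that the code runs outside CouchDB (inside CouchDB, emit is a built-in and map takes only the document).

```javascript
// Sample documents; the field values are chosen for illustration only.
var docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen",   region: "Scotland" },
  { _id: "Aston University",         region: "West Midlands" }
];

// Map function in CouchDB style: emit one (key, value) pair per document.
function map(doc, emit) {
  if (doc.region) emit(doc.region, 1);
}

// Reduce function: sum the values emitted for one key.
function reduce(keys, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Tiny stand-in for the view engine: run map, group by key, then reduce.
function runView(docs, map, reduce) {
  var groups = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      groups[key] = groups[key] || [];
      groups[key].push(value);
    });
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: reduce(null, groups[key]) };
  });
}

var rows = runView(docs, map, reduce);
console.log(rows);
```

The grouping that SQL expresses declaratively is here split into two steps: the map function chooses the key, and the reduce function aggregates the values for each key.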
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
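The escaping burden can be seen in a short sketch. Below, a 'show' function is written as ordinary JavaScript source and then embedded in a design document; serialising the document to JSON is what forces every inner double quote to be backslash-escaped. The document id _design/example and the shortened function body are invented for this illustration.

```javascript
// A 'show' function as JavaScript source text. The HTML attribute uses
// double quotes, as hand-written HTML usually does.
var showSource =
  "function(doc, req) {" +
  " return '<div id=\"map_canvas\">' + doc._id + '</div>';" +
  " }";

// In a design document the function is stored as a JSON string value,
// so each double quote inside it has to appear as \" in the JSON text.
var designDoc = { _id: "_design/example", shows: { s1: showSource } };
var json = JSON.stringify(designDoc);

console.log(json);
```

Letting a tool perform this serialisation, rather than escaping by hand, is essentially the service that CouchApp provides.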
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
      <html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\">
        html { height: 100% }
        body { height: 100%; margin: 0px; padding: 0px }
        #map_canvas { height: 100% }
      </style>
      <script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\">
        function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
          var myOptions = {
            zoom: 14, center: latlng,
            mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
            document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
          google.maps.event.addListener(marker, \"click\", function() {
            infowindow.open(map, marker); });
          marker.setMap(map); }
      </script></head>
      <body onload=\"initialize()\">
        <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div>
      </body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ? institution.programmes.map(
            function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ? programme.countries.map(function(country) {
                  return {
                    country : country
                  };
                }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
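The iteration behaviour can be sketched in a few lines. The function below is a deliberately simplified stand-in written for this illustration, not the mustache.js implementation: it handles a single, non-nested section and plain {{key}} substitutions only. The name renderSection and the shortened template are assumptions of the sketch.

```javascript
// Simplified stand-in for a Mustache section; NOT the mustache.js code.
function renderSection(template, name, items) {
  var open = "{{#" + name + "}}";
  var close = "{{/" + name + "}}";
  var start = template.indexOf(open);
  var end = template.indexOf(close);
  if (start === -1 || end === -1) return template; // no such section
  var body = template.slice(start + open.length, end);
  var rendered = items.map(function (item) {
    // substitute each {{key}} in the section body with the item's value
    return body.replace(/\{\{(\w+)\}\}/g, function (match, key) {
      return String(item[key]);
    });
  }).join("");
  return template.slice(0, start) + rendered + template.slice(end + close.length);
}

var template = "{{#countries}}<img alt=\"{{country}}\"/>{{/countries}}";
var countries = [{ country: "id" }, { country: "mz" }, { country: "tj" }];
var html = renderSection(template, "countries", countries);
console.log(html);
```

With three countries in the list, the body of the section is emitted three times, once per country code.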
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
SECTION 5 SERVING HTML FROM COUCHDB 34
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>' }"
  }
}
Note that shows is a reserved word in a CouchDB design document.
The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
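As a small sketch of how the two parameters work together, the function below follows the shape of s1 but also reads a value from the query string through req.query. The units parameter, the sample document and its coordinate values are invented for this example, and the plain objects passed in are only stand-ins for what CouchDB would supply.

```javascript
// A 'show' function in the style of s1. The optional "units" query
// parameter is an assumption made for this sketch.
var show = function (doc, req) {
  var suffix = (req.query && req.query.units === "deg") ? "&deg;" : "";
  return "<h1>" + doc._id + "</h1>" +
         "<p>latitude: " + doc.latitude + suffix + "</p>" +
         "<p>longitude: " + doc.longitude + suffix + "</p>";
};

// Stand-ins for the document and request object that CouchDB passes in,
// as if the URL had ended in ?units=deg
var doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
var req = { query: { units: "deg" } };

var html = show(doc, req);
console.log(html);
```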
Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
Figure 5.3: Document rendered in HTML
This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.
The URL can be read as follows:
• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
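The breakdown above can be restated as a small function that assembles the URL from its parts. The function name showUrl is made up for this sketch; the path layout is the one described in the list.

```javascript
// Build the URL of a 'show' function from its parts.
function showUrl(host, db, designDoc, showName, docId) {
  return [
    host,
    db,
    "_design", designDoc,
    "_show", showName,
    encodeURIComponent(docId) // spaces in the document id become %20
  ].join("/");
}

var url = showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University");
console.log(url);
```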
5.3 A more complex 'show' function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 'show' function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and un-intuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id) }"
    }
  }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ \"headers\": { \"Content-Type\": \"text/html\" } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id) }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
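The calling pattern described above can be checked with a small simulation. In CouchDB, start, getRow and send are provided by the server; in the sketch below they are passed in as parameters, and the runList helper and sample rows are stand-ins invented so that the code can run on its own.

```javascript
// A 'list' function in the style of l1. start, getRow and send are
// built-ins in CouchDB; here they are parameters so the sketch runs alone.
function l1(head, req, start, getRow, send) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send("<p>" + row.value + "</p>");
  }
}

// Minimal stand-in for the CouchDB runtime (an assumption of this sketch).
function runList(listFn, rows) {
  var calls = { start: 0, send: 0 };
  var output = [];
  var i = 0;
  listFn({ total_rows: rows.length }, {},
    function start() { calls.start += 1; },
    function getRow() { return i < rows.length ? rows[i++] : null; },
    function send(chunk) { calls.send += 1; output.push(chunk); });
  return { calls: calls, body: output.join("") };
}

var result = runList(l1, [
  { key: "Aston University", value: "Aston University" },
  { key: "University of Aberdeen", value: "University of Aberdeen" }
]);
console.log(result.calls); // start is called once, send once per row
console.log(result.body);
```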
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
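As a rough illustration of what the filled-in stubs might contain, the sketch below reuses the logic of s1 and v1 from Section 5. It is an assumption-laden stand-in rather than the generated files themselves: in a real CouchApp project each file holds a single anonymous function and emit is supplied by CouchDB, whereas here the functions are named and emit is passed in so that the sketch runs on its own. The sample document and its coordinates are invented for illustration.

```javascript
// Hypothetical contents of shows/s1.js after filling in the generated stub.
// CouchDB calls a Show function with the requested document and the request.
var s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Hypothetical contents of views/v1/map.js (with no reduce.js alongside it).
// `emit` is passed in here only so that the sketch runs outside CouchDB.
var v1Map = function (doc, emit) {
  emit(doc._id, doc._id);
};

// Invented sample document, for illustration only.
var sampleDoc = { _id: 'Aston University', latitude: 52.48, longitude: -1.88 };

var html = s1(sampleDoc, {});
var rows = [];
v1Map(sampleDoc, function (key, value) {
  rows.push({ key: key, value: value });
});

console.log(html);
console.log(rows);
```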
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
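The merging step can be pictured with a small conceptual sketch. This is not CouchApp's actual implementation (which is written in Python); the function name buildDesignDoc and the sample file contents are assumptions made for illustration. It simply folds a flat set of file paths into the nested object shape of a design document.

```javascript
// Conceptual sketch only: fold a flat map of "path -> file contents"
// into the nested object shape of a design document.
function buildDesignDoc(name, files) {
  var ddoc = { _id: '_design/' + name };
  Object.keys(files).forEach(function (path) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    var parts = path.replace(/\.[^./]+$/, '').split('/');
    var node = ddoc;
    parts.slice(0, -1).forEach(function (part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

// Hypothetical project files, keyed by their path inside the bc folder.
var ddoc = buildDesignDoc('bc', {
  'shows/s1.js': 'function(doc, req) { return "<h1>" + doc._id + "</h1>"; }',
  'views/v1/map.js': 'function(doc) { emit(doc._id, doc._id); }',
  'templates/georss.html': '<feed>...</feed>'
});

console.log(JSON.stringify(ddoc, null, 2));
```

Under this scheme the template ends up at ddoc.templates.georss, which is the path the List function in Appendix C reads from.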
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
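To make the behaviour of this View concrete, the following sketch runs the same map logic over a couple of in-memory documents. The harness runView is hypothetical, the documents are invented, and a plain string comparison stands in for CouchDB's own key collation; the result object only approximates the shape of a real View response.

```javascript
// Standalone simulation of the `all` View: run the map function over a few
// documents and collect the emitted rows, ordered by key.
function runView(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  // Simple string ordering; CouchDB applies its own collation rules.
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return { total_rows: rows.length, offset: 0, rows: rows };
}

// Same logic as map.js, with emit passed in so it runs outside CouchDB.
var allMap = function (doc, emit) {
  emit(doc._id, doc);
};

var result = runView(allMap, [
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'Aston University', region: 'West Midlands' }
]);

console.log(JSON.stringify(result, null, 2));
```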
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
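The looping behaviour described above can be sketched in plain JavaScript. The helper section below is a simplified stand-in for a Mustache section, not mustache.js itself, and programmeHtml and the relative flag path are illustrative assumptions.

```javascript
// Simplified stand-in for a Mustache section {{#name}}...{{/name}}:
// render once per array item, once for a truthy non-array value,
// and not at all for a falsy value or an empty array.
function section(value, render) {
  if (Array.isArray(value)) {
    return value.map(render).join('');
  }
  return value ? render(value) : '';
}

// Hypothetical equivalent of the programme part of the template.
function programmeHtml(programme) {
  return '<p>' + programme.name +
    section(programme.has_countries, function () {
      return section(programme.countries, function (c) {
        return '<img src="/countries/' + c.country + '/flag" alt="' + c.country + '" />';
      });
    }) +
    '</p>';
}

var withFlags = programmeHtml({
  name: 'Chevening Programme',
  has_countries: true,
  countries: [{ country: 'id' }, { country: 'mz' }]
});
var withoutFlags = programmeHtml({ name: 'Erasmus', has_countries: false, countries: [] });

console.log(withFlags);
console.log(withoutFlags);
```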
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
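The correspondence between document operations and HTTP verbs can be summarised in a small sketch. The helper couchRequest and the localhost address are assumptions for illustration; the sketch only builds request descriptions and does not contact a server.

```javascript
// Map document operations onto the HTTP requests CouchDB expects.
// The server address and database name used below are placeholders.
function couchRequest(action, base, db, id, rev) {
  var url = base + '/' + db + '/' + encodeURIComponent(id);
  switch (action) {
    case 'read':
      return { method: 'GET', url: url };
    case 'create':
      return { method: 'PUT', url: url };
    case 'update':
      // An update must name the revision it replaces.
      return { method: 'PUT', url: url + '?rev=' + rev };
    case 'delete':
      return { method: 'DELETE', url: url + '?rev=' + rev };
    default:
      throw new Error('unknown action: ' + action);
  }
}

var readReq = couchRequest('read', 'http://localhost:5984', 'universities', 'University of Aberdeen');
var deleteReq = couchRequest('delete', 'http://localhost:5984', 'universities', 'University of Aberdeen', '1-abc');

console.log(readReq);
console.log(deleteReq);
```

Any tool that can issue these requests, cURL included, can therefore act as a CouchDB client.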
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
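As an illustration of the shift in thinking, the information need behind an SQL query such as SELECT region, COUNT(*) FROM universities GROUP BY region might be expressed as the map and reduce functions below. This is a sketch under assumptions: the harness groupedQuery is hypothetical, the documents are invented, and emit is passed in explicitly, whereas in CouchDB it is a global and grouping is requested with group=true on the View URL.

```javascript
// Map: emit one row per document, keyed by region.
var regionMap = function (doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
};

// Reduce: sum the values for a key. Summing also works when CouchDB
// re-reduces partial results, so the rereduce flag can be ignored here.
var regionReduce = function (keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

// Tiny in-memory harness standing in for a grouped View query.
function groupedQuery(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    regionMap(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: regionReduce(null, groups[key], false) };
  });
}

var counts = groupedQuery([
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'University of Aberdeen', region: 'Scotland' },
  { _id: 'Birkbeck', region: 'London' }
]);

console.log(counts);
```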
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
        '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
        <html><head><meta name=\"viewport\"
        content=\"initial-scale=1.0, user-scalable=no\" />
        <style type=\"text/css\"> html { height: 100% }
        body { height: 100%; margin: 0px; padding: 0px }
        #map_canvas { height: 100% } </style>
        <script type=\"text/javascript\"
        src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
        <script type=\"text/javascript\"> function initialize() {
        var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
        var myOptions = { zoom: 14, center: latlng,
        mapTypeId: google.maps.MapTypeId.ROADMAP };
        var map = new google.maps.Map(
        document.getElementById(\"map_canvas\"), myOptions);
        var marker = new google.maps.Marker({ position: latlng,
        title: \"' + doc._id + '\" });
        var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
        google.maps.event.addListener(marker, \"click\", function() {
        infowindow.open(map, marker); });
        marker.setMap(map); } </script></head>
        <body onload=\"initialize()\">
        <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
        </body></html>'; }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  //
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
};
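To see the shape of the data handed to the template, the per-row mapping above can be approximated outside CouchDB. The function toStashEntry below is a hypothetical, trimmed-down version: it omits several fields, and it treats an empty programme or country list as absent by testing the list length, whereas the listing above tests only whether the property exists. The sample document is abbreviated from the one shown in Appendix D.

```javascript
// Standalone approximation of the per-row mapping in georss.js,
// without CouchDB's List helpers, to show the shape handed to Mustache.
function toStashEntry(institution) {
  var programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      var countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so the template can refer to {{country}}.
        countries: countries.map(function (code) {
          return { country: code };
        })
      };
    })
  };
}

var entry = toStashEntry({
  _id: 'University of St Andrews',
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: 'Chevening Programme', url: 'http://www.chevening.com', countries: ['id', 'mz', 'tj'] }
  ]
});
var bare = toStashEntry({ _id: 'No Programmes', latitude: 0, longitude: 0 });

console.log(JSON.stringify(entry, null, 2));
```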
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": "15161",
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
        {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag (the flag of Germany).
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer. A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django. The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
5.3 A more complex ‘show’ function
The design document may be extended to show more complex HTML documents, for example a Google Map for each location.
Figure 5.4: Document data rendered using Google Maps
This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University
As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.
HTML for Google Maps is based on the code from the Google Map JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html
The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.
It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
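The escaping burden can be seen in a small sketch. It assumes nothing about CouchDB itself: it simply serialises a hypothetical show function into a design-document-shaped object with the standard JSON.stringify, which is where the backslash-escaped quotes come from when the work is not done by hand.

```javascript
// A show function whose HTML contains double quotes.
var show = function (doc, req) {
  return '<div id="map_canvas" style="width: 100%"></div>';
};

// Serialising the design document turns the function source into a JSON
// string, so each double quote inside it becomes \" in the stored document.
var ddoc = { _id: '_design/d1', shows: { map1: show.toString() } };
var json = JSON.stringify(ddoc);

console.log(json);
```

Parsing the JSON again recovers the original function source unchanged, which is why tools that perform this serialisation automatically remove most of the tedium.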
While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section, and are known as CouchDB ‘view’ functions.
Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:
{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id)}"
    }
  }
}
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids.
• l1 is a very simple List that shows a hyperlink for each document id.
We augment the design document with a ‘list’ function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
        var row;
        start({ \"headers\": { \"Content-Type\": \"text/html\" } });
        while(row = getRow()) {
          send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
        }
      }"
  },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id)}"
    }
  }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function.
• send is executed once for each row in the dataset.
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
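The interplay of start, getRow and send can be tried outside CouchDB with a small harness. Everything below is a sketch under assumptions: makeListEnv is a hypothetical stand-in for the helpers that CouchDB supplies as globals, the link target is shortened to a fragment, and the rows are invented.

```javascript
// Stand-ins for the helpers CouchDB provides to a List function.
function makeListEnv(rows) {
  var env = { headers: null, chunks: [], index: 0 };
  env.start = function (response) { env.headers = response.headers; };
  env.send = function (chunk) { env.chunks.push(chunk); };
  env.getRow = function () {
    return env.index < rows.length ? rows[env.index++] : null;
  };
  return env;
}

// The logic of l1, written against the stand-ins instead of CouchDB's globals.
function l1(env) {
  var row;
  env.start({ headers: { 'Content-Type': 'text/html' } });
  while ((row = env.getRow())) {
    env.send('<p><a href="#' + encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
  return env.chunks.join('');
}

var env = makeListEnv([
  { key: 'Aston University', value: 'Aston University' },
  { key: 'University of Aberdeen', value: 'University of Aberdeen' }
]);
var body = l1(env);

console.log(env.headers);
console.log(body);
```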
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project: CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function, s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
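To make the shape of a Show function concrete, here is a minimal sketch in plain JavaScript. It is modelled on the s1 function from Section 5, but the name showS1 and the sample document are assumptions added for illustration; in a CouchApp project the file would hold only the anonymous function, which CouchDB calls with the document and the request.

```javascript
// Sketch of a Show function: one document in, one HTML string out.
// In shows/s1.js the function would be anonymous; it is named here
// (showS1 is a hypothetical name) so that it can be called directly.
function showS1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical sample document, for illustration only.
var sampleDoc = { _id: 'Example University', latitude: 51.5, longitude: -0.1 };
var html = showS1(sampleDoc, {});
console.log(html);
```

Because the function is just a string-building routine, it can be tried out like this in any JavaScript console before it is pushed to the server.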
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
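The merging step can be pictured as folding a set of file paths into one nested object. The sketch below is an illustration of that idea only and is not CouchApp's actual implementation (which is written in Python); the function name buildDesignDoc and the sample file contents are assumptions.

```javascript
// Sketch: fold a flat map of "path -> file contents" into the nested
// object that forms a design document. Illustration only; this is not
// how CouchApp itself is implemented.
function buildDesignDoc(name, files) {
  var ddoc = { _id: '_design/' + name };
  Object.keys(files).forEach(function (path) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    var parts = path.replace(/\.[^./]+$/, '').split('/');
    var node = ddoc;
    for (var i = 0; i < parts.length - 1; i++) {
      node[parts[i]] = node[parts[i]] || {};
      node = node[parts[i]];
    }
    // The file's text becomes the value of the innermost key.
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

// Hypothetical project files, for illustration only.
var ddoc = buildDesignDoc('bc', {
  'views/v1/map.js': 'function(doc) { emit(doc._id, doc._id); }',
  'shows/s1.js': 'function(doc, req) { return doc._id; }',
  'templates/georss.html': '<feed>...</feed>'
});
console.log(JSON.stringify(ddoc, null, 2));
```

Seen this way, each folder in the project corresponds to one top-level key of the design document, and each file to one string value beneath it.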
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
    "env": {
        "default": {
            "db": "http://username:password@mick.couchone.com/universities"
        }
    }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
    emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
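To see what such a map function produces, it can be run locally over a few in-memory documents. The following is a sketch under stated assumptions: runView is a hypothetical helper, emit is passed in as a second argument rather than being a built-in as it is inside CouchDB, plain string comparison stands in for CouchDB's key collation, and the sample documents are invented.

```javascript
// Sketch: run a map function over an in-memory list of documents and
// collect what it emits. runView and the emit callback are local
// stand-ins for what CouchDB does on the server.
function runView(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    // Each call to emit adds one row to the view.
    mapFn(doc, function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  // View rows come back ordered by key (simple comparison used here).
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

// The 'all' view, with emit supplied as a parameter for this sketch.
var allMap = function (doc, emit) { emit(doc._id, doc); };

var rows = runView(allMap, [
  { _id: 'University B', region: 'Scotland' },
  { _id: 'University A', region: 'Wales' }
]);
console.log(rows.map(function (r) { return r.key; }));
```

Each row carries the emitted key and value together with the id of the document it came from, which is the same shape a List function later receives from getRow.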
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
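The nested sections of the template behave like nested loops. The sketch below expresses the same control flow in plain JavaScript, purely as an illustration; it is not how Mustache is implemented, and the function name renderContent, the sample data and the example.org URL are assumptions.

```javascript
// Sketch: the template's nested sections written as nested loops.
// renderContent is a hypothetical helper, not part of Mustache.
function renderContent(institution) {
  var out = '';
  // An institution without programmes produces no content at all.
  (institution.programmes || []).forEach(function (programme) {
    out += '<p>' + programme.name;
    // Flags are emitted only when the programme lists countries.
    (programme.countries || []).forEach(function (code) {
      out += '<img src="http://example.org/countries/' + code +
             '/flag" alt="' + code + '" />';
    });
    out += '</p>';
  });
  return out;
}

// Invented sample data, for illustration only.
var content = renderContent({
  programmes: [
    { name: 'Programme X', countries: ['aa', 'bb'] },
    { name: 'Programme Y' }
  ]
});
console.log(content);
```

The advantage of the template over code like this is that the markup stays readable as markup, with the looping expressed declaratively.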
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
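The mapping from document operations to HTTP verbs can be sketched as follows. This is an illustration only: couchRequest is a hypothetical helper that merely describes the request for each operation without sending anything, and the localhost database URL is an assumption.

```javascript
// Sketch: describe the HTTP request for each basic document operation.
// Nothing is sent; the function only builds a description of the request.
function couchRequest(action, dbUrl, id, arg) {
  var docUrl = dbUrl + '/' + encodeURIComponent(id);
  switch (action) {
    case 'read':
      return { method: 'GET', url: docUrl };
    case 'save':   // create, or update when the body carries the current _rev
      return { method: 'PUT', url: docUrl, body: JSON.stringify(arg) };
    case 'delete': // the current revision must be named
      return { method: 'DELETE', url: docUrl + '?rev=' + encodeURIComponent(arg) };
    default:
      throw new Error('unknown action: ' + action);
  }
}

var db = 'http://localhost:5984/universities'; // hypothetical local database
var readReq = couchRequest('read', db, 'University of Aberdeen');
var saveReq = couchRequest('save', db, 'University of Aberdeen', { region: 'Scotland' });
console.log(readReq.method, readReq.url);
console.log(saveReq.method, saveReq.body);
```

Any tool that can issue these requests, whether cURL, a browser or a library in another language, can therefore act as a CouchDB client.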
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
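As an illustration of the difference in thinking, consider counting institutions per region. The sketch below shows the SQL form in a comment and the map-reduce form as code. It makes several assumptions: the table name in the SQL is invented, emit is passed in as a parameter rather than being a built-in, queryGrouped is a hypothetical local stand-in for querying a CouchDB view with group=true, and the sample documents are made up.

```javascript
// SQL equivalent, for comparison (hypothetical table name):
//   SELECT region, COUNT(*) FROM institutions GROUP BY region;

// Map: emit one row per document, keyed by region.
function map(doc, emit) {
  if (doc.region) { emit(doc.region, 1); }
}

// Reduce: add up the values emitted for one key.
function reduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Local stand-in for querying the view with group=true.
function queryGrouped(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduce(null, groups[key], false);
  });
  return result;
}

var counts = queryGrouped([
  { _id: 'A', region: 'Scotland' },
  { _id: 'B', region: 'Scotland' },
  { _id: 'C', region: 'Wales' }
]);
console.log(counts);
```

The information need is the same in both forms; what changes is that the grouping key and the aggregation are written as two small functions and fixed in advance, rather than composed at query time.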
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
    "_id": "_design/d1",
    "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
    "shows": {
        "s1": "function(doc, req) { return
            '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
            <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
            <style type=\"text/css\"> html { height: 100% }
            body { height: 100%; margin: 0px; padding: 0px }
            #map_canvas { height: 100% } </style>
            <script type=\"text/javascript\"
            src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
            <script type=\"text/javascript\"> function initialize() {
            var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
            var myOptions = { zoom: 14, center: latlng,
            mapTypeId: google.maps.MapTypeId.ROADMAP };
            var map = new google.maps.Map(
            document.getElementById(\"map_canvas\"), myOptions);
            var marker = new google.maps.Marker({ position: latlng,
            title: \"' + doc._id + '\" });
            var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
            google.maps.event.addListener(marker, \"click\", function() {
            infowindow.open(map, marker); });
            marker.setMap(map); } </script></head>
            <body onload=\"initialize()\">
            <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
            </body></html>' }"
    }
}
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
    // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
    var f = <feed xmlns="http://www.w3.org/2005/Atom"
                  xmlns:georss="http://www.georss.org/georss"/>;
    f.title = data.title;
    f.id = data.feed_id;
    f.link.@href = data.feed_link;
    f.link.@rel = "self";
    f.generator = "CouchApp on CouchDB";
    f.updated = rfc3339(data.updated);
    return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
    var entry = <entry/>;
    entry.id = data.entry_id;
    entry.title = data.title;
    entry.content = data.content;
    entry.content.@type = (data.content_type || 'html');
    entry.updated = rfc3339(data.updated);
    entry.author = <author><name>{data.author}</name></author>;
    entry.link.@href = data.alternate;
    entry.link.@rel = "alternate";
    entry.point = data.point;
    return entry;
}
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
    <h1>{{pageTitle}}</h1>
    <!-- amended for geosofa -->
    <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
    <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
    <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
    <!-- ... -->
// this is further down in templates/edit.html
// apply docForm at login.
$("#account").evently({
    loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
            id : docid,
            // fields : ["title", "body", "tags"],
            fields : ["title", "latitude", "longitude", "body", "tags"],
            template : {
                type : "post",
                format : "markdown",
                author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
    var ddoc = this;
    var Mustache = require("lib/mustache");
    var List = require("vendor/couchapp/lib/list");
    provides("html", function() {
        var key = "";
        var stash = {
            institutions : List.withRows(function(row) {
                var institution = row.value;
                key = row.key;
                return {
                    id : institution._id,
                    rev : institution._rev,
                    institutionID : institution.institutionID,
                    postcode : institution.postcode,
                    latitude : institution.latitude,
                    longitude : institution.longitude,
                    constituency : institution.constituency,
                    localAuthority : institution.localAuthority,
                    region : institution.region,
                    has_programmes : institution.programmes ? true : false,
                    programmes : institution.programmes ?
                        institution.programmes.map(function(programme) {
                            return {
                                name : programme.name,
                                url : programme.url,
                                has_countries : programme.countries ? true : false,
                                countries : programme.countries ?
                                    programme.countries.map(function(country) {
                                        return {
                                            country : country
                                        };
                                    }) : [] // return nothing if no countries
                            };
                        }) : [] // return nothing if no programmes
                };
            })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
    });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
    "_id": "University of St Andrews",
    "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
    "institutionID": 15161,
    "postcode": "KY16 9AJ",
    "latitude": 56.3412139443169,
    "longitude": -2.79301175308608,
    "constituency": "North East Fife",
    "localAuthority": "Fife Council",
    "region": "Scotland",
    "programmes": [
        {
            "name": "Chevening Programme",
            "url": "http://www.chevening.com",
            "countries": ["id", "mz", "tj"]
        },
        {
            "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
            "url": "http://www.csfp-online.org",
            "countries": ["bd", "za"]
        }
    ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
    <title>British Council Activity Mapping</title>
    <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
    <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
    <generator>CouchApp on CouchDB</generator>
    <updated>2010-09-12</updated>
    {{#institutions}}
    <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
            {{#has_programmes}}
            {{#programmes}}
            &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
            {{/countries}}
            {{/has_countries}}
            &lt;/p&gt;
            {{/programmes}}
            {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
    echo $filepath
    # get the file name from the file path
    filename=$(echo $filepath | sed -e 's/countries\/png\///g')
    echo $filename
    docname=$(echo $filename | sed -e 's/.png//g')
    echo $docname
    url=http://username:password@mick.couchone.com/countries/$docname/flag
    echo $url
    # put the attachment into CouchDB
    # this command creates the 'ie' record and puts the png as an attachment
    # under countries/ie/flag
    # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
    echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics, 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
Figure 5.5: Documents may be edited directly using Futon
5.4 Views and Lists
As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.
In a CouchDB design document, such queries are listed in the views section, and are known as CouchDB 'view' functions.
Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:
{
    "_id": "_design/d1",
    "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
    "shows": {...},
    "lists": {...},
    "views": {
        "v1": {
            "map": "function(doc){emit(doc._id, doc._id)}"
        }
    }
}
The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple 'view' returning document id values
A 'view' is in effect a query function which returns a set of results. In order to render the results as HTML, we may define 'list' functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a 'list' function as follows:
{
    "_id": "_design/d1",
    "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
    "shows": {...},
    "lists": {
        "l1": "function(head, req) {
            var row;
            start({ \"headers\": { \"Content-Type\": \"text/html\" } });
            while(row = getRow()) {
                send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
            }
        }"
    },
    "views": {
        "v1": {
            "map": "function(doc){emit(doc._id, doc._id)}"
        }
    }
}
Note the built-in functions start and send:
• start is executed once, at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
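The interplay between start, getRow and send can be sketched outside CouchDB with a small test harness. Everything here is an assumption made for illustration: runList and linkList are hypothetical names, the helpers are passed in as an object rather than being globals as they are in CouchDB, and the "/doc/" link target is invented.

```javascript
// Stand-ins for the helpers CouchDB provides to list functions (assumed
// simplifications): start() records headers, getRow() yields the next view
// row or null, and send() appends a chunk to the response body.
function runList(listFn, rows) {
  const response = { headers: {}, body: "" };
  let index = 0;
  const helpers = {
    start: (options) => { response.headers = options.headers; },
    getRow: () => (index < rows.length ? rows[index++] : null),
    send: (chunk) => { response.body += chunk; },
  };
  listFn(helpers);
  return response;
}

// A list function in the spirit of l1: one paragraph with a link per row.
function linkList({ start, getRow, send }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="/doc/' + encodeURIComponent(row.value) + '">' +
         row.value + "</a></p>");
  }
}

const listRows = [
  { key: "University of Aberdeen", value: "University of Aberdeen" },
  { key: "University of St Andrews", value: "University of St Andrews" },
];
const listResponse = runList(linkList, listRows);
console.log(listResponse.headers["Content-Type"]);
console.log(listResponse.body);
```

The loop ends when getRow has no more rows to hand out, which is why the list function needs no knowledge of how many rows the view produced.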
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
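The merge step described above can be pictured with a short sketch. This is not CouchApp's actual code (CouchApp is written in Python, and it handles far more cases); the buildDesignDoc function and the file contents below are hypothetical, and only show how a folder layout could map onto the nested structure of a design document.

```javascript
// Build a nested design document from a flat map of file paths to contents.
// This mirrors the idea behind CouchApp's folder layout; it is an illustrative
// assumption, not CouchApp's actual implementation.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, content] of Object.entries(files)) {
    const parts = path.replace(/\.(js|html)$/, "").split("/");
    let node = ddoc;
    // Walk, or create, one nested object per folder.
    for (const part of parts.slice(0, -1)) {
      node[part] = node[part] || {};
      node = node[part];
    }
    // The file name (without extension) becomes the key, its text the value.
    node[parts[parts.length - 1]] = content;
  }
  return ddoc;
}

const designDoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
});
console.log(JSON.stringify(designDoc, null, 2));
```

Seen this way, each file in the project is simply one string-valued field of the final JSON document.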
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
          {{#programmes}}
            &lt;p&gt;{{name}}
            {{#has_countries}}
              {{#countries}}
                &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
              {{/countries}}
            {{/has_countries}}
            &lt;/p&gt;
          {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
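The nested iteration performed by the template can be sketched in plain JavaScript. Here string building stands in for Mustache purely to keep the example self-contained; renderEntries is a hypothetical helper, the relative "/countries/" path and the rounded coordinates are illustrative values, and no escaping of special characters is attempted.

```javascript
// Render one Atom <entry> per institution, following the shape of the
// georss.html template. Plain string building stands in for Mustache here
// (an assumption made to keep the sketch self-contained).
function renderEntries(stash) {
  return stash.institutions.map((inst) => {
    const programmes = (inst.programmes || []).map((prog) => {
      // One escaped <img> per country, as in the inner {{#countries}} loop.
      const flags = (prog.countries || [])
        .map((c) => '&lt;img src="/countries/' + c.country + '/flag" alt="' +
                    c.country + '" /&gt;')
        .join("");
      return "&lt;p&gt;" + prog.name + flags + "&lt;/p&gt;";
    }).join("");
    return [
      "<entry>",
      "<id>" + inst.id + "</id>",
      "<title>" + inst.id + "</title>",
      '<content type="html">' + programmes + "</content>",
      "<point>" + inst.latitude + " " + inst.longitude + "</point>",
      "</entry>",
    ].join("\n");
  }).join("\n");
}

// Hypothetical, abbreviated sample data.
const sampleStash = {
  institutions: [{
    id: "University of St Andrews",
    latitude: 56.34,
    longitude: -2.79,
    programmes: [{
      name: "Chevening Programme",
      countries: [{ country: "id" }, { country: "mz" }],
    }],
  }],
};
const entriesXml = renderEntries(sampleStash);
console.log(entriesXml);
```

Comparing this with the template shows what Mustache buys us: the template expresses the same two loops declaratively, with no string concatenation or quoting to maintain.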
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
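As a rough illustration of what 'standard HTTP verbs' means in practice, the sketch below maps common document operations onto request descriptions. The couchRequest helper and the sample revision string are hypothetical; the sketch only describes requests and does not send them.

```javascript
// Map common document operations onto HTTP methods and paths.
// The helper name couchRequest and the sample revision are hypothetical.
function couchRequest(operation, db, docId, rev) {
  const path = "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId);
  switch (operation) {
    case "read":
      return { method: "GET", path };
    case "create":
      return { method: "PUT", path };
    case "update":
      // An update is also a PUT; the current revision travels in the JSON body.
      return { method: "PUT", path, body: { _rev: rev } };
    case "delete":
      // A delete names the revision being removed.
      return { method: "DELETE", path: path + "?rev=" + encodeURIComponent(rev) };
    default:
      throw new Error("unknown operation: " + operation);
  }
}

const deleteRequest = couchRequest(
  "delete", "universities", "University of Aberdeen", "1-abc");
console.log(deleteRequest.method, deleteRequest.path);
```

Because every operation is an ordinary HTTP request, any of these could equally be issued from cURL or from a browser-side script.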
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
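To illustrate the shift in thinking, consider an information need that SQL would express roughly as SELECT region, COUNT(*) FROM institutions GROUP BY region. The sketch below expresses the same need as a map step and a reduce step. The runGroupedView helper and the sample documents are assumptions for illustration, and the reduce function is simplified: a real CouchDB reduce function receives (keys, values, rereduce).

```javascript
// Map step: emit one (region, 1) pair per document.
const mapByRegion = (doc, emit) => emit(doc.region, 1);

// Reduce step: sum the values collected for one key (simplified signature).
const sumReduce = (values) => values.reduce((total, n) => total + n, 0);

// Hypothetical stand-in for a grouped view query.
function runGroupedView(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(values);
  }
  return result;
}

// Hypothetical sample documents.
const institutions = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Birkbeck", region: "London" },
];
const countsByRegion = runGroupedView(institutions, mapByRegion, sumReduce);
console.log(countsByRegion);
```

The logic is no harder than the SQL, but it has to be written, stored in a design document and indexed before it can be queried, which is the practical meaning of 'no ad-hoc querying'.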
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
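The escaping burden, and the way a tool can take it over, can be seen in a few lines. This is only a sketch of the general idea: the show function and its class attribute are hypothetical, and the round trip through new Function is used here for demonstration rather than as a description of how CouchDB or CouchApp load functions.

```javascript
// A show function written "in the normal way", with unescaped double quotes.
// The class name "lat" is a hypothetical example.
const show = function (doc, req) {
  return '<p class="lat">' + doc.latitude + "</p>";
};

// Storing the function in a JSON design document means storing its source
// text as a string; JSON.stringify adds the backslash escapes for us.
const ddocJson = JSON.stringify({ shows: { s1: show.toString() } });
console.log(ddocJson);

// Reading the document back and evaluating the source restores the function.
const restored = new Function(
  "return (" + JSON.parse(ddocJson).shows.s1 + ")")();
console.log(restored({ latitude: 56.34 }, {}));
```

The serialised document contains the backslash-escaped quotes that would otherwise have to be typed by hand, while the source file itself stays readable.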
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return
          '<h1>' + doc._id + '</h1>' +
          '<p>latitude: ' + doc.latitude + '</p>' +
          '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
          <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
          <style type=\"text/css\"> html { height: 100% }
          body { height: 100%; margin: 0px; padding: 0px }
          #map_canvas { height: 100% } </style>
          <script type=\"text/javascript\"
            src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
          <script type=\"text/javascript\"> function initialize() {
          var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
          var myOptions = { zoom: 14, center: latlng,
            mapTypeId: google.maps.MapTypeId.ROADMAP };
          var map = new google.maps.Map(
            document.getElementById(\"map_canvas\"), myOptions);
          var marker = new google.maps.Marker({ position: latlng,
            title: \"' + doc._id + '\" });
          var infowindow = new google.maps.InfoWindow({
            content: \"' + doc._id + '\" });
          google.maps.event.addListener(marker, \"click\", function() {
            infowindow.open(map, marker); });
          marker.setMap(map); } </script></head>
          <body onload=\"initialize()\">
          <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
          </body></html>'; }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/<\/feed>/, '');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point; //
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
        <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
        <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
        <input type="text" size="50" name="longitude" value=""></p>
      <!-- ... -->

    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    };
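The shaping of a single row into a 'stash' entry can be tried out in isolation. The following sketch is a simplified variation on the listing above, not the code used in the project: toStashItem is a hypothetical helper, only some fields are copied, and the has_ flags are derived from array length (so that an empty list counts as 'no programmes') rather than from truthiness.

```javascript
// Shape one institution document into the structure the template iterates over.
// toStashItem is a hypothetical helper; only a few fields are copied.
function toStashItem(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    // Flags let the template skip a section when a list is absent or empty.
    has_programmes: programmes.length > 0,
    programmes: programmes.map((programme) => {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each country code in an object so the template can name it.
        countries: countries.map((country) => ({ country })),
      };
    }),
  };
}

// Hypothetical, abbreviated sample document.
const stashItem = toStashItem({
  _id: "University of St Andrews",
  latitude: 56.34,
  longitude: -2.79,
  programmes: [
    { name: "Chevening Programme", countries: ["id"] },
    { name: "A programme without countries" },
  ],
});
console.log(JSON.stringify(stashItem, null, 2));
```

Wrapping each country code in its own object is what allows the template to refer to the value by name inside the countries section.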
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": [
            "id", "mz", "tj"
          ]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": [
            "bd", "za"
          ]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
    <title>British Council Activity Mapping</title>
    <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
    <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
    <generator>CouchApp on CouchDB</generator>
    <updated>2010-09-12</updated>
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
          {{#programmes}}
            &lt;p&gt;{{name}}
            {{#has_countries}}
              {{#countries}}
                &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
              {{/countries}}
            {{/has_countries}}
            &lt;/p&gt;
          {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url="http://username:password@mick.couchone.com/countries/$docname/flag"
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
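The two sed substitutions in the script derive a document name from each file path. The same derivation is sketched below in JavaScript so that it can be checked without a CouchDB server; flagUploadTarget and the local base URL are hypothetical, and the sketch builds the target URL only, without performing the upload.

```javascript
// Derive the CouchDB document name and attachment URL for one flag file.
// flagUploadTarget and the base URL below are hypothetical.
function flagUploadTarget(filepath, baseUrl) {
  // "countries/png/ie.png" -> "ie.png" (strip the folder part)
  const filename = filepath.replace(/^.*\//, "");
  // "ie.png" -> "ie" (strip the extension)
  const docname = filename.replace(/\.png$/, "");
  return {
    docname,
    url: baseUrl + "/countries/" + docname + "/flag",
  };
}

const flagTarget = flagUploadTarget("countries/png/ie.png", "http://127.0.0.1:5984");
console.log(flagTarget.docname, flagTarget.url);
```

Each two-letter file name thus becomes a document in the countries database, with the image stored as an attachment named flag.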
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
-
- Motivation
- Does ‘one size’ still ‘fit all’?
- Organisation
-
- Background
-
- The Relational Model
- Relational Database Management Systems
- Normalisation
- SQL
- Transactional Guarantees
- The ‘NoSql’ Movement
- ‘NoSql’ databases at large scale
- Brewer’s ‘CAP’ Theorem
- Reducing the Impedance Mismatch
- Benefits of ‘NoSql’
- Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
- The Activity Mapping Project
- Raising Awareness of British Council Impact in the UK
- Using CouchDB to Serve Mapping Data
-
- Introduction to CouchDB
-
- ‘Of the Web’
- Some History
- Document-Oriented
- Erlang
- How Ubuntu uses CouchDB
- Desktop Couch, Python and Quickly
-
- Using CouchDB
-
- Motivation
- Hosted CouchDB Service Providers
- Installing CouchDB
- Inserting a Document into a CouchDB Database
- Deleting a Document
- Updating a Document
- Adding Attachments
- Replication
- Querying a CouchDB Database using Map-Reduce
-
- Serving HTML from CouchDB
-
- Bulk upload of JSON documents
- A CouchDB Design Document
- A more complex ‘show’ function
- Views and Lists
- Lessons Learned
-
- Serving GeoRSS using CouchApp
-
- Introduction to CouchApp
- Sofa - a Blogging Application
- Developing with CouchApp
- Deployment
- Using templates
-
- Critical Assessment and Conclusion
-
- Why might a developer choose CouchDB?
- CouchDB’s challenges
- Conclusion
-
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
-
SECTION 5 SERVING HTML FROM COUCHDB 37
The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
Figure 5.6: Output from a simple ‘view’ returning document id values
A ‘view’ is in effect a query function which returns a set of results. In order to render the results as HTML, we may define ‘list’ functions in a CouchDB design document.
• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id
We augment the design document with a ‘list’ function as follows:
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
        var row;
        start({ 'headers': { 'Content-Type': 'text/html' } });
        while(row = getRow()) {
          send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
        }
      }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
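The flow of a ‘list’ function can be tried outside CouchDB with a small stand-in for the server. The sketch below is an assumption-laden illustration, not CouchDB’s implementation: runList and the runtime object are hypothetical names, and the built-ins start, getRow and send are passed in as an argument rather than being globals as they are on the real server.

```javascript
// Hypothetical stand-in for the CouchDB list runtime: it supplies start,
// getRow and send, which the real server provides as built-ins.
function runList(listFn, rows) {
  const chunks = [];
  let response = null;
  let next = 0;
  const runtime = {
    start: (resp) => { response = resp; },
    getRow: () => (next < rows.length ? rows[next++] : null),
    send: (chunk) => { chunks.push(chunk); },
  };
  listFn(runtime);
  return { response, body: chunks.join("") };
}

// Same shape as l1, but the built-ins arrive through the `runtime` argument.
function l1(runtime) {
  let row;
  runtime.start({ headers: { "Content-Type": "text/html" } });
  while ((row = runtime.getRow())) {
    runtime.send("<p>" + row.value + "</p>");
  }
}

// Rows shaped like the output of the view v1 (key and value are both the id).
const listResult = runList(l1, [
  { key: "University of Aberdeen", value: "University of Aberdeen" },
  { key: "University of St Andrews", value: "University of St Andrews" },
]);
console.log(listResult.body);
```

The loop ends when getRow returns a falsy value, which is why start runs once while send runs once per row.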
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB’s management console ‘Futon’ to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
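As a rough illustration of what such a feed carries, the sketch below builds a single Atom entry with a GeoRSS-Simple point, which is written as latitude followed by longitude, separated by a space. The function names are hypothetical, the coordinates are rounded illustrative values, and the enclosing feed element (which would declare the georss namespace) is omitted.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build one Atom entry carrying a GeoRSS-Simple point ("latitude longitude").
function geoRssEntry(title, latitude, longitude) {
  return [
    "<entry>",
    "  <title>" + escapeXml(title) + "</title>",
    "  <georss:point>" + latitude + " " + longitude + "</georss:point>",
    "</entry>",
  ].join("\n");
}

// Rounded, illustrative coordinates.
const sampleEntry = geoRssEntry("University of St Andrews", 56.34, -2.79);
console.log(sampleEntry);
```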
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org
The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document bc.
To do this, we navigate to the folder where couchapp is installed, and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
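The correspondence between folders and the design document can be sketched in a few lines. This is only a simplified picture under stated assumptions: the folder is modelled as an in-memory object (bcFolder and foldTree are hypothetical names), and the real CouchApp scripts do considerably more than this when they assemble a design document.

```javascript
// Hypothetical in-memory picture of the folder that CouchApp generates:
// keys are file or folder names, string values are file contents.
const bcFolder = {
  views: { v1: { "map.js": "function(doc) { emit(doc._id, doc._id); }" } },
  shows: { "s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }" },
  lists: { "l1.js": "function(head, req) { /* ... */ }" },
};

// Fold the tree into a nested object, dropping the .js extension from names.
function foldTree(tree) {
  const result = {};
  for (const [name, value] of Object.entries(tree)) {
    const key = name.replace(/\.js$/, "");
    result[key] = typeof value === "string" ? value : foldTree(value);
  }
  return result;
}

const designDoc = Object.assign({ _id: "_design/bc" }, foldTree(bcFolder));
console.log(JSON.stringify(designDoc, null, 2));
```

Each function ends up as a string value at the position its file occupied in the folder tree, which is the structure seen in the hand-written design document of Section 5.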
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
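To make the ‘one document in, HTML out’ contract concrete, the sketch below runs the s1 function from Appendix A as ordinary JavaScript. The sample document and its coordinate values are made up for illustration; on a real server CouchDB supplies doc and req.

```javascript
// s1 as listed in Appendix A: one document in, an HTML string out.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// A made-up document; the empty object stands in for the request.
const showHtml = s1({ _id: "Example University", latitude: 1.5, longitude: -2.5 }, {});
console.log(showHtml);
```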
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1, containing two files - map.js and reduce.js.
We recall that in CouchDB a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
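The HTTP side of deployment can be pictured with a short sketch. It only describes the request and does not send anything; buildPushRequest is a hypothetical helper, the local server address is an assumption, and this is not how CouchApp itself is written. In CouchDB, a design document is stored with a PUT to its id under the database URL, and replacing an existing one additionally requires its current _rev.

```javascript
// Describe (without sending) the HTTP request that stores a design document.
function buildPushRequest(dbUrl, designDoc) {
  return {
    method: "PUT",
    url: dbUrl.replace(/\/$/, "") + "/" + designDoc._id,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(designDoc),
  };
}

// Assumed local server address; the "db" value in .couchapprc plays this role.
const pushRequest = buildPushRequest("http://127.0.0.1:5984/universities", {
  _id: "_design/bc",
  views: { v1: { map: "function(doc) { emit(doc._id, doc._id); }" } },
});
console.log(pushRequest.method, pushRequest.url);
```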
6.5 Using templates
In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all, which consists of a single map.js file (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
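What the map function does to a collection can be approximated in plain JavaScript. The runner below is a hypothetical simplification: it passes emit as a second argument (in CouchDB it is a built-in), sorts keys with ordinary string comparison rather than CouchDB’s own collation, and uses made-up documents.

```javascript
// Apply a map function to every document and collect the emitted rows.
function runView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  // Views return rows ordered by key; plain comparison is a simplification.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// The 'all' view: key is the document id, value is the whole document.
const all = (doc, emit) => emit(doc._id, doc);

const viewRows = runView(all, [
  { _id: "University B", region: "Scotland" },
  { _id: "University A", region: "Wales" },
]);
console.log(viewRows.map((row) => row.key));
```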
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
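The looping behaviour can be demonstrated with a deliberately tiny renderer. This is not mustache.js: render is a hypothetical helper that supports only {{#name}}...{{/name}} sections and {{name}} variables, with no HTML escaping, and the data is made up. It is included only to show how a list-valued section repeats and a false-valued section disappears.

```javascript
// Minimal section/variable renderer (an illustration, not mustache.js).
function render(template, context) {
  const section = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const expanded = template.replace(section, (match, name, inner) => {
    const value = context[name];
    if (Array.isArray(value)) {
      // Lists: render the inner block once per item, item fields in scope.
      return value.map((item) => render(inner, Object.assign({}, context, item))).join("");
    }
    // Booleans: render the inner block once, or not at all.
    return value ? render(inner, context) : "";
  });
  return expanded.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    context[name] == null ? "" : String(context[name]));
}

const programmeTemplate =
  "{{#programmes}}<p>{{name}}{{#has_countries}}{{#countries}}[{{country}}]" +
  "{{/countries}}{{/has_countries}}</p>{{/programmes}}";

const rendered = render(programmeTemplate, {
  programmes: [
    { name: "Programme A", has_countries: true, countries: [{ country: "id" }, { country: "mz" }] },
    { name: "Programme B", has_countries: false, countries: [] },
  ],
});
console.log(rendered);
```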
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/ as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
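The mapping from document operations to HTTP verbs can be summarised in a short sketch. It builds request descriptions only and sends nothing; couchRequest is a hypothetical helper, and the server address and the revision value are assumptions made for illustration.

```javascript
// Describe the HTTP request behind each basic document operation.
function couchRequest(action, dbUrl, doc) {
  const docUrl = dbUrl + "/" + encodeURIComponent(doc._id);
  switch (action) {
    case "create":
    case "update": // an update must carry the current _rev inside the body
      return { method: "PUT", url: docUrl, body: JSON.stringify(doc) };
    case "read":
      return { method: "GET", url: docUrl };
    case "delete": // a delete names the revision being removed
      return { method: "DELETE", url: docUrl + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown action: " + action);
  }
}

const base = "http://127.0.0.1:5984/universities"; // assumed local server
const readRequest = couchRequest("read", base, { _id: "University of Aberdeen" });
const deleteRequest = couchRequest("delete", base, { _id: "University of Aberdeen", _rev: "1-abc" });
console.log(readRequest, deleteRequest);
```

Because every operation is an ordinary HTTP request, the same descriptions translate directly into cURL commands or into calls from any language with an HTTP client.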
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
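To give a feel for the shift in thinking, consider counting institutions per region. In SQL this might be written as SELECT region, COUNT(*) FROM institutions GROUP BY region (an illustrative query; the table name is an assumption). The sketch below expresses the same need as a map step and a reduce step, with a small hypothetical runner (groupReduce) standing in for CouchDB and made-up documents as input.

```javascript
// Map step: emit one (region, 1) pair per document.
const mapRegion = (doc, emit) => emit(doc.region, 1);

// Reduce step, using CouchDB's (keys, values, rereduce) signature: sum the counts.
const reduceCount = (keys, values, rereduce) => values.reduce((a, b) => a + b, 0);

// Hypothetical runner: group emitted values by key, then reduce each group.
function groupReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(null, values, false);
  return result;
}

const regionCounts = groupReduce(
  [{ region: "Scotland" }, { region: "Wales" }, { region: "Scotland" }],
  mapRegion,
  reduceCount
);
console.log(regionCounts);
```

Summing also behaves correctly when partial results are combined again, which is why this reduce function can ignore the rereduce flag.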
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
        '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
        <html><head>
        <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
        <style type=\"text/css\">
          html { height: 100% }
          body { height: 100%; margin: 0px; padding: 0px }
          #map_canvas { height: 100% }
        </style>
        <script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
        <script type=\"text/javascript\">
          function initialize() {
            var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
            var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP };
            var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
            var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
            var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
            google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); });
            marker.setMap(map);
          }
        </script></head>
        <body onload=\"initialize()\">
          <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div>
        </body></html>' }"
  }
}
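The escaping burden described at the start of this appendix can be seen by letting JavaScript do the work. The sketch below is an illustration only (the function body is a shortened, made-up variant of the show functions above): it turns a function into a string and serialises it as JSON, which is the kind of step that tooling can perform instead of the developer.

```javascript
// A show function written normally, with unescaped double-quotes in its HTML.
function s1(doc, req) {
  return '<div id="map_canvas">' + doc._id + '</div>';
}

// Store the function source as a string value and serialise the document.
const ddoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
const ddocJson = JSON.stringify(ddoc);

// Every double-quote inside the function now appears as \" in the JSON text.
console.log(ddocJson);
```

Parsing the JSON text again recovers the original source unchanged, so nothing is lost by leaving the escaping to a tool.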
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  //
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": "15161",
  "postcode": "KY16 9AJ",
  "latitude": "56.3412139443169",
  "longitude": "-2.79301175308608",
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
        {{/countries}}
        {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  # strip the .png extension to get the document name
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
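The string handling performed by the two sed commands can be restated as a small function, which may be easier to check than the shell one-liners. This is a sketch under assumptions: flagUrl is a hypothetical name, and the base address is the local server address that appears in the script’s commented example.

```javascript
// Derive the attachment URL for a flag file from its path.
function flagUrl(baseUrl, filepath) {
  const filename = filepath.split("/").pop();     // "countries/png/ie.png" -> "ie.png"
  const docname = filename.replace(/\.png$/, ""); // "ie.png" -> "ie"
  return baseUrl + "/countries/" + docname + "/flag";
}

const ieFlagUrl = flagUrl("http://127.0.0.1:5984", "countries/png/ie.png");
console.log(ieFlagUrl);
```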
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas NoSql First Impressions Object DatabasesMissed the Boat httpwwwvancelucascomblog
nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin Transaction Management Lecture Notes MSc ComputerScience Birkbeck College
[38] Dwight Merriman Comparing Mongo DB and Couch DB httpwwwmongodborgdisplayDOCSComparing+Mongo+DB+and+Couch+DB
[39] Microsoft Entity Framework Design httpblogsmsdncomb
efdesign
[40] Ted Nedward The Vietnam of Computer Science http
blogstednewardcom20060626The+Vietnam+Of+Computer+
Scienceaspx
[41] Ruby on Rails Class ActiveRecordBase httpapirubyonrailsorgclassesActiveRecordBasehtml
[42] Ed Parcell Tutorial Using JQuery and CouchDb to build asimple AJAX web application httpedparcellposterouscom
using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul Code tutorial make your application sync with UbuntuOne httparstechnicacomopen-sourceguides200912
code-tutorial-make-your-application-sync-with-ubuntu-one
ars
BIBLIOGRAPHY 64
[44] Daniel Stenberg and contributors curl1 the man page httpcurlhaxxsedocsmanpagehtml
[45] M Stonebraker and U Cetintemel ldquoOne Size Fits Allrdquo An Idea WhoseTime Has Come and Gone In PROCEEDINGS OF THE INTER-NATIONAL CONFERENCE ON DATA ENGINEERING pages 2ndash11IEEE Computer Society Press 2005
[46] Stonebraker M Wong E Kreps P and Held G The Design andImplementation of INGRES ACM TODS 1 3 (September 1976)
[47] Klaus Trainer CouchDB 10 Retrospectives httpmambofulani
couchonecomblog_designsofa_listpostpost-page
startkey=[22CouchDB-1-0-Retrospectives22]
[48] Rik van der Sanden Accessing Azure Masterrsquos thesis Delft Univer-sity the Netherlands 2009 httpswerltudelftnltwikipub
MainPastAndCurrentMScProjectsThesis5FRik5Fvan5Fder
5FSandenpdf
[49] Peter Vogel [Weblog Comment] Is Microsoft Feeling the lsquoNoSQLrsquo HeathttpreddevnewscomBlogsData-Driver200912NoSQL-Heat
[50] Wikipedia ACID httpenwikipediaorgwikiACID
[51] Wikipedia cURL httpenwikipediaorgwikiCURL
[52] Wikipedia Database Normalization httpenwikipediaorg
wikiOpen5FDatabase5FConnectivity
[53] Wikipedia JSON httpenwikipediaorgwikiJSON
[54] Wikipedia Relational model httpenwikipediaorgwiki
Relational5Fmodel
[55] Interview with Damien Katz by Werner Schuster DamienKatz Relaxing on CouchDB httpwwwinfoqcominterviews
CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 5 SERVING HTML FROM COUCHDB 38
Note the built-in functions start and send:
• start is executed once at the beginning of the function
• send is executed once for each row in the dataset
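As a minimal sketch of how start and send fit together, the following stand-alone JavaScript mimics a list function outside CouchDB. The names makeListRuntime and renderList are hypothetical and are introduced only for illustration; inside CouchDB, start, send and getRow are supplied by the server, and this is not the l1 function from the design document.

```javascript
// Stand-ins for the helpers that CouchDB supplies to a list function.
// These mocks are assumptions for illustration; in CouchDB they are built in.
function makeListRuntime(rows) {
  const chunks = [];
  let headers = null;
  let next = 0;
  return {
    start: function (response) { headers = response.headers; },
    send: function (chunk) { chunks.push(chunk); },
    getRow: function () { return next < rows.length ? rows[next++] : null; },
    result: function () { return { headers: headers, body: chunks.join("") }; },
  };
}

// A hypothetical list function: start() runs once, then send() runs for
// each row of the view, plus one send() each to open and close the list.
function renderList(rows) {
  const rt = makeListRuntime(rows);
  rt.start({ headers: { "Content-Type": "text/html" } });
  rt.send("<ul>");
  let row;
  while ((row = rt.getRow())) {
    rt.send("<li>" + row.key + "</li>");
  }
  rt.send("</ul>");
  return rt.result();
}
```

Calling renderList with the rows of a view returns the headers passed to start together with the concatenated chunks passed to send, which is roughly what the browser receives from a real list function.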
At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
Figure 5.7: The simple 'view' rendered using a 'list' function
5.5 Lessons Learned
The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.
Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.
As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp, from an application developer's perspective, is that it creates a set of separate files for each element of the design document. Seen this way, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts, and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa, amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create.
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a 'List' function, l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
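The merging step described in this section can be sketched in a few lines of JavaScript. This is a simplified illustration under my own assumptions, not CouchApp's actual code (which is written in Python and handles attachments, vendor code and more); the function name buildDesignDoc and the sample file contents are hypothetical.

```javascript
// Build a design document object from a flat map of file paths to contents.
// Each folder becomes a nested object; each file becomes a string property
// named after the file without its extension (views/v1/map.js -> views.v1.map).
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, content] of Object.entries(files)) {
    const parts = path.split("/");
    // Drop the file extension from the last path segment: map.js -> map
    const leaf = parts.pop().replace(/\.[^.]+$/, "");
    let node = ddoc;
    for (const part of parts) {
      node[part] = node[part] || {};
      node = node[part];
    }
    node[leaf] = content;
  }
  return ddoc;
}

const exampleDesignDoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
});
```

The resulting object has the same shape as the hand-written design document in Appendix A: a shows object keyed by function name, a views object keyed by view name, and so on.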
6.5 Using templates
In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
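To make the behaviour of this map function concrete, here is a small simulation that can be run outside CouchDB. It is a sketch under stated assumptions: runView is a hypothetical helper, emit is passed in explicitly rather than supplied by the server, and rows are sorted by plain string comparison, whereas CouchDB applies its own collation rules.

```javascript
// Simulate how a map function turns documents into view rows.
function runView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // Each call to emit() adds one row, tagged with the source document id.
    const emit = function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    };
    mapFn(doc, emit);
  }
  // View rows are returned in key order.
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

// The 'all' view: one row per document, keyed by document id.
const allMap = function (doc, emit) {
  emit(doc._id, doc);
};

const sampleRows = runView(allMap, [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
]);
```

Each row carries the whole document as its value, which is what allows a List function to render every field without fetching the documents again.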
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/ as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
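The mapping from document operations to HTTP verbs can be written down directly. The sketch below only describes the requests and sends nothing over the network; couchRequest is a hypothetical helper, and the base URL http://localhost:5984 used in the example is an assumption standing in for a local CouchDB server.

```javascript
// Describe the HTTP request for a basic document operation.
function couchRequest(baseUrl, db, action, id, rev) {
  // Document ids are URL-encoded, so spaces become %20.
  const docUrl = baseUrl + "/" + db + "/" + encodeURIComponent(id);
  switch (action) {
    case "read":
      return { method: "GET", url: docUrl };
    case "save":
      // The JSON document itself travels in the request body.
      return { method: "PUT", url: docUrl };
    case "delete":
      // Deleting requires the current revision of the document.
      return { method: "DELETE", url: docUrl + "?rev=" + encodeURIComponent(rev) };
    default:
      throw new Error("unknown action: " + action);
  }
}

const readAberdeen = couchRequest(
  "http://localhost:5984", "universities", "read", "University of Aberdeen");
```

Because each operation is an ordinary HTTP request, the same descriptions can be carried out with cURL, a browser, or any HTTP library.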
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible, but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' +
      '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head>
      <meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% } body { height: 100%;
      margin: 0px; padding: 0px } #map_canvas { height: 100% } </style>
      <script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\"> function initialize() {
      var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
      var myOptions = { zoom: 14, center: latlng,
      mapTypeId: google.maps.MapTypeId.ROADMAP };
      var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
      var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
      var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
      google.maps.event.addListener(marker, \"click\", function() {
      infowindow.open(map, marker); }); marker.setMap(map); } </script></head>
      <body onload=\"initialize()\"> <div id=\"map_canvas\"
      style=\"width: 100%; height: 100%\"></div></body></html>' }"
  }
}
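The escaping burden noted at the start of this appendix can be illustrated with a short sketch. It is my own illustration rather than part of the original design document: the show function here is a hypothetical cut-down example, and JSON.stringify stands in for whatever tool produces the final JSON.

```javascript
// A show function written as ordinary JavaScript, using double quotes freely.
const show = function (doc, req) {
  return '<div id="map_canvas" title="' + doc._id + '"></div>';
};

// Storing the function in a design document means embedding its source code
// as a JSON string, so every double quote must become \" in the stored text.
const escapedDesignDoc = { _id: "_design/d1", shows: { s1: show.toString() } };
const escapedJson = JSON.stringify(escapedDesignDoc);

// Parsing the JSON again recovers the original source text unchanged.
const recoveredSource = JSON.parse(escapedJson).shows.s1;
```

Writing the JSON by hand means performing this escaping manually for every attribute in the HTML, which is why a tool that does the packaging automatically is so helpful.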
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->
// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e,r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
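The shape of the 'stash' for a single row can be checked outside CouchDB with a stand-alone sketch. This is an illustration under my own assumptions, not the code deployed on the server: toTemplateModel is a hypothetical helper covering only some of the fields, and the sample document is a shortened version of the one shown in Appendix D.

```javascript
// Hypothetical helper: build the template model for one institution document.
function toTemplateModel(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    // Flags let the template skip a section that has no data behind it.
    has_programmes: Boolean(institution.programmes),
    programmes: programmes.map(function (programme) {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: Boolean(programme.countries),
        // Wrap each code in an object so the template can refer to {{country}}.
        countries: countries.map(function (country) {
          return { country: country };
        }),
      };
    }),
  };
}

const stAndrewsModel = toTemplateModel({
  _id: "University of St Andrews",
  latitude: "56.3412139443169",
  longitude: "-2.79301175308608",
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] },
  ],
});
```

Keeping the has_programmes and has_countries flags alongside the arrays mirrors the original code, and means the template never has to test for a missing property itself.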
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": "15161",
  "postcode": "KY16 9AJ",
  "latitude": "56.3412139443169",
  "longitude": "-2.79301175308608",
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
        {{/countries}}
        {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong Programming Erlang Software for a ConcurrentWorld Pragmatic Bookshelf 2007
61
BIBLIOGRAPHY 62
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
Section 6
Serving GeoRSS using CouchApp
This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.
GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
6.1 Introduction to CouchApp
As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.
CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].
I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.
The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project - CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.
6.2 Sofa - a Blogging Application
The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa, amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project, we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed, and run the following command:
couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a 'Show' function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
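As a rough illustration of what a filled-in stub might look like, the sketch below writes the simple s1 Show function from Section 5 as a named JavaScript function, so that it can be called outside CouchDB. The name showS1 and the sample document are assumptions made for this example; in a CouchApp project the file would hold an anonymous function(doc, req).

```javascript
// Sketch of a Show function: one document in, one HTML string out.
// In shows/s1.js the function would be anonymous; the name showS1 is
// only used here so that the function can be called directly.
function showS1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical sample document, for illustration only.
var sampleDoc = { _id: 'Example University', latitude: 51.5, longitude: -0.13 };
var html = showS1(sampleDoc, {});
console.log(html);
```

The req argument is unused in this simple case; it is kept so that the signature matches the one CouchDB calls.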
Similarly, couchapp generate list l1 will create a 'List' function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It's important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
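The merging step can be pictured as follows. This is a minimal sketch of the idea only, not CouchApp's actual Python implementation: the function name buildDesignDoc, the in-memory files object and the rule of dropping file extensions are assumptions made for illustration.

```javascript
// Sketch: fold a flat set of project files into one nested design document.
// Each folder in the path becomes a nested key; the file extension is dropped.
function buildDesignDoc(name, files) {
  var ddoc = { _id: '_design/' + name };
  Object.keys(files).forEach(function (path) {
    var parts = path.split('/');
    var node = ddoc;
    for (var i = 0; i < parts.length - 1; i++) {
      node[parts[i]] = node[parts[i]] || {};
      node = node[parts[i]];
    }
    var leaf = parts[parts.length - 1].replace(/\.[^.]+$/, '');
    node[leaf] = files[path];
  });
  return ddoc;
}

// Hypothetical project contents, held in memory instead of on disk.
var ddoc = buildDesignDoc('bc', {
  'views/v1/map.js': 'function(doc) { emit(doc._id, doc); }',
  'shows/s1.js': 'function(doc, req) { return doc._id; }',
  'templates/georss.html': '<feed></feed>'
});
console.log(JSON.stringify(ddoc, null, 2));
```

Under these assumptions, a file at templates/georss.html ends up at ddoc.templates.georss, which is how the List function in Appendix C refers to its template.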
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
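To make the behaviour of this map function concrete, the sketch below runs it over a small in-memory set of documents. The runMap helper, the explicit emit parameter and the sample documents are assumptions made for illustration; CouchDB itself supplies emit and orders the rows using its own collation rules, which the plain string comparison here only approximates.

```javascript
// The map function from map.js, given a name and an explicit emit
// parameter so that it can run outside CouchDB.
function mapAll(doc, emit) {
  emit(doc._id, doc);
}

// Hypothetical helper: apply a map function to every document and
// collect the emitted rows, sorted by key with a simple comparison.
function runMap(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

// Hypothetical sample documents.
var rows = runMap(mapAll, [
  { _id: 'University B', region: 'Scotland' },
  { _id: 'University A', region: 'Wales' }
]);
console.log(rows.map(function (r) { return r.key; }));
```

Each row carries the whole document as its value, which is what allows the List function in the next step to build the feed without fetching documents again.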
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
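The effect of the nested sections can be sketched in plain JavaScript. The function below is not Mustache; it is a hand-written equivalent of the programme and country loops, with the name renderContent and the sample data assumed for illustration. It also leaves out the HTML escaping that the real template needs for content embedded in a feed.

```javascript
// Sketch: what the nested programme/country sections amount to.
// An absent or empty list simply produces no output.
function renderContent(institution) {
  var programmes = institution.programmes || [];
  return programmes.map(function (programme) {
    var countries = programme.countries || [];
    var flags = countries.map(function (country) {
      return '<img src="http://mick.couchone.com/countries/' + country +
             '/flag" alt="' + country + '" title="' + country + '" />';
    }).join('');
    return '<p>' + programme.name + flags + '</p>';
  }).join('');
}

// Hypothetical institution: one programme lists countries, one does not.
var content = renderContent({
  programmes: [
    { name: 'Programme One', countries: ['de', 'ie'] },
    { name: 'Programme Two' }
  ]
});
console.log(content);
```

In the template, the has_programmes and has_countries flags play the role of the `|| []` fallbacks used here.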
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
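As an illustration of how little is involved, the sketch below builds (but does not send) the HTTP request that asks a CouchDB server to replicate one database into another, via the _replicate resource. The function name replicationRequest and the server addresses are assumptions made for this example.

```javascript
// Sketch: describe the HTTP request that asks a CouchDB server to
// replicate one database into another. Nothing is sent here.
function replicationRequest(server, source, target) {
  return {
    method: 'POST',
    url: server + '/_replicate',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Hypothetical example: pull a remote database down to a local server.
var request = replicationRequest(
  'http://127.0.0.1:5984',
  'http://mick.couchone.com/universities',
  'universities'
);
console.log(request.method, request.url, request.body);
```

Because a design document is itself a document in the database, the application code it contains is copied along with the data.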
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
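The correspondence between document operations and HTTP verbs can be sketched as a small lookup. The function name documentRequest and the operation names are assumptions made for illustration; the verbs and paths follow the document API used with cURL in Section 4.

```javascript
// Sketch: the HTTP request behind each basic document operation.
// A 'save' sends the JSON document as the request body (including its
// current _rev when updating); a 'delete' names the revision to remove.
function documentRequest(operation, db, docId, rev) {
  var path = '/' + db + '/' + encodeURIComponent(docId);
  switch (operation) {
    case 'read':
      return { method: 'GET', path: path };
    case 'save':
      return { method: 'PUT', path: path };
    case 'delete':
      return { method: 'DELETE', path: path + '?rev=' + rev };
    default:
      throw new Error('unknown operation: ' + operation);
  }
}

console.log(documentRequest('read', 'universities', 'University of Aberdeen'));
```

Any language or tool that can issue these requests and parse JSON can act as a CouchDB client.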
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible, but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
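As an illustration of the difference, consider counting institutions per region, which in SQL might be written as SELECT region, COUNT(*) FROM institutions GROUP BY region (the table name is hypothetical). The sketch below expresses the same need as a map function and a reduce function, and simulates a grouped query in memory. The runGroupedView helper, the explicit emit parameter and the sample data are assumptions made for illustration; in CouchDB, grouping is requested with the group=true query parameter.

```javascript
// Map: emit one row per document, keyed by region.
function mapByRegion(doc, emit) {
  emit(doc.region, 1);
}

// Reduce: add up the emitted values. Summing also gives the right
// answer when partial results are combined again (a re-reduce).
function reduceSum(keys, values, rereduce) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Hypothetical helper simulating a grouped view query in memory.
function runGroupedView(mapFn, reduceFn, docs) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduceFn(null, groups[key], false);
  });
  return result;
}

// Hypothetical sample documents.
var counts = runGroupedView(mapByRegion, reduceSum, [
  { _id: 'a', region: 'Scotland' },
  { _id: 'b', region: 'Wales' },
  { _id: 'c', region: 'Scotland' }
]);
console.log(counts);
```

The point of the comparison is that the question has to be decided in advance and stored in the design document, whereas the SQL statement can be typed at any time.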
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>' }"
  }
}
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
      alternate : path.absolute(path.show('post', row.id)),
      // point : row.value.loc[1] + " " + row.value.loc[0]
      point : row.value.latitude + " " + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
  // close the loop after all rows are rendered
  return "</feed>";
});
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
// this is further down in templates/edit.html
// apply docForm at login.
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ? institution.programmes.map(function(programme) {
            return {
              name : programme.name,
              url : programme.url,
              has_countries : programme.countries ? true : false,
              countries : programme.countries ? programme.countries.map(function(country) {
                return {
                  country : country
                };
              }) : [] // return nothing if no countries
            };
          }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
        {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under /countries/ie/flag
  # curl -H "Content-Type: image/png"
  #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
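The two sed steps in the script derive a document name from each file path and use it to build the attachment URL. The same derivation can be sketched in JavaScript as below; the function name flagUrl and the local server address are assumptions made for illustration.

```javascript
// Sketch: derive the document name and attachment URL for a flag file,
// mirroring the two sed steps in the bash script.
function flagUrl(baseUrl, filepath) {
  var filename = filepath.split('/').pop();      // e.g. "ie.png"
  var docname = filename.replace(/\.png$/, '');  // e.g. "ie"
  return baseUrl + '/countries/' + docname + '/flag';
}

console.log(flagUrl('http://127.0.0.1:5984', 'countries/png/ie.png'));
```

Each flag therefore ends up as an attachment named flag on a document whose id is the two-letter country code.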
Bibliography
[1] A NoSql Summer A seasonal worldwide reading club for databasesdistributed systems and NoSql-related scientific papers http
nosqlsummerorg
[2] British Council httpwwwbritishcouncilorg
[3] django The Web framework for perfectionists with deadlines http
wwwdjangoprojectcom
[4] GeoRSS httpwwwgeorssorg
[5] How desktopcouch works httpwwwfreedesktoporg
wikiSpecificationsdesktopcouchDocumentationHow_
Desktopcouch_Works
[6] Introducing JSON httpjsonorg
[7] KML httpcodegooglecomapiskmldocumentation
[8] Quickly httpswikiubuntucomQuickly
[9] Relational Persistence for Java and NET httpwwwhibernate
org
[10] The CouchDB Project httpcouchdbapacheorg
[11] J Chris Anderson CouchDB Implements a Fundamental Algo-rithm httpjchrisanetdrl5Fdesignsofa5Fshowpost
CouchDB-Implements-a-Fundamental-Algorithm
[12] J Chris Anderson Jan Lehnardt and Noah Slater Design Documentshttpguidecouchdborgeditions1endesignhtml
[13] J Chris Anderson Jan Lehnardt and Noah Slater CouchDB TheDefinitive Guide OrsquoReilly Media 2010
[14] Joe Armstrong Programming Erlang Software for a ConcurrentWorld Pragmatic Bookshelf 2007
61
BIBLIOGRAPHY 62
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP’ Theorem
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 6 SERVING GEORSS USING COUCHAPP 40
I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.
Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.
I describe the changes I made in more detail in Appendix B.
Figure 6.1: Sofa amended to accept latitude and longitude
The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts
The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom
The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
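As a rough illustration of what such a stub might look like once filled in, the sketch below defines a Show function in the shape CouchDB expects (a function of the document and the request) and calls it directly on a sample document. The name showS1, the sample document and the direct call are assumptions made so that the code can be run outside CouchDB; inside s1.js the function would be anonymous and CouchDB would supply both arguments.

```javascript
// A minimal sketch of a Show function, assuming documents carry
// latitude and longitude fields as in the design document of Section 5.
// 'showS1' is a hypothetical name used only so the function can be called here.
var showS1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Sample document, invented for illustration.
var sampleDoc = { _id: 'Example University', latitude: 51.5, longitude: -0.1 };

// The request object is unused by this function, so an empty object suffices.
var html = showS1(sampleDoc, {});
console.log(html);
```

Because a Show function is an ordinary JavaScript function, it can be exercised in this way before being pushed to the server.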
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
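To make the idea of a View as a query more concrete, the sketch below imitates, in plain JavaScript, what happens when a map function with no reduce is run over a set of documents. The runView helper, the exampleMap function and the sample documents are assumptions for illustration only. In CouchDB itself, emit is a global supplied by the server rather than a parameter, and the rows are stored in an index rather than an in-memory array.

```javascript
// Hypothetical helper: runs a map function over docs and collects the
// emitted rows, sorted by key (plain string comparison; CouchDB's own
// collation rules are richer than this).
function runView(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    var emit = function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    };
    mapFn(doc, emit); // emit is passed in here; CouchDB provides it globally
  });
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : (a.key > b.key ? 1 : 0);
  });
  return rows;
}

// An illustrative map function: one row per document, keyed by _id.
var exampleMap = function (doc, emit) {
  emit(doc._id, doc.region);
};

var docs = [
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'University of Aberdeen', region: 'Scotland' }
];

var rows = runView(exampleMap, docs);
console.log(JSON.stringify(rows, null, 2));
```

With no reduce step, the View result is simply the list of emitted rows in key order.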
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
With this in place, running couchapp push from the bc folder uploads the design document to the server.
So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file.
The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
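The nested iteration performed by the template can also be expressed in plain JavaScript, which may help when reading the Mustache sections. The sketch below is not Mustache and is not part of the project code: renderContent and the sample institution are hypothetical, the markup is left unescaped for readability (the template emits it as escaped entities inside the Atom content element), and the flag URL simply follows the pattern used in the template.

```javascript
// Hypothetical plain-JavaScript equivalent of the template's nested loops:
// one paragraph per programme, one flag image per country.
function renderContent(institution) {
  var out = '';
  // Corresponds to the has_programmes / programmes sections
  (institution.programmes || []).forEach(function (programme) {
    out += '<p>' + programme.name;
    // Corresponds to the has_countries / countries sections
    (programme.countries || []).forEach(function (country) {
      out += '<img src="http://mick.couchone.com/countries/' + country +
        '/flag" alt="' + country + '" title="' + country + '" />';
    });
    out += '</p>';
  });
  return out;
}

// Sample data, invented for illustration.
var sample = {
  programmes: [
    { name: 'Chevening Programme', countries: ['id', 'mz'] },
    { name: 'Erasmus' } // no countries: no flags are emitted
  ]
};

var content = renderContent(sample);
console.log(content);
```

An institution with no programmes produces an empty string, which mirrors a Mustache section being skipped when its value is false or an empty list.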
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
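As a small illustration of how directly CouchDB operations map onto HTTP, the sketch below builds plain descriptions of the requests for creating, reading and deleting a document. Nothing is sent over the network. The describeRequest helper, the base URL and the sample values are assumptions made for illustration; the verb and URL patterns (PUT, GET and DELETE on /database/document-id, with the revision supplied as a rev parameter on delete) are those of CouchDB's document API.

```javascript
// Hypothetical helper: returns a description of the HTTP request that
// would perform the given action on a CouchDB document.
function describeRequest(action, baseUrl, db, docId, payload) {
  var url = baseUrl + '/' + db + '/' + encodeURIComponent(docId);
  switch (action) {
    case 'create':
      return {
        method: 'PUT',
        url: url,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
      };
    case 'read':
      return { method: 'GET', url: url };
    case 'delete':
      // CouchDB requires the current revision when deleting a document
      return { method: 'DELETE', url: url + '?rev=' + encodeURIComponent(payload.rev) };
    default:
      throw new Error('Unknown action: ' + action);
  }
}

// Base URL and values are placeholders for a local CouchDB instance.
var base = 'http://127.0.0.1:5984';
var readReq = describeRequest('read', base, 'universities', 'University of Aberdeen');
var createReq = describeRequest('create', base, 'universities', 'Example University',
  { region: 'Scotland' });
var deleteReq = describeRequest('delete', base, 'universities', 'Example University',
  { rev: '1-abc' });

console.log(readReq, createReq, deleteReq);
```

Each description could be handed to any HTTP client, or typed by hand as a cURL command, which is the sense in which existing tools are compatible with CouchDB.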
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
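To illustrate the shift in thinking, consider counting institutions per region. In SQL this might be written as SELECT region, COUNT(*) FROM institutions GROUP BY region (against a hypothetical institutions table). The sketch below expresses the same information need as a map function and a reduce function, and simulates the grouping in plain JavaScript. The groupedView helper and the sample data are assumptions for illustration; in CouchDB the grouping is performed by the server when the view is queried with group=true, and emit is a global rather than a parameter.

```javascript
// Map: emit one row per document, keyed by region, with the value 1.
var mapRegion = function (doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
};

// Reduce: sum the values for a key (same argument order as a CouchDB reduce).
var reduceCount = function (keys, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

// Hypothetical helper that simulates a grouped view query in memory.
function groupedView(docs, mapFn, reduceFn) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduceFn(null, groups[key]);
  });
  return result;
}

// Sample data, invented for illustration.
var institutions = [
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'University of Aberdeen', region: 'Scotland' },
  { _id: 'Cardiff University', region: 'Wales' }
];

var counts = groupedView(institutions, mapRegion, reduceCount);
console.log(counts);
```

The important difference from SQL is that the map and reduce functions must be defined in a design document before the question is asked, which is why ad-hoc querying is awkward.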
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
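The escaping burden can be seen by letting JavaScript do the work. In the sketch below, JSON.stringify is applied to a small HTML fragment of the kind returned by a Show function; the result is the text as it would have to be typed inside a JSON design document. The fragment is a made-up example, not taken from the design document itself.

```javascript
// A short HTML fragment containing four double-quote characters.
var fragment = '<div id="map_canvas" style="width: 100%"></div>';

// JSON.stringify produces the JSON string literal, with every embedded
// double-quote preceded by a backslash.
var asJson = JSON.stringify(fragment);
console.log(asJson);

// Count the backslash-escaped quotes that a developer would otherwise
// have to insert by hand.
var escapedQuotes = (asJson.match(/\\"/g) || []).length;
console.log('escaped quotes: ' + escapedQuotes);
```

Even this one-line fragment needs four escapes, which gives a sense of the effort involved in hand-editing a whole page of HTML inside a JSON string.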
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>' }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
This is further down in templates/edit.html:
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{ docid }},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return {
                          country : country
                        };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
            {{/countries}}
            {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to the ./countries/png folder
    # (beneath the current folder)
    FILES=./countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's|^\./countries/png/||')
      echo $filename
      # strip the .png extension to get the document name
      docname=$(echo "$filename" | sed -e 's/\.png$//')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts
      # the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type: image/png"
      #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
    done
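For readers more comfortable with JavaScript than with sed, the sketch below shows the same derivation of the document name and attachment URL from a file path. It does not upload anything. The flagUrl function, the base URL and the sample path are assumptions chosen to mirror the script above.

```javascript
// Hypothetical helper mirroring the script's two sed steps:
// take the file name from the path, then strip the .png extension.
function flagUrl(baseUrl, filepath) {
  var filename = filepath.split('/').pop();      // e.g. 'ie.png'
  var docname = filename.replace(/\.png$/, '');  // e.g. 'ie'
  return baseUrl + '/countries/' + docname + '/flag';
}

// Placeholder base URL for a local CouchDB instance.
var url = flagUrl('http://127.0.0.1:5984', './countries/png/ie.png');
console.log(url);
```

The resulting URL is the target of the PUT request, with the image sent as the request body and a Content-Type of image/png.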
[50] Wikipedia ACID httpenwikipediaorgwikiACID
[51] Wikipedia cURL httpenwikipediaorgwikiCURL
[52] Wikipedia Database Normalization httpenwikipediaorg
wikiOpen5FDatabase5FConnectivity
[53] Wikipedia JSON httpenwikipediaorgwikiJSON
[54] Wikipedia Relational model httpenwikipediaorgwiki
Relational5Fmodel
[55] Interview with Damien Katz by Werner Schuster DamienKatz Relaxing on CouchDB httpwwwinfoqcominterviews
CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP Theorem’
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Appendix A: Design Document d1
- Appendix B: GeoRSS on Sofa
- Appendix C: georss.js
- Appendix D: georss.html
- Appendix E: Bash Script to Upload Country Flag Files
Section 6: Serving GeoRSS using CouchApp
Figure 6.2: Sofa GeoRSS feed on Google Maps
6.3 Developing with CouchApp
In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.
As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.
As an example, for the British Council Activity Mapping Project we want to create a design document, bc.
To do this, we navigate to the folder where couchapp is installed and run the following command:
    couchapp generate bc
On my Ubuntu Linux PC, the full command line looks like this:
    michael@dell:~/couchapp$ couchapp generate bc
This command creates a folder structure which matches the eventual structure of the design document that we want to create:
• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
Figure 6.3: Files generated by CouchApp
The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
    michael@dell:~/couchapp$ cd bc
and run the following command:
    michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js in a file generated by CouchApp
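As an illustration of what a filled-in stub might contain, here is a minimal sketch of a Show function in the spirit of s1 from Appendix A. In a real s1.js the function is an anonymous expression; it is given a name here only so that it can be run under Node.js, and the sample document is invented for the example.

```javascript
// Sketch of a Show function: takes one document and returns an HTML string.
// The name s1 follows Appendix A; the field names are assumed from the sample documents.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Invented sample document, for illustration only.
var sample = { _id: "University of Aberdeen", latitude: "57.16", longitude: "-2.10" };
console.log(s1(sample, {}));
```

Because the HTML is assembled by string concatenation, even this small function hints at why a templating system becomes attractive for larger outputs.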
Similarly, couchapp generate list l1 will create a ‘List’ function l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files - map.js and reduce.js.
We recall that in CouchDB a View is, in effect, a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows what server to send the document to, we need to amend the .couchapprc file:
    {
      "env": {
        "default": {
          "db": "http://username:password@mick.couchone.com/universities"
        }
      }
    }
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.
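As a rough illustration of the merging step described above, the following JavaScript sketch folds a flat set of file paths and contents into a nested design-document object. It is not CouchApp's actual implementation (which is written in Python); the function name buildDesignDoc, the sample paths and the file contents are assumptions made for the example.

```javascript
// Sketch only: mimics how separate files could be folded into one design document.
// buildDesignDoc and the sample paths below are hypothetical, not part of CouchApp.
function buildDesignDoc(name, files) {
  var ddoc = { _id: "_design/" + name };
  Object.keys(files).forEach(function (path) {
    // Drop the extension and split: "views/all/map.js" -> ["views", "all", "map"]
    var parts = path.replace(/\.[^./]+$/, "").split("/");
    var node = ddoc;
    parts.slice(0, -1).forEach(function (part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    // The file content is stored as a string under the last path segment.
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

var ddoc = buildDesignDoc("bc", {
  "views/all/map.js": "function(doc) { emit(doc._id, doc); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "templates/georss.html": "<feed>...</feed>"
});
console.log(JSON.stringify(ddoc, null, 2));
```

The resulting object has the same shape as the design document shown in Appendix A: functions and templates are stored as strings under keys that mirror the folder names.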
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in the exact way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):
    couchapp generate view all
The contents of map.js are as follows:
    function(doc) {
      emit(doc._id, doc);
    }
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
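To make the behaviour of this View concrete, the sketch below runs the same map logic over an in-memory array of documents under Node.js. The runView helper and its emit collector are assumptions standing in for CouchDB's view server, and the sample documents are invented. CouchDB also sorts rows by key using its own collation rules, which is only approximated here by a plain string comparison.

```javascript
// Sketch: emulate a map-only CouchDB view in plain JavaScript.
// runView and the sample docs are hypothetical helpers for illustration.
function runView(mapFn, docs) {
  var rows = [];
  // CouchDB supplies emit() to map functions; here we pass in our own collector.
  function emit(key, value) { rows.push({ key: key, value: value }); }
  docs.forEach(function (doc) { mapFn(doc, emit); });
  // Approximate key ordering with a plain string comparison.
  rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
  return rows;
}

// Same logic as the map function of the View all, with emit passed in explicitly
// so that it can run outside CouchDB.
function mapAll(doc, emit) {
  emit(doc._id, doc);
}

var rows = runView(mapAll, [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" }
]);
console.log(rows.map(function (r) { return r.key; }));
```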
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <georss:point>{{latitude}} {{longitude}}</georss:point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
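The looping logic of the template can also be written out in plain JavaScript. The sketch below is not Mustache; it is a hand-written equivalent whose function name renderContent and sample data are assumptions, with the flag URL pattern taken from the template.

```javascript
// Sketch: hand-written equivalent of the nested template sections.
// renderContent is a hypothetical helper; Mustache does this work in the real List function.
function renderContent(institution) {
  var out = "";
  (institution.programmes || []).forEach(function (programme) {
    out += "<p>" + programme.name;
    // Flags are only emitted when the programme lists countries.
    (programme.countries || []).forEach(function (country) {
      out += '<img src="http://mick.couchone.com/countries/' + country +
             '/flag" alt="' + country + '" title="' + country + '" />';
    });
    out += "</p>";
  });
  return out;
}

// Invented sample data, shaped like the documents described in Appendix D.
var html = renderContent({
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz"] },
    { name: "Erasmus" }
  ]
});
console.log(html);
```

Compared with this version, the template keeps the markup readable and leaves the iteration to the templating engine.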
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
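As a small illustration of this point, the sketch below lists the HTTP request that corresponds to each basic document operation. The helper requestFor, the operation names and the sample revision string are assumptions for the example; it only builds a description of each request and does not contact a server.

```javascript
// Sketch: map basic document operations onto HTTP verbs and CouchDB-style paths.
// requestFor is a hypothetical helper; nothing is sent over the network.
function requestFor(operation, db, docId, rev) {
  var path = "/" + db + "/" + encodeURIComponent(docId);
  switch (operation) {
    case "read":
      return { method: "GET", path: path };
    case "save":
      return { method: "PUT", path: path };
    case "delete":
      // Deleting requires the current revision of the document.
      return { method: "DELETE", path: path + "?rev=" + encodeURIComponent(rev) };
    default:
      throw new Error("unknown operation: " + operation);
  }
}

console.log(requestFor("read", "universities", "University of Aberdeen"));
console.log(requestFor("delete", "countries", "ie", "1-abc"));
```

Any tool that can issue these requests, whether cURL or a library in a general-purpose language, can act as a CouchDB client.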
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
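To illustrate the shift in thinking, the sketch below counts institutions per region with a map function and a reduce function, emulated in plain JavaScript. The roughly equivalent SQL is shown in a comment. The mapReduce helper, the table name in the SQL comment and the sample documents are assumptions; CouchDB's own view engine uses a different reduce signature and also handles incremental indexing and re-reduction, which are omitted here.

```javascript
// SQL equivalent (hypothetical table):
//   SELECT region, COUNT(*) FROM institutions GROUP BY region;

// Sketch: emulate a grouped map-reduce query in memory. mapReduce is a hypothetical helper.
function mapReduce(docs, mapFn, reduceFn) {
  var groups = {};
  function emit(key, value) {
    (groups[key] = groups[key] || []).push(value);
  }
  docs.forEach(function (doc) { mapFn(doc, emit); });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduceFn(key, groups[key]);
  });
  return result;
}

// Map: one row per institution, keyed by region.
function mapByRegion(doc, emit) { emit(doc.region, 1); }

// Reduce: add up the emitted values for each key.
function sum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Invented sample documents.
var counts = mapReduce([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Birkbeck", region: "London" }
], mapByRegion, sum);
console.log(counts);
```

The grouping that SQL expresses declaratively has to be decided in advance here, through the choice of key emitted by the map function.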
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45] - in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>' }"
      }
    }
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
        <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
        <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
        <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
This is further down in templates/edit.html:
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
          },
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": "15161",
      "postcode": "KY16 9AJ",
      "latitude": "56.3412139443169",
      "longitude": "-2.79301175308608",
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <georss:point>{{latitude}} {{longitude}}</georss:point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
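The path handling performed by the script can be summarised by the following sketch, written in JavaScript for consistency with the other examples in this report. The function name flagUrl is an assumption; the bash script itself performs the same two steps with sed.

```javascript
// Sketch: derive the attachment URL for a flag file, mirroring the sed steps of the bash script.
// flagUrl is a hypothetical helper; the base URL is a local CouchDB address used as a placeholder.
function flagUrl(baseUrl, filepath) {
  var filename = filepath.split("/").pop();      // "countries/png/de.png" -> "de.png"
  var docname = filename.replace(/\.png$/, "");  // "de.png" -> "de"
  return baseUrl + "/countries/" + docname + "/flag";
}

console.log(flagUrl("http://127.0.0.1:5984", "countries/png/de.png"));
```

Each flag therefore becomes an attachment named flag on a document whose id is the two-letter country code.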
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under /countries/ie/flag
      # curl -H "Content-Type: image/png"
      #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does ‘one size’ still ‘fit all’?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The ‘NoSql’ Movement
  - ‘NoSql’ databases at large scale
  - Brewer’s ‘CAP Theorem’
  - Reducing the Impedance Mismatch
  - Benefits of ‘NoSql’
  - Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - ‘Of the Web’
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex ‘show’ function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB’s challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 6 SERVING GEORSS USING COUCHAPP 42
Figure 6.3: Files generated by CouchApp
The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:
• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.
To generate a ‘Show’ function s1, we navigate to the bc folder:
michael@dell:~/couchapp$ cd bc
and run the following command:
michael@dell:~/couchapp/bc$ couchapp generate show s1
couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
Figure 6.4: s1.js, a file generated by CouchApp
Similarly, couchapp generate list l1 will create a ‘List’ function l1.
In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.
We recall that in CouchDB, a View is in effect a query against the database.
As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder as it is not needed. (It is important to do this: if we supply an empty reduce function, then the View will return no values.)
Figure 6.5: Remove the reduce.js file if it is not needed
6.4 Deployment
CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.
So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:
{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
6.5 Using templates
In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.
For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.
Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):
couchapp generate view all
The contents of map.js are as follows:
function(doc) {
  emit(doc._id, doc);
}
This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
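To make the behaviour of this View concrete, the following is a minimal sketch that runs the same map logic over a small in-memory array, outside CouchDB. The runView helper and the sample documents are assumptions made purely for illustration. Inside CouchDB, emit is supplied by the view server, and rows are ordered by CouchDB's own collation rules rather than the simple comparison used here.

```javascript
// Hypothetical stand-in for CouchDB's view server: it calls the map
// function once per document, collects the emitted rows and sorts by key.
function runView(mapFn, docs) {
  const rows = [];
  // CouchDB provides emit() as a global; here it is passed in explicitly.
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach((doc) => mapFn(doc, emit));
  // Simplified ordering; CouchDB uses its own collation for view keys.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Same logic as the map.js of the View 'all': one row per document.
function mapAll(doc, emit) {
  emit(doc._id, doc);
}

// Illustrative documents, not the project's real data set.
const sampleDocs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
];

const rows = runView(mapAll, sampleDocs);
console.log(rows.map((row) => row.key));
```

Because the whole document is emitted as the value, a List function that consumes this View receives every field of every document, which is what the GeoRSS transformation relies on.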
The next step is to create a List function to transform the contents of the View:
couchapp generate list georss
This creates a file called georss.js.
I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
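The nesting in the template can be easier to follow when written as ordinary loops. The sketch below is not Mustache itself: it is a hand-written approximation, in plain JavaScript, of what the nested sections do for one institution. The function name renderEntryContent and the sample data are assumptions for illustration, and the markup is left unescaped here, whereas the template entity-escapes it for use inside the Atom content element.

```javascript
// Hand-written approximation of the template's nested sections.
// A missing list is treated like an empty Mustache section: nothing is output.
function renderEntryContent(institution) {
  const parts = [];
  (institution.programmes || []).forEach((programme) => {
    parts.push("<p>" + programme.name);
    (programme.countries || []).forEach((code) => {
      // One flag image per country code, served as a CouchDB attachment.
      parts.push(
        '<img src="http://mick.couchone.com/countries/' + code +
          '/flag" alt="' + code + '" title="' + code + '" />'
      );
    });
    parts.push("</p>");
  });
  return parts.join("");
}

// Illustrative data only; the second programme has no countries list.
const sampleInstitution = {
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz"] },
    { name: "Programme without countries" },
  ],
};

const entryContent = renderEntryContent(sampleInstitution);
console.log(entryContent);
```

The has_programmes and has_countries flags in the template play the role of the `|| []` guards here: they stop a section from being rendered when the list is absent.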
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
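As a small illustration of the point about HTTP verbs, the sketch below builds (but does not send) request descriptions for the basic document operations. The helper name couchRequest and the example database and document names are assumptions. The verb-and-path pattern follows CouchDB's HTTP API, in which a document lives at /database/document-id and an update or delete must quote the document's current revision.

```javascript
// Build a description of the HTTP request for a CouchDB document operation.
// Nothing is sent over the network; this only shows verb, path and body.
function couchRequest(action, db, docId, options = {}) {
  const path = "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId);
  switch (action) {
    case "read":
      return { method: "GET", path };
    case "create":
      return { method: "PUT", path, body: JSON.stringify(options.doc) };
    case "update":
      // An update is a PUT that carries the current revision in _rev.
      return {
        method: "PUT",
        path,
        body: JSON.stringify({ ...options.doc, _rev: options.rev }),
      };
    case "delete":
      // A delete names the revision being removed in the query string.
      return { method: "DELETE", path: path + "?rev=" + encodeURIComponent(options.rev) };
    default:
      throw new Error("Unknown action: " + action);
  }
}

console.log(couchRequest("read", "universities", "University of Aberdeen"));
console.log(couchRequest("delete", "countries", "ie", { rev: "1-abc" }));
```

Any tool that can issue these four kinds of request, cURL included, can therefore act as a CouchDB client.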
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
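To illustrate the shift in thinking, consider counting institutions per region. In SQL this might be written as SELECT region, COUNT(*) FROM institutions GROUP BY region (a hypothetical table, used only for comparison). The sketch below expresses the same information need as a map function and a reduce function, and simulates the grouping in plain JavaScript. The groupReduce helper is an assumption standing in for what CouchDB does when a view is queried with group=true; the re-reduce step and CouchDB's real key format are ignored for simplicity, and the sample documents are invented.

```javascript
// Map: emit one row per document, keyed by region, with a count of 1.
function mapRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce: add up the counts emitted for one key.
function reduceCount(keys, values) {
  return values.reduce((total, n) => total + n, 0);
}

// Hypothetical stand-in for a grouped view query: collect emitted values
// per key, then reduce each group separately.
function groupReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn(doc, emit));
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(values.map(() => key), values);
  }
  return result;
}

// Illustrative documents only.
const institutions = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Example University", region: "Wales" },
];

const counts = groupReduce(institutions, mapRegion, reduceCount);
console.log(counts);
```

The important difference from SQL is that the map and reduce functions must be written and stored in a design document before the question can be asked, which is the source of the learning curve described above.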
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
  }
}
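The escaping burden described at the start of this appendix arises because a design document is JSON, so each function has to be stored as a JSON string. The sketch below shows one way to avoid doing this by hand: write the function as normal JavaScript and let JSON.stringify produce the escaped string. This is roughly the kind of packaging that a tool such as CouchApp automates, but the buildDesignDoc helper, the design document name and the show function are assumptions for illustration, not CouchApp's own code.

```javascript
// A show function written as ordinary JavaScript, with unescaped quotes.
const showHeading = function (doc, req) {
  return '<h1 class="name">' + doc._id + "</h1>";
};

// Turn a map of functions into a design document whose 'shows' values
// are the functions' source text.
function buildDesignDoc(name, shows) {
  const serialised = {};
  Object.keys(shows).forEach((key) => {
    serialised[key] = shows[key].toString();
  });
  return { _id: "_design/" + name, shows: serialised };
}

const designDoc = buildDesignDoc("example", { heading: showHeading });

// JSON.stringify adds the backslash escapes that would otherwise be typed by hand.
const designDocJson = JSON.stringify(designDoc, null, 2);
console.log(designDocJson);
```

In the printed JSON, every double-quote inside the function source appears as \", which is exactly the form that had to be produced manually for d1.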
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/,'');
};
exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  //
  entry.point = data.point;
  return entry;
}
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->
This is further down in templates/edit.html:
// apply docForm at login.
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View: in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
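To see what the ‘stash’ looks like for a single row, the same mapping can be tried outside CouchDB. The sketch below repeats the transformation as a standalone function and applies it to a cut-down document. The name toTemplateModel is chosen here for illustration, and only a few of the fields are carried over. One deliberate difference is labelled in the comments: the flags test the length of each list, so an empty list also counts as ‘no programmes’, whereas the listing above treats any list that is present as true.

```javascript
// Standalone version of the row-to-model mapping used for the template.
function toTemplateModel(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    // Assumption: an empty list is treated the same as a missing one.
    has_programmes: programmes.length > 0,
    programmes: programmes.map((programme) => {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Each code is wrapped in an object so the template can refer
        // to it by the name 'country'.
        countries: countries.map((code) => ({ country: code })),
      };
    }),
  };
}

// Cut-down document, based on the example shown in Appendix D.
const model = toTemplateModel({
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    {
      name: "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      url: "http://www.csfp-online.org",
      countries: ["bd", "za"],
    },
  ],
});
console.log(JSON.stringify(model, null, 2));
```

Wrapping each country code in its own object is what allows the template to write the same name three times (in the image URL, the alt text and the title) inside the countries section.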
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
SECTION 6 SERVING GEORSS USING COUCHAPP 43
couchapp generate show creates a stub for a Show function which wecan fill in using a text editor A Show function takes a single CouchDBdocument and transforms it to HTML
Figure 64 s1js in a file generated by CouchApp
Similarly couchapp generate list l1 will create a lsquoListrsquo function l1In the case of views couchdb generate view v1 will generate a folder
named v1 containing two files - mapjs and reducejsWe recall that in CouchDB a View is in effect a query against the
databaseAs v1 is a very simple View with a map function but no reduce function
we remove the empty reducejs from the folder as it is not needed (Itrsquosimportant to do this if we supply an empty reduce function then the Viewwill return no values)
Figure 65 Remove the reducejs file if it is not needed
SECTION 6 SERVING GEORSS USING COUCHAPP 44
64 Deployment
CouchApp uses Python scripts to merge all the separate files into a designdocument and to send the resulting design document using the correctHTTP commands to the CouchDB server
So that CouchApp knows what server to send the document to we needto amend the couchapprc file
env
default
dbhttpusernamepasswordmickcouchonecomuniversities
So far we have uploaded a very simple design document to CouchDBjust as we did in Section 5 This had demonstrated how much easier it is todevelop solutions with the CouchApp framework rather than working withdesign documents directly
65 Using templates
In this section we complete the projectrsquos aim of serving university projectdata from the British Council Activity Mapping Project using CouchDB
For this we will make use of mustachejs a templating system usedfor Sofa [35] Mustache allows us to create HTML or XML output by trans-forming a set of documents It provides the ability to iterate through setsof values and output HTML or XML in the exact way we wish
Firstly let us generate a simple View all This consists of a mapjs
file as follows (the corresponding reducejs file has been deleted)
couchapp generate view all
The contents of mapjs are as follows
function(doc)
emit(doc_id doc)
This simply outputs the content of all the documents in the collection atthe following URL httpmickcouchonecomuniversities_designbc_viewall
The next step is to create a List function to transform the contents of the View:
    couchapp generate list georss
This creates a file called georss.js. I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.
The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below.
    {{#institutions}}
    <entry>
      <id>{{id}}</id>
      <title>{{id}}</title>
      <content type="html">
        {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
        {{/has_programmes}}
      </content>
      <point>{{latitude}} {{longitude}}</point>
    </entry>
    {{/institutions}}
Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
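The nested iteration can be expressed in plain JavaScript as a rough equivalent. This is a sketch of the looping logic only, not of Mustache itself; the function name renderContent, the flagBase parameter and the sample data are hypothetical.

```javascript
// Plain-JavaScript stand-in for the nested template sections (this is not Mustache).
// flagBase is an assumed base URL for the database holding the flag attachments.
function renderContent(institution, flagBase) {
  var html = "";
  (institution.programmes || []).forEach(function (programme) {
    html += "<p>" + programme.name;
    // Only programmes that list countries contribute flag images
    (programme.countries || []).forEach(function (country) {
      html += '<img src="' + flagBase + "/" + country + '/flag" alt="' + country + '"/>';
    });
    html += "</p>";
  });
  return html;
}

// Hypothetical sample data
var sample = {
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz"] },
    { name: "Erasmus" } // no countries listed, so no flags are emitted
  ]
};
var content = renderContent(sample, "http://example.org/countries");
console.log(content);
```

The template achieves the same effect declaratively, which keeps the markup readable and free of string concatenation.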
(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all
The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org/
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it, even without a good internet connection. Distributed data and application code are important for reliable computing.
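Because replication is itself requested over HTTP, the request can be sketched as a plain data structure. The sketch below builds, but does not send, a POST to the server's _replicate resource with a JSON body naming a source and a target; the host names are placeholders and the helper name replicationRequest is hypothetical.

```javascript
// Build (but do not send) the HTTP request that asks a CouchDB server to replicate.
// Server names below are placeholders, not the project's real hosts.
function replicationRequest(server, source, target) {
  return {
    method: "POST",
    url: server + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Pull a remote database into a local one of the same name
var req = replicationRequest(
  "http://localhost:5984",
  "http://remote.example.org/universities",
  "universities"
);
console.log(req.method, req.url, req.body);
```

Since design documents are stored alongside ordinary documents, the same request carries application code as well as data.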
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
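As an illustration of this correspondence, the sketch below maps the basic document operations onto HTTP verbs and URLs. The helper name couchRequest and the sample values are hypothetical; the requests are only described, not sent.

```javascript
// Map document operations onto the HTTP requests CouchDB expects.
function couchRequest(op, db, doc) {
  var base = "/" + encodeURIComponent(db);
  var docUrl = base + "/" + encodeURIComponent(doc._id);
  switch (op) {
    case "create":
    case "update": // an update must carry the current _rev in the body
      return { method: "PUT", url: docUrl, body: JSON.stringify(doc) };
    case "read":
      return { method: "GET", url: docUrl };
    case "delete": // a delete names the revision being removed
      return { method: "DELETE", url: docUrl + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown operation: " + op);
  }
}

var readReq = couchRequest("read", "universities", { _id: "University of Aberdeen" });
var deleteReq = couchRequest("delete", "universities", { _id: "University of Aberdeen", _rev: "1-abc" });
console.log(readReq.method, readReq.url);
console.log(deleteReq.method, deleteReq.url);
```

Any tool that can issue these four verbs, cURL included, can therefore act as a CouchDB client.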
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
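To make the contrast concrete, the sketch below expresses a simple grouped count, which in SQL would be a single GROUP BY statement, as a map function and a reduce function. The tiny runView engine, the sample documents and the practice of passing emit as an argument are assumptions made so that the example is self-contained; in CouchDB itself, emit is a global and the grouping is performed by the view engine.

```javascript
// SQL equivalent, for comparison:
//   SELECT region, COUNT(*) FROM institutions GROUP BY region;

// Map step: called once per document. In CouchDB, emit is a global;
// here it is passed in so that the sketch is self-contained.
function mapFn(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce step: combine the values emitted for one key.
function reduceFn(keys, values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0);
}

// Tiny stand-in for the view engine, grouping rows by key.
function runView(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).sort().forEach(function (key) {
    result[key] = reduceFn(null, groups[key]);
  });
  return result;
}

// Hypothetical sample documents
var counts = runView([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Birkbeck", region: "London" }
]);
console.log(counts);
```

The information need is the same in both cases; what changes is that the question must be decided in advance and stored in a design document.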
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
      }
    }
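The escaping burden noted above can be shown in a short sketch. It assumes that the show function is first written as ordinary JavaScript and then converted to its stored string form with JSON.stringify; the class attribute in the markup is hypothetical and is there only to introduce double-quotes. This is an illustration of the problem, not the method used in this project.

```javascript
// Write the show function as ordinary JavaScript...
var s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
};

// ...and let JSON.stringify produce the escaped string form
// in which a design document stores its functions.
var ddoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
var json = JSON.stringify(ddoc);

// Every double-quote inside the function source is now backslash-escaped
console.log(json);
```

Tools such as CouchApp perform this kind of packaging automatically, which is why the HTML can be developed in separate files.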
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/, '');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
        // close the loop after all rows are rendered
        return "</feed>";
      });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
        <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
        <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
        <input type="text" size="50" name="longitude" value=""></p>
      <!-- ... -->
    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : docid,
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
          },
Appendix C
georss.js
As can be seen by the signature, function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ? institution.programmes.map(
                function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ? programme.countries.map(function(country) {
                      return {
                        country : country
                      };
                    }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self" />
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
            &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
            {{/countries}}
            {{/has_countries}}
            &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url="http://username:password@mick.couchone.com/countries/$docname/flag"
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag
      # curl -H "Content-Type:image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type:image/png" -X PUT $url --data-binary @$filepath
    done
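The path handling in the script can be restated as a small function, purely for illustration. The name flagUrl and the base URL are hypothetical; the two replacements mirror the two sed substitutions, and no request is sent.

```javascript
// Derive the document name and attachment URL from a flag file path,
// mirroring the two sed substitutions in the script.
function flagUrl(baseUrl, filepath) {
  var filename = filepath.replace(/^.*\//, "");   // countries/png/ie.png -> ie.png
  var docname = filename.replace(/\.png$/, "");   // ie.png -> ie
  return baseUrl + "/countries/" + docname + "/flag";
}

var url = flagUrl("http://127.0.0.1:5984", "countries/png/ie.png");
console.log(url);
```

Each flag therefore ends up as an attachment named flag on a document whose id is the two-letter country code.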
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org/
[2] British Council. http://www.britishcouncil.org/
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com/
[4] GeoRSS. http://www.georss.org/
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org/
[7] KML. http://code.google.com/apis/kml/documentation/
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org/
[10] The CouchDB Project. http://couchdb.apache.org/
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web/
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat/
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign/
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
SECTION 6 SERVING GEORSS USING COUCHAPP 44
64 Deployment
CouchApp uses Python scripts to merge all the separate files into a designdocument and to send the resulting design document using the correctHTTP commands to the CouchDB server
So that CouchApp knows what server to send the document to we needto amend the couchapprc file
env
default
dbhttpusernamepasswordmickcouchonecomuniversities
So far we have uploaded a very simple design document to CouchDBjust as we did in Section 5 This had demonstrated how much easier it is todevelop solutions with the CouchApp framework rather than working withdesign documents directly
65 Using templates
In this section we complete the projectrsquos aim of serving university projectdata from the British Council Activity Mapping Project using CouchDB
For this we will make use of mustachejs a templating system usedfor Sofa [35] Mustache allows us to create HTML or XML output by trans-forming a set of documents It provides the ability to iterate through setsof values and output HTML or XML in the exact way we wish
Firstly let us generate a simple View all This consists of a mapjs
file as follows (the corresponding reducejs file has been deleted)
couchapp generate view all
The contents of mapjs are as follows
function(doc)
emit(doc_id doc)
This simply outputs the content of all the documents in the collection atthe following URL httpmickcouchonecomuniversities_designbc_viewall
The next step is to create a List function to transform the contents ofthe View
couchapp generate list georss
SECTION 6 SERVING GEORSS USING COUCHAPP 45
This creates a file called georssjsI added some code derived from that in Sofa which lsquopackagesrsquo the data
from the View and sends it using mustachejs to a template fileThe full text of the file is provided in Appendix CThe template itself georsshtml is stored in the templates directory
The full text is provided in Appendix D but the main part of the file is alsoreproduced below
institutions
ltentry gt
ltid gt id ltidgt
lttitle gt id lttitle gt
ltcontent type=htmlgt
has_programmes
programmes
ampltpampgt name
has_countries
countries
ampltimg src=http mickcouchonecom
countries country flag alt =
country title = country ampgt
countries
has_countries
ampltpampgt
programmes
has_programmes
ltcontent gt
ltpoint gt latitude longitude ltpoint gt
ltentry gt
institutions
Each institution has a list of British Council programmes associated withit The code loops through each of the programmes and if the programmehas a list of countries the flags of those countries are emitted
(The flags are uploaded to a countries database The flags themselvescome from httpwwwfamfamfamcomlabiconsflags as noted inSection 4 The bash script used to upload the files to CouchDB is providedin Appendix E)
The GeoRSS feed is available here httpmickcouchonecomuniversities_designbc_listgeorssall
The resulting Google Map may be viewed here httpmapsgoogle
comq=httpmickcouchonecomuniversities_designbc_listgeorss
all
SECTION 6 SERVING GEORSS USING COUCHAPP 46
Figure 66 Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacementfor the current flat files used for the British Council Activity Map projecthttpactivitymapbritishcouncilorg
The key factors in achieving this outcome have been the use of CouchAppand a server-side JavaScript templating framework (in this case Mustache)
Section 7
Critical Assessment andConclusion
This MSc project investigated the suitability of CouchDB as a platform forbusiness application development Why should an applications developerin a corporate environment choose to develop an application on CouchDBas opposed to simply using tried-and-trusted products such as PHP andMySql or C and SQL Server
The first comment to make is that it is not necessarily an lsquoeither-orrsquoscenario In this project CouchDB is used as a web publishing layer in sup-port of an existing application written using C and SQL Server I believethat this is a trend which is likely to continue so-called lsquoNoSqlrsquo may readilybe used alongside relational databases with each deployed according to itsstrengths In the British Council Activity Project relational integrity is im-portant when putting the data together (using the internal ASPNetSQLServer application) For the publicly-available web layer CouchDB is moresuitable since all we really need to do is render structured documents usingXML HTML and JavaScript
71 Why might a developer choose CouchDB
One distinctive feature that CouchDB provides is that it is both a web serverand a database As such server-side code is simply stored as another kindof document CouchDB makes use of communication over HTTP to easilyreplicate both data and code between remote servers and local machinesThis gives a developer a key advantage - the ability to have data and codereplicated in many places so that the user has high-speed access to it evenwithout a good internet connection Distributed data and application codeare important for reliable computing
Such replication cannot be achieved easily using conventional web appli-cations Even considering the maturity of prevailing technology replication
47
SECTION 7 CRITICAL ASSESSMENT AND CONCLUSION 48
of code and data together as a unit is difficult and expensiveThe lesson I have gained from working with CouchDB is that document-
oriented storage is closer to the desired end-product of web development(which is itself a document in HTML or XML format) CouchDB is nativelysimpler for web use than conventional databases which were conceived anddesigned long before the web existed
Web development with MySql or SQL Server appears easy only becauseof the existence of Object-Relational Mapping (ORM) tools The processof getting normalised data out of a relational database and displaying it on aweb page is in fact quite complex In comparison for CouchDB relativelysimple tools such as CouchApp and Mustache do the same work of takingdata and transforming it into useful formats for the web
A further benefit of CouchDB is that it is standards-based All commu-nication with CouchDB takes place via standard HTTP verbs in additionit makes use of JSON a widely adopted standard as a document stor-age format The use of HTTP as a communications standard in particularmeans that existing programming languages and command-line tools suchas cURL are compatible with CouchDB Learning about CouchDB actuallyassists with learning about the HTTP protocol
A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database runs against a view that has been defined in advance as a map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
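To make the contrast with SQL concrete, here is a minimal sketch of a query that would be written in SQL as a COUNT with GROUP BY region, expressed instead as a map function and a reduce function. The runView harness is a hypothetical stand-in for CouchDB's view engine, and the sample documents are invented for illustration. Inside CouchDB itself, emit is provided by the view server and the reduce function also receives keys and a rereduce flag, which this sketch omits.

```javascript
// Map function: receives one document and emits a (key, value) pair for it.
// In CouchDB, emit is a global supplied by the view server; here it is passed in.
function mapByRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce function: sums the values emitted for one key.
function sumReduce(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical local harness that groups emitted values by key and reduces each group.
function runView(docs, map, reduce) {
  const groups = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: reduce(groups[key]) };
  });
}

// Invented sample documents, shaped like the institution documents in this project.
const sampleDocs = [
  { _id: "Example University A", region: "Scotland" },
  { _id: "Example University B", region: "Scotland" },
  { _id: "Example University C", region: "London" }
];

console.log(runView(sampleDocs, mapByRegion, sumReduce));
```

The information need is the same as in SQL; what changes is that the grouping key is chosen inside the map function when the view is defined, rather than in an ad-hoc query.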
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>' }"
      }
    }
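The escaping burden noted in this appendix can be illustrated with a minimal sketch: if the show function is written as ordinary JavaScript and the design document is serialised with JSON.stringify, the backslash escapes are produced automatically. The class attribute in the markup is invented for the example, and this is only an illustration of the escaping step, not the packaging mechanism that CouchApp uses.

```javascript
// The show function written as ordinary JavaScript, with unescaped double quotes in the HTML.
const s1 = function (doc, req) {
  return '<p class="latitude">latitude: ' + doc.latitude + '</p>';
};

// Build the design document; the function is stored as its source text.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() }
};

// JSON.stringify inserts the backslash escapes that would otherwise be typed by hand.
const json = JSON.stringify(designDoc);
console.log(json);
```

The printed JSON contains class=\"latitude\" inside the function string, which is the form the design document needs when it is stored in CouchDB.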
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };
    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
        // close the loop after all rows are rendered
        return "</feed>";
      });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
This is further down in templates/edit.html:
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e,r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : docid,
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    };
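The row-mapping step in the listing above can be tried outside CouchDB. The following is a minimal sketch, assuming a cut-down set of fields; the function name toStashEntry and the sample institution are invented for illustration, and the sketch leaves out List.withRows, provides and Mustache, which are only available inside a CouchApp.

```javascript
// Hypothetical stand-alone version of the row-mapping step, using made-up sample data.
function toStashEntry(institution) {
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    // Boolean flags let the template skip a whole section when there is nothing to show.
    has_programmes: institution.programmes ? true : false,
    programmes: institution.programmes ? institution.programmes.map(function (programme) {
      return {
        name: programme.name,
        has_countries: programme.countries ? true : false,
        // Wrap each country code in an object so the template can refer to it by name.
        countries: programme.countries ? programme.countries.map(function (country) {
          return { country: country };
        }) : []
      };
    }) : []
  };
}

const entry = toStashEntry({
  _id: "Example University",
  latitude: 51.5,
  longitude: -0.1,
  programmes: [{ name: "Example Programme", countries: ["id", "mz"] }]
});

console.log(JSON.stringify(entry, null, 2));
```

Wrapping each country code in an object is what allows the template in Appendix D to write the code by name inside the countries section.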
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com/",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org/",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to the countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
      echo "$filename"
      # strip the .png extension to get the document name
      docname=$(echo "$filename" | sed -e 's/\.png$//')
      echo "$docname"
      url="http://username:password@mick.couchone.com/countries/$docname/flag"
      echo "$url"
      # put the attachment into CouchDB
      # for example, this command creates the 'ie' record and puts
      # the png as an attachment under countries/ie/flag:
      #   curl -H "Content-Type: image/png"
      #     -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
      curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
    done
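The two sed steps in the script derive the document name from the file path before building the attachment URL. As a minimal sketch of the same derivation, assuming the folder layout used in the script, the hypothetical helper attachmentUrl below performs it in JavaScript; the base URL is the local CouchDB address that appears in the script's comment.

```javascript
// Hypothetical helper mirroring the two sed steps: "countries/png/ie.png" becomes "ie".
function attachmentUrl(baseUrl, filepath) {
  const filename = filepath.split("/").pop();       // strip the folder part
  const docname = filename.replace(/\.png$/, "");   // strip the .png extension
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

console.log(attachmentUrl("http://127.0.0.1:5984", "countries/png/ie.png"));
```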
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org/
[2] British Council. http://www.britishcouncil.org/
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com/
[4] GeoRSS. http://www.georss.org/
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org/
[7] KML. http://code.google.com/apis/kml/documentation/
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org/
[10] The CouchDB Project. http://couchdb.apache.org/
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web/
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat/
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign/
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
georss =http wwwgeorssorggeorssgt
lttitle gtBritish Council Activity Mapping lttitle gt
ltid gthttp mickcouchonecomuniversities_design
bc_listgeorssall ltidgt
ltlink href=http mickcouchonecomuniversities
_designbc_listgeorssall rel=selfgt
ltgenerator gtCouchApp on CouchDB ltgenerator gt
ltupdated gt2010 -09 -12 lt updated gt
institutions
ltentry gt
ltid gt id ltidgt
lttitle gt id lttitle gt
ltcontent type=htmlgt
has_programmes
programmes
ampltpampgt name
has_countries
countries
ampltimg src=http mickcouchonecom
countries country flag alt =
country title = country ampgt
countries
has_countries
ampltpampgt
programmes
has_programmes
ltcontent gt
ltpoint gt latitude longitude ltpoint gt
ltentry gt
institutions
ltfeed gt
Appendix E
Bash Script to UploadCountry Flag Files
This is a Linux bash script used to upload png files from a folder so thatthe image files become attachments to a CouchDB document
An example of an uploaded flag file is httpmickcouchonecom
countriesdeflag the flag of Germany
binbash
png country flag files from http wwwfamfamfam
comlabiconsflags
png country flag files are copied to countries
png folder
(beneath the current folder)
FILES = countriespng
create countries database in CouchDB
curl -X PUT http usernamepasswordmickcouchone
comcountries
for filepath in $FILES
do
echo $filepath
get the file name from the file path
filename=$(echo $filepath | sed -e rsquos countries
png grsquo)
echo $filename
docname=$(echo $filename | sed -e rsquospnggrsquo)
echo $docname
url=http usernamepasswordmickcouchonecom
countries$docname flag
59
APPENDIX E BASH SCRIPT TO UPLOAD COUNTRY FLAG FILES60
echo $url
put the attachment into CouchDB
this command creates the lsquoiersquo record and puts
the png as an attachment
under countriesieflag
curl -H Content -Typeimagepng
-X PUT http 1270015984 countriesieflag --
data -binary iepng
echo curl -H Content -Typeimagepng -X PUT $url
--data -binary $filepath
curl -H Content -Typeimagepng -X PUT $url --
data -binary $filepath
done
Bibliography
[1] A NoSql Summer A seasonal worldwide reading club for databasesdistributed systems and NoSql-related scientific papers http
nosqlsummerorg
[2] British Council httpwwwbritishcouncilorg
[3] django The Web framework for perfectionists with deadlines http
wwwdjangoprojectcom
[4] GeoRSS httpwwwgeorssorg
[5] How desktopcouch works httpwwwfreedesktoporg
wikiSpecificationsdesktopcouchDocumentationHow_
Desktopcouch_Works
[6] Introducing JSON httpjsonorg
[7] KML httpcodegooglecomapiskmldocumentation
[8] Quickly httpswikiubuntucomQuickly
[9] Relational Persistence for Java and NET httpwwwhibernate
org
[10] The CouchDB Project httpcouchdbapacheorg
[11] J Chris Anderson CouchDB Implements a Fundamental Algo-rithm httpjchrisanetdrl5Fdesignsofa5Fshowpost
CouchDB-Implements-a-Fundamental-Algorithm
[12] J Chris Anderson Jan Lehnardt and Noah Slater Design Documentshttpguidecouchdborgeditions1endesignhtml
[13] J Chris Anderson Jan Lehnardt and Noah Slater CouchDB TheDefinitive Guide OrsquoReilly Media 2010
[14] Joe Armstrong Programming Erlang Software for a ConcurrentWorld Pragmatic Bookshelf 2007
61
BIBLIOGRAPHY 62
[15] E F Codd A Relational Model of Data for Large Shared Data BanksIn Communications of the ACM 1970
[16] Brian F Cooper Raghu Ramakrishnan and Utkarsh Srivastava CloudStorage Design in a PNUTShell volume Beautiful Data The StoriesBehind Elegant Data Solutions chapter 4 OrsquoReilly Media 2009
[17] CouchOne BBC A Case Study httpwwwcouchonecom
case-study-bbc
[18] CouchOne CERN A Case Study httpwwwcouchonecom
case-study-cern
[19] CouchOne Why A Mobile Database httpwwwcouchonecom
pagewhy-mobile
[20] British Council Annual Report 2009-10 httpwww
britishcouncilorgnewGlobalBC20Annual20Report
202009-10_reuploadpdf
[21] British Council Erasmus httpwwwbritishcouncilorg
erasmushtm
[22] C J Date An Introduction to Database Systems Pearson EducationInc 8th edition 2004
[23] Jeffrey Dean and Sanjay Ghemawat MapReduce Simplified Data Pro-cessing on Large Clusters In OSDIrsquo04 Sixth Symposium on OperatingSystem Design and Implementation December 2004
[24] Dr Lawrie Brown School of Computer Science Australian DefenceForce Academy Canberra Australia ErlangmdashAn Open Source Lan-guage for Robust Distributed Applications httpwwwunswadfa
eduau~lpbpapersauug99-erlhtml
[25] Paul McJones (ed) The 1995 SQL Reunion People Projects andPolitics 1997
[26] A Fox and EA Brewer Harvest yield and scalable tolerant systemsIn Hot Topics in Operating Systems 1999 Proceedings of the SeventhWorkshop on
[27] Seth Gilbert and Nancy Lynch Brewerrsquos conjecture and the feasibil-ity of consistent available partition-tolerant web services In In ACMSIGACT News 2002
[28] Vivek Gite Bash Shell Loop Over Set of Files httpwww
cybercitibizfaqbash-loop-over-file
BIBLIOGRAPHY 63
[29] github CouchApp Commit History httpgithubcomcouchapp
couchappcommitsmaster
[30] Ricky Ho Couchdb implementation httphorickyblogspotcom200810couchdb-implementationhtml
[31] Jacob Kaplan-Moss Of the Web httpjacobianorgwriting
of-the-web
[32] Damien Katz CouchDB and Me httpwwwinfoqcom
presentationskatz-couchdb-and-me
[33] Damien Katz CouchDB Architecture httpdamienkatznet
200504couchdb5Farchitehtml
[34] Stuart Langridge Firefox bookmarks in CouchDBhttpwwwkryogenixorgdays20090706
firefox-bookmarks-in-couchdb
[35] Jan Lehnardt mustachejs httpgithubcomjanlmustachejs
[36] Vance Lucas NoSql First Impressions Object DatabasesMissed the Boat httpwwwvancelucascomblog
nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin Transaction Management Lecture Notes MSc ComputerScience Birkbeck College
[38] Dwight Merriman Comparing Mongo DB and Couch DB httpwwwmongodborgdisplayDOCSComparing+Mongo+DB+and+Couch+DB
[39] Microsoft Entity Framework Design httpblogsmsdncomb
efdesign
[40] Ted Nedward The Vietnam of Computer Science http
blogstednewardcom20060626The+Vietnam+Of+Computer+
Scienceaspx
[41] Ruby on Rails Class ActiveRecordBase httpapirubyonrailsorgclassesActiveRecordBasehtml
[42] Ed Parcell Tutorial Using JQuery and CouchDb to build asimple AJAX web application httpedparcellposterouscom
using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul Code tutorial make your application sync with UbuntuOne httparstechnicacomopen-sourceguides200912
code-tutorial-make-your-application-sync-with-ubuntu-one
ars
BIBLIOGRAPHY 64
[44] Daniel Stenberg and contributors curl1 the man page httpcurlhaxxsedocsmanpagehtml
[45] M Stonebraker and U Cetintemel ldquoOne Size Fits Allrdquo An Idea WhoseTime Has Come and Gone In PROCEEDINGS OF THE INTER-NATIONAL CONFERENCE ON DATA ENGINEERING pages 2ndash11IEEE Computer Society Press 2005
[46] Stonebraker M Wong E Kreps P and Held G The Design andImplementation of INGRES ACM TODS 1 3 (September 1976)
[47] Klaus Trainer CouchDB 10 Retrospectives httpmambofulani
couchonecomblog_designsofa_listpostpost-page
startkey=[22CouchDB-1-0-Retrospectives22]
[48] Rik van der Sanden Accessing Azure Masterrsquos thesis Delft Univer-sity the Netherlands 2009 httpswerltudelftnltwikipub
MainPastAndCurrentMScProjectsThesis5FRik5Fvan5Fder
5FSandenpdf
[49] Peter Vogel [Weblog Comment] Is Microsoft Feeling the lsquoNoSQLrsquo HeathttpreddevnewscomBlogsData-Driver200912NoSQL-Heat
[50] Wikipedia ACID httpenwikipediaorgwikiACID
[51] Wikipedia cURL httpenwikipediaorgwikiCURL
[52] Wikipedia Database Normalization httpenwikipediaorg
wikiOpen5FDatabase5FConnectivity
[53] Wikipedia JSON httpenwikipediaorgwikiJSON
[54] Wikipedia Relational model httpenwikipediaorgwiki
Relational5Fmodel
[55] Interview with Damien Katz by Werner Schuster DamienKatz Relaxing on CouchDB httpwwwinfoqcominterviews
CouchDB-Damien-Katz
- Introduction
-
- Motivation
- Does `one size still `fit all
- Organisation
-
- Background
-
- The Relational Model
- Relational Database Management Systems
- Normalisation
- SQL
- Transactional Guarantees
- The `NoSql Movement
- `NoSql databases at large scale
- Brewers `CAP Theorem
- Reducing the Impedance Mismatch
- Benefits of `NoSql
- Ad-hoc Querying - the `Achilles Heel of `NoSql
- The Activity Mapping Project
- Raising Awareness of British Council Impact in the UK
- Using CouchDB to Serve Mapping Data
-
- Introduction to CouchDB
-
- `Of the Web
- Some History
- Document-Oriented
- Erlang
- How Ubuntu uses CouchDB
- Desktop Couch Python and Quickly
-
- Using CouchDB
-
- Motivation
- Hosted CouchDB Service Providers
- Installing CouchDB
- Inserting a Document into a CouchDB Database
- Deleting a Document
- Updating a Document
- Adding Attachments
- Replication
- Querying a CouchDB Database using Map-Reduce
-
- Serving HTML from CouchDB
-
- Bulk upload of JSON documents
- A CouchDB Design Document
- A more complex `show function
- Views and Lists
- Lessons Learned
-
- Serving GeoRSS using CouchApp
-
- Introduction to CouchApp
- Sofa - a Blogging Application
- Developing with CouchApp
- Deployment
- Using templates
-
- Critical Assessment and Conclusion
-
- Why might a developer choose CouchDB
- CouchDBs challenges
- Conclusion
-
- Design Document d1
- GeoRSS on Sofa
- georssjs
- georsshtml
- Bash Script to Upload Country Flag Files
-
Figure 6.6: Activity Mapping data from CouchDB on Google Maps
This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).
The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB communicates over HTTP to replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
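To make the point about standard HTTP verbs concrete, the sketch below describes (without sending) the requests that create, read and delete a document. The helper name couchRequest, the local server address and the coordinate values are assumptions made for illustration; the database name universities and the document id come from this report.

```javascript
// Describe, but do not send, the HTTP request for a basic document operation.
// couchRequest is a hypothetical helper, not part of CouchDB or CouchApp.
function couchRequest(action, baseUrl, db, doc) {
  const url = `${baseUrl}/${db}/${encodeURIComponent(doc._id)}`;
  switch (action) {
    case "create":
    case "update": // an update must carry the current _rev inside the body
      return { method: "PUT", url, body: JSON.stringify(doc) };
    case "read":
      return { method: "GET", url };
    case "delete":
      return { method: "DELETE", url: `${url}?rev=${encodeURIComponent(doc._rev)}` };
    default:
      throw new Error(`unknown action: ${action}`);
  }
}

// Placeholder coordinates, assumed for the example only.
const doc = { _id: "University of Aberdeen", latitude: 57.16, longitude: -2.1 };
const createRequest = couchRequest("create", "http://127.0.0.1:5984", "universities", doc);
console.log(createRequest.method, createRequest.url);
```

The same three pieces of information (verb, URL, JSON body) are all that a cURL command line needs, which is why no CouchDB-specific client library is required.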
A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
7.2 CouchDB’s challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
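The difference in thinking can be illustrated with a small sketch. It assumes a toy in-memory stand-in for the view engine (runView is hypothetical, and emit is passed in explicitly, whereas CouchDB provides it as a global and keeps a persistent index). The query is roughly what SQL would express as SELECT region, COUNT(*) FROM institutions GROUP BY region. The third sample document is invented for the example.

```javascript
// Toy stand-in for a CouchDB view: run a map function over every document,
// group the emitted values by key, then reduce each group.
function runView(docs, mapFn, reduceFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach((doc) => mapFn(doc, emit));

  const groups = new Map();
  for (const { key, value } of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(values);
  }
  return result;
}

// Map: emit one row per institution, keyed by region.
const mapByRegion = (doc, emit) => {
  if (doc.region) emit(doc.region, 1);
};
// Reduce: add up the emitted values for each key.
const sum = (values) => values.reduce((a, b) => a + b, 0);

const sampleDocs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Example College", region: "Wales" }, // hypothetical document
];

const countsByRegion = runView(sampleDocs, mapByRegion, sum);
console.log(countsByRegion);
```

The information need is stated in advance as a pair of functions rather than composed at query time, which is the shift that developers used to SQL have to make.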
7.3 Conclusion
The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project’s focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>' }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>' }"
      }
    }
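The escaping burden described in this appendix can be avoided by letting a program do the packaging. The sketch below is an illustration of that idea only, not CouchApp's actual implementation: a show function is written as ordinary JavaScript, its source text is stored in the design document, and JSON.stringify supplies the backslashes. The class="title" attribute is a hypothetical addition, there simply to give the string some double quotes to escape.

```javascript
// A show function written as ordinary JavaScript, with unescaped double quotes.
const s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
};

// Hypothetical packaging step: store the function source as a string and let
// JSON.stringify add the backslash escapes that would otherwise be typed by hand.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() },
};
const designJson = JSON.stringify(designDoc, null, 2);
console.log(designJson);

// Round trip: parsing the JSON recovers the original source text exactly.
const recovered = JSON.parse(designJson).shows.s1;
```

The printed JSON contains class=\"title\" with the escapes in place, while the source file the developer edits stays readable.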
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };
    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:
          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->
    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    };
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
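The iteration behaviour can be seen in isolation with the sketch below. It uses a deliberately tiny, hand-written renderer (the function render is an assumption written for this illustration, not the mustache.js library) that supports only plain variables and sections, and does no HTML escaping. The programme data matches the sample document in this appendix.

```javascript
// Minimal Mustache-like renderer: sections for arrays and booleans, plus
// plain variables. A sketch only; sections of the same name may not nest.
function render(template, context) {
  const withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (match, name, inner) => {
      const value = context[name];
      if (Array.isArray(value)) {
        // Render the inner block once per item, with the item's fields in scope.
        return value.map((item) => render(inner, { ...context, ...item })).join("");
      }
      // Truthy non-array values render the block once; falsy values omit it.
      return value ? render(inner, context) : "";
    }
  );
  // Replace remaining variables; missing values become empty strings.
  return withSections.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    context[name] === undefined ? "" : String(context[name])
  );
}

const programme = {
  name: "Chevening Programme",
  has_countries: true,
  countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }],
};
const snippet =
  "{{name}}: {{#has_countries}}{{#countries}}[{{country}}]{{/countries}}{{/has_countries}}";

const rendered = render(snippet, programme);
console.log(rendered);
```

The has_countries flag guards the whole block, and the countries section repeats its contents once per array element, which is the same pattern the full template uses to emit one flag image per country.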
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/\.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png"
      #   -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
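The naming convention used by the script (file ie.png becomes document ie with an attachment called flag) can also be expressed in JavaScript, the language of the other listings. This is a sketch of the path-to-URL step only, under the assumption of a local server address; flagAttachmentUrl is a hypothetical helper and nothing is uploaded.

```javascript
const path = require("path");

// Mirror of the two sed steps: strip the folder, then strip the .png suffix,
// and build the attachment URL under the countries database.
function flagAttachmentUrl(baseUrl, filepath) {
  const docname = path.basename(filepath, ".png");
  return `${baseUrl}/countries/${docname}/flag`;
}

const exampleUrl = flagAttachmentUrl("http://127.0.0.1:5984", "countries/png/ie.png");
console.log(exampleUrl);
```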
Bibliography
[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django. The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
Section 7
Critical Assessment and Conclusion
This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?
The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.
7.1 Why might a developer choose CouchDB?
One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage - the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.
The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.
Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is, in fact, quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
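As a rough illustration of the point about HTTP verbs, the sketch below maps the basic document operations onto the requests CouchDB expects. The helper name couchRequest is hypothetical (it is not part of CouchDB or CouchApp), and the database and document names are chosen only for this example; the helper describes a request without sending it.

```javascript
// Hypothetical helper: describes the HTTP request for a basic document
// operation, without actually sending anything over the network.
function couchRequest(action, db, doc) {
  var path = "/" + encodeURIComponent(db) + "/" + encodeURIComponent(doc._id);
  switch (action) {
    case "read":
      return { method: "GET", path: path };
    case "save":
      // Creating and updating both use PUT; an update must carry the current _rev.
      return { method: "PUT", path: path, body: JSON.stringify(doc) };
    case "delete":
      // Deleting requires the revision being deleted, passed as a query parameter.
      return { method: "DELETE", path: path + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown action: " + action);
  }
}

var saved = couchRequest("save", "universities", {
  _id: "University of Aberdeen",
  region: "Scotland"
});
console.log(saved.method, saved.path);
```

Because the result is just a verb, a path and a JSON body, the same request could be issued from any HTTP client, including cURL on the command line.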
A potential driver for development with CouchDB is that of cost - CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)
7.2 CouchDB's challenges
As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.
The challenges revealed while developing on the CouchDB platform were as follows:
- writing queries as map-reduce functions instead of as SQL
- understanding the structure of design documents
- the use of server-side JavaScript functions to transform documents to HTML and XML
The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.
CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need - it is not necessarily more difficult than SQL, but the learning curve is significant.
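To make the contrast with SQL concrete, here is a minimal sketch of a view that counts institutions per region, roughly what SELECT region, COUNT(*) ... GROUP BY region would express. The map and reduce functions follow the general shape of CouchDB view functions, except that emit is passed in as a parameter here (inside CouchDB it is a global). The runView helper is a hypothetical in-memory stand-in for the view engine, written only so that the example can run outside CouchDB, and the sample documents are invented.

```javascript
// Map step: emits one (region, 1) row per document that has a region.
// Assumption: emit is passed as an argument; in CouchDB it is provided globally.
function mapByRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce step: sums the values emitted for one key.
function reduceSum(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical stand-in for querying a view with grouping by key.
function runView(docs, map, reduce) {
  var groups = {};
  docs.forEach(function (doc) {
    map(doc, function emit(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).sort().forEach(function (key) {
    result[key] = reduce(null, groups[key], false);
  });
  return result;
}

var sampleDocs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Birkbeck, University of London", region: "London" }
];
var counts = runView(sampleDocs, mapByRegion, reduceSum);
console.log(counts);
```

The point of the sketch is the shift in thinking: the grouping key is chosen when the view is written, not when the question is asked, which is why a new question usually means a new view.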
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45] - in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5.
The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source
It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.
A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen
A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen
Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function - this means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
      }
    }
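One way to reduce the hand-escaping burden noted in this appendix, short of adopting CouchApp, is to write the show function as ordinary JavaScript and let JSON.stringify produce the escaped string. The sketch below assumes this is done in a small build script outside CouchDB; the variable names are illustrative, the class attribute is added only to show a double quote being escaped, and the function body is a simplified variant of s1 rather than the deployed code.

```javascript
// Write the show function as normal, readable code...
var s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
};

// ...then serialise its source text into the design document.
// JSON.stringify backslash-escapes the double quotes and line breaks.
var designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() }
};
var designJson = JSON.stringify(designDoc);
console.log(designJson);
```

The resulting string could then be sent to CouchDB as the body of a PUT request. CouchApp automates essentially this packaging step across many files.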
Appendix B
GeoRSS on Sofa
In this appendix, I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:

    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/, '');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
At the end of lists/index.js:

            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
      }
      // close the loop after all rows are rendered
      return "</feed>";
    });
I also made the following rudimentary changes to templates/edit.html:

    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
        <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
        <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
        <input type="text" size="50" name="longitude" value=""></p>
      <!-- ... -->

    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen by the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case, the View 'all', which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }

Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
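The once-per-item behaviour of a Mustache list section can be illustrated with a deliberately tiny stand-in. The function renderSection below is hypothetical and handles only a single flat list section with simple tags; the real mustache.js library does far more (nesting, escaping, boolean sections), so this is a sketch of the idea rather than of the library.

```javascript
// Hypothetical mini-renderer: expands one {{#name}}...{{/name}} section by
// repeating its inner text once per item and filling in {{key}} tags.
function renderSection(template, name, items) {
  var section = new RegExp(
    "\\{\\{#" + name + "\\}\\}([\\s\\S]*?)\\{\\{/" + name + "\\}\\}", "g");
  return template.replace(section, function (match, inner) {
    return items.map(function (item) {
      return inner.replace(/\{\{(\w+)\}\}/g, function (tag, key) {
        return String(item[key]);
      });
    }).join("");
  });
}

var flagTemplate = '{{#countries}}<img alt="{{country}}"/>{{/countries}}';
var flags = renderSection(flagTemplate, "countries",
  [{ country: "id" }, { country: "mz" }, { country: "tj" }]);
console.log(flags);
// prints <img alt="id"/><img alt="mz"/><img alt="tj"/>
```

This also shows why the list function in Appendix C wraps each country code in an object with a country property: the template refers to the value by name inside the section.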
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
      echo "$filename"
      docname=$(echo "$filename" | sed -e 's/\.png//g')
      echo "$docname"
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo "$url"
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
      curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
    done
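For readers less familiar with sed, the path handling inside the loop can be restated in JavaScript. This is only a sketch of the string manipulation: the function name flagAttachmentUrl is hypothetical, and credentials are left out of the base URL.

```javascript
// Corresponds to the two sed substitutions in the script: strip the folder
// prefix, strip the .png extension, then build the attachment URL.
function flagAttachmentUrl(filepath, baseUrl) {
  var filename = filepath.replace(/^countries\/png\//, ""); // e.g. "de.png"
  var docname = filename.replace(/\.png$/, "");             // e.g. "de"
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

console.log(flagAttachmentUrl("countries/png/de.png", "http://mick.couchone.com"));
// prints http://mick.couchone.com/countries/de/flag
```

Each flag therefore becomes an attachment named flag on a document whose identifier is the two-letter country code, which is the URL pattern the template in Appendix D relies on.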
Bibliography
[1] A NoSql Summer: A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
SECTION 7 CRITICAL ASSESSMENT AND CONCLUSION 48
of code and data together as a unit is difficult and expensiveThe lesson I have gained from working with CouchDB is that document-
oriented storage is closer to the desired end-product of web development(which is itself a document in HTML or XML format) CouchDB is nativelysimpler for web use than conventional databases which were conceived anddesigned long before the web existed
Web development with MySql or SQL Server appears easy only becauseof the existence of Object-Relational Mapping (ORM) tools The processof getting normalised data out of a relational database and displaying it on aweb page is in fact quite complex In comparison for CouchDB relativelysimple tools such as CouchApp and Mustache do the same work of takingdata and transforming it into useful formats for the web
A further benefit of CouchDB is that it is standards-based All commu-nication with CouchDB takes place via standard HTTP verbs in additionit makes use of JSON a widely adopted standard as a document stor-age format The use of HTTP as a communications standard in particularmeans that existing programming languages and command-line tools suchas cURL are compatible with CouchDB Learning about CouchDB actuallyassists with learning about the HTTP protocol
A potential driver for development with CouchDB is that of cost -CouchDB is designed for robustness and reliability on commodity-gradehardware (During the time I was working on this project CouchOne startedmaking hosted database capacity available at zero cost as a beta service athttpwwwcouchonecomget)
72 CouchDBrsquos challenges
As someone with experience of traditional applications development I wantedto investigate the feasibility of creating a web application using this inno-vative technology I found that it was indeed possible to use CouchDBdesign documents to transform data into useful formats for the web such asGeoRSS and HTML
The challenges revealed while developing on the CouchDB platform wereas follows
bull writing queries as map-reduce functions instead of as SQL
bull understanding the structure of design documents
bull the use of server-side JavaScript functions to transform documents toHTML and XML
The task was made easier by tools such as CouchApp and the Mus-tache templating engine Without them as seen in Section 5 the process oftransforming data is possible but cumbersome
SECTION 7 CRITICAL ASSESSMENT AND CONCLUSION 49
CouchDB is still a new product and as such there is a lack of lsquoentry-levelrsquo documentation aimed at mainstream developers
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying Every query against the database is a pre-compiledmap-reduce function Many developers are used to lsquothinkingrsquo in SQL Map-Reduce is a different way of expressing an information need - it is not nec-essarily more difficult than SQL but the learning curve is significant
73 Conclusion
The lsquoNoSqlrsquo movement in database technology has come about largely asa result of the advances proposed by large-scale web applications providerssuch as Google Amazon Facebook and Yahoo
Questions of large-scale data storage at low cost using CouchDB wereoutside the scope of this particular project but examples such as that of theBBC [17] and the Large Hadron Collider [18] are documented elsewhere
This projectrsquos focus was on the web development process at more con-ventional scale
The project demonstrated the importance of the CouchApp frameworkas an aid to developer productivity As CouchDB matures as a productover the coming years (together with other lsquoNoSqlrsquo offerings) the set oftools available to developers will increase I suspect also that developerswill happily begin to adopt non-relational document-oriented approaches todata persistence for certain applications if this increases their productivity
Based on my experiments I believe that this technology has immensepotential for web and database developers who want to make applicationsand data available (with replication) at low latency
For CouchDB the lack of ad-hoc querying remains an issue For thisreason I think that CouchDB has a future mainly as a caching publishingor reporting layer in support of existing relational business applicationsThis is what is meant by lsquothe end of one size fits allrsquo [45] - in the comingyears relational databases will continue alongside lsquoNoSqlrsquo offerings eachdoing what it is best suited for
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5The design document is also available online httpmickcouchone
com_utilsdocumenthtmluniversities_designd1sourceIt contains two lsquoshowrsquo functions the first s1 returns some simple
HTML the second map1 returns the HTML to render part of the doc-ument data on a Google Map
A simple example httpmickcouchonecomuniversities_designd1_shows1University20of20Aberdeen
A Google Map example httpmickcouchonecomuniversities
_designd1_showmap1University20of20Aberdeen
Note the difficulty of this mode of design document development allHTML is returned as a string from a server-side JavaScript function - thismeans that all double-quotes for example need to be backslash-escapedThis is very tedious to do by hand
As seen in Section 6 the CouchApp framework allows us to developeach HTML file in the normal way before packaging the files into a designdocument and sending it to the CouchDB server
_id _designd1
_rev 5- d6da34f482e0ee1a711ed302b9b08bb1
shows
s1 function(doc req) return
rsquolth1gtrsquo + doc_id + rsquolth1 gtrsquo +
rsquoltpgtlatitude rsquo + doc
latitude + rsquoltpgtrsquo + rsquoltpgt
longitude rsquo + doclongitude + rsquoltpgtrsquo
map1 function(doc req) return rsquoltDOCTYPE
html gtlthtml gtlthead gtltmeta name = viewport
content = initial -scale =10 user -scalable
50
APPENDIX A DESIGN DOCUMENT D1 51
=no gtltstyle type = textcssgt html
height 100 body height 100
margin 0px padding 0px map_canvas
height 100 ltstyle gtltscript type = text
javascript src= http mapsgoogle
commapsapijssensor=falsegtltscript gtlt
script type = textjavascript gt function
initialize () var latlng = new google
mapsLatLng(rsquo + doclatitude + rsquo rsquo + doc
longitude + rsquo) var myOptions =
zoom 14 center latlng
mapTypeId googlemapsMapTypeIdROADMAP
var map = new googlemapsMap(
documentgetElementById ( map_canvas )
myOptions) var marker = new google
mapsMarker ( position latlng title rsquo
+ doc_id + rsquo) var infowindow = new
googlemapsInfoWindow ( content rsquo +
doc_id + rsquo ) googlemapsevent
addListener(marker click function ()
infowindowopen(map marker) ) marker
setMap(map) ltscript gtlthead gtltbody onload
= initialize ()gt ltdiv id= map_canvas
style = width 100 height 100 gt ltdiv
gtltbody gtlthtml gtrsquo
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
// Note: the XML literals below use E4X syntax.
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added: the location of the entry
  entry.point = data.point;
  return entry;
};
At the end of lists/index.js:
      alternate : path.absolute(path.show('post', row.id)),
      // point : row.value.loc[1] + " " + row.value.loc[0]
      point : row.value.latitude + " " + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
  // close the loop after all rows are rendered
  return "</feed>";
});
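The point value built in lists/index.js is a latitude followed by a longitude, separated by a single space, which is the order GeoRSS-Simple expects. As a hedged illustration only, the sketch below wraps that concatenation in a helper that rejects blank or out-of-range input. The name georssPoint is hypothetical and is not part of Sofa or CouchApp.

```javascript
// Hypothetical helper: format a GeoRSS-Simple point value ("lat lon").
function georssPoint(latitude, longitude) {
  var lat = parseFloat(latitude);
  var lon = parseFloat(longitude);
  // parseFloat("") is NaN, so empty form fields are rejected here as well.
  if (!isFinite(lat) || !isFinite(lon) || Math.abs(lat) > 90 || Math.abs(lon) > 180) {
    throw new RangeError("invalid coordinates: " + latitude + ", " + longitude);
  }
  return lat + " " + lon; // latitude first, then longitude
}

console.log(georssPoint("56.3412139443169", "-2.79301175308608"));
```

Validating at this point matters because the edit form accepts free text for both fields, so a mistyped value would otherwise end up in the feed unchanged.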
I also made the following rudimentary changes to templates/edit.html:
<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- ... -->

// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View 'all', which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
This provides a great deal of templating flexibility.
function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
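The row-to-stash mapping at the heart of this List function can also be exercised outside CouchDB. The sketch below is an assumption-laden simplification rather than the deployed code: toStashEntry is a hypothetical name, only a few fields are copied, and the has_programmes and has_countries flags are true only for non-empty arrays (the listing above treats an empty array as true).

```javascript
// Hypothetical stand-alone version of the mapping inside List.withRows.
function toStashEntry(institution) {
  var programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      var countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Mustache iterates over objects, so wrap each country code.
        countries: countries.map(function (country) {
          return { country: country };
        })
      };
    })
  };
}

// Sample input, abbreviated from the example document in Appendix D.
var entry = toStashEntry({
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [{ name: "Chevening Programme", countries: ["id", "mz", "tj"] }]
});
console.log(JSON.stringify(entry, null, 2));
```

Keeping the mapping in a plain function like this makes it possible to check the shape of the stash before deploying the design document.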
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
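The section iteration that this template relies on can be illustrated without CouchDB or the Mustache library. The function below is a deliberately tiny stand-in written for this illustration only: render is a hypothetical name, it handles just {{#name}}...{{/name}} sections and {{name}} variables, it does not look up names in parent contexts, and unlike Mustache it does not HTML-escape values.

```javascript
// Minimal stand-in for Mustache sections and variables; not the real library.
function render(template, view) {
  // Expand sections first: {{#name}} ... {{/name}}
  var sectionRe = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  var expanded = template.replace(sectionRe, function (match, name, body) {
    var value = view[name];
    if (Array.isArray(value)) {
      // A list renders the section body once per item, with the item as context.
      return value.map(function (item) { return render(body, item); }).join("");
    }
    // A plain truthy value renders the body once; a falsy value renders nothing.
    return value ? render(body, view) : "";
  });
  // Then substitute plain variables: {{name}}
  return expanded.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] === undefined ? "" : String(view[name]);
  });
}

var output = render(
  "{{#programmes}}{{name}}:{{#countries}} {{country}}{{/countries}};{{/programmes}}",
  { programmes: [
    { name: "Chevening Programme",
      countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }] },
    { name: "CSFP", countries: [{ country: "bd" }, { country: "za" }] }
  ] }
);
console.log(output); // Chevening Programme: id mz tj;CSFP: bd za;
```

This is also why georss.js wraps each country code in an object: the section body needs a named field, {{country}}, to refer to.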
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type:image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type:image/png" -X PUT "$url" --data-binary @"$filepath"
  curl -H "Content-Type:image/png" -X PUT "$url" --data-binary @"$filepath"
done
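The two sed steps in the script derive a document name from each file path and then build the attachment URL. For comparison, here is a hedged sketch of the same derivation in JavaScript under Node.js. The function name attachmentUrl and the local base URL are assumptions made for this example, and the sketch only builds the URL, it does not upload anything.

```javascript
var path = require("path");

// Hypothetical helper mirroring the two sed steps in the bash script.
function attachmentUrl(baseUrl, filepath) {
  // "countries/png/de.png" -> "de"
  var docname = path.basename(filepath, ".png");
  // Encode the document name in case a file name contains unusual characters.
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

console.log(attachmentUrl("http://127.0.0.1:5984", "countries/png/de.png"));
```

Using path.basename removes the directory and the extension in one call, so no assumption about the folder layout has to be repeated in a pattern.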
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education, Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects and Politics, 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
SECTION 7. CRITICAL ASSESSMENT AND CONCLUSION
CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.
A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-Reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
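To make the contrast with SQL concrete, consider counting institutions per region, which in SQL would be a single GROUP BY query. The sketch below simulates the map-reduce equivalent in plain JavaScript so that it runs without a server. It is an illustration under stated assumptions: runView is a hypothetical harness, emit is passed to the map function as a parameter (in CouchDB it is a global), the reduce function receives only the values (in CouchDB it also receives keys and a rereduce flag), and the three documents are invented sample data.

```javascript
// Plain-JavaScript simulation of a CouchDB view; runs without a server.
function runView(docs, map, reduce) {
  var rows = [];
  // Run the map function over every document, collecting emitted rows.
  docs.forEach(function (doc) {
    map(doc, function emit(key, value) { rows.push({ key: key, value: value }); });
  });
  // Group the emitted values by key, then reduce each group.
  var groups = {};
  rows.forEach(function (row) {
    (groups[row.key] = groups[row.key] || []).push(row.value);
  });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduce(groups[key]);
  });
  return result;
}

// Invented sample documents for illustration.
var docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Birkbeck, University of London", region: "London" }
];

// Roughly: SELECT region, COUNT(*) FROM institutions GROUP BY region
var countByRegion = runView(
  docs,
  function (doc, emit) { if (doc.region) { emit(doc.region, 1); } },
  function (values) { return values.reduce(function (a, b) { return a + b; }, 0); }
);
console.log(countByRegion); // { Scotland: 2, London: 1 }
```

The information need is the same as in the SQL query, but it has to be decided in advance and stored as a view, which is the shift in thinking described above.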
7.3 Conclusion
The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.
Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.
This project's focus was on the web development process at a more conventional scale.
The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.
Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.
For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.
Appendix A
Design Document d1
This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.
    {
      "_id": "_design/d1",
      "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
      "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
        "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"> <div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
      }
    }
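
The hand-escaping problem noted in this appendix can be sidestepped by letting a JSON serialiser do the escaping. The following is a minimal sketch under stated assumptions, not the method used in this project: it assumes a Node.js environment, and the class attributes and the names s1, designDoc and json are illustrative only. The show function is written as ordinary JavaScript, converted to its source text, and serialised, so the backslash-escaped double quotes are produced automatically.

```javascript
// A show function written as ordinary JavaScript (illustrative markup only).
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p class="lat">latitude: ' + doc.latitude + '</p>' +
    '<p class="lon">longitude: ' + doc.longitude + '</p>';
};

// Function.prototype.toString returns the function's source text,
// which is what a design document stores under "shows".
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() }
};

// JSON.stringify backslash-escapes the double quotes and newlines
// inside the function source, so nothing is escaped by hand.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

The resulting text could then be sent to the server with an HTTP PUT, in the same way as any other document; that step is not shown here.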
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }

At the end of lists/index.js:

            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
      }
      // close the loop after all rows are rendered
      return "</feed>";
    });
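
The point value built in lists/index.js is a latitude and a longitude separated by a single space, which is the order GeoRSS Simple expects. As a hedged illustration only (the helper name georssPoint and the range checks are assumptions, not part of Sofa or of the changes above), the same string could be produced with some basic validation:

```javascript
// Hypothetical helper: format a GeoRSS point as "latitude longitude".
function georssPoint(latitude, longitude) {
  // parseFloat yields NaN for empty or missing values, which are rejected below.
  const lat = parseFloat(latitude);
  const lon = parseFloat(longitude);
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError("latitude out of range: " + latitude);
  }
  if (!Number.isFinite(lon) || lon < -180 || lon > 180) {
    throw new RangeError("longitude out of range: " + longitude);
  }
  return lat + " " + lon;
}

// Example row in the shape a list function receives from a view
// (values arrive as strings when they were typed into a form).
const row = { id: "post-1", value: { latitude: "56.34", longitude: "-2.79" } };
console.log(georssPoint(row.value.latitude, row.value.longitude));
```

Validating at this point means that a post saved with an empty latitude field raises an error rather than emitting a malformed feed entry.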
I also made the following rudimentary changes to templates/edit.html:

    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- -->

    // this is further down in templates/edit.html
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : {{docid}},
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case this is the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
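
To make the shape of the 'stash' concrete, the per-row mapping can be tried outside CouchDB. The following is a sketch under assumptions rather than the list function itself: toStashEntry is a hypothetical name, only a few of the fields are carried over, and the has_ flags here test for a non-empty array, whereas the listing above only tests that the property exists.

```javascript
// Hypothetical stand-alone version of the per-row mapping.
function toStashEntry(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    // Variation on the listing: an empty array counts as "no programmes".
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so a template can refer to {{country}}.
        countries: countries.map(function (country) {
          return { country: country };
        })
      };
    })
  };
}

// Trimmed-down example document (values are illustrative).
const entry = toStashEntry({
  _id: "University of St Andrews",
  latitude: 56.34,
  longitude: -2.79,
  programmes: [{ name: "Chevening Programme", countries: ["id", "mz", "tj"] }]
});
console.log(JSON.stringify(entry, null, 2));
```

Wrapping each country code in an object is what lets the template iterate over the list and name the value inside the loop.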
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
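
The section behaviour used by this template (a block repeated once per list item, or shown only when a flag is true) can be illustrated with a deliberately small renderer. This is a simplified sketch and not mustache.js: the function render is hypothetical, it supports only {{name}} and {{#name}}...{{/name}}, it does no HTML escaping, and it does not look up names in parent contexts.

```javascript
// Minimal Mustache-like renderer (illustration only, not mustache.js).
function render(template, view) {
  // The backreference \1 pairs each section opener with its own closer.
  const sectionRe = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const expanded = template.replace(sectionRe, function (match, name, inner) {
    const value = view[name];
    if (Array.isArray(value)) {
      // A list: render the inner block once per item, with the item as context.
      return value.map(function (item) { return render(inner, item); }).join("");
    }
    // A flag: render the inner block once if truthy, otherwise drop it.
    return value ? render(inner, view) : "";
  });
  // Substitute the remaining simple variables at this level.
  return expanded.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] === undefined ? "" : String(view[name]);
  });
}

const template =
  "{{#programmes}}{{name}}: {{#has_countries}}{{#countries}}[{{country}}]{{/countries}}{{/has_countries}}\n{{/programmes}}";
const view = {
  programmes: [
    { name: "Chevening Programme", has_countries: true,
      countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }] },
    { name: "Erasmus", has_countries: false, countries: [] }
  ]
};
console.log(render(template, view));
```

In this sketch the has_countries guard is redundant for an empty list, since an empty array already renders nothing; in the template above the guards are kept as written.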
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to the countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
      echo "$filename"
      # strip the .png extension to get the document name
      docname=$(echo "$filename" | sed -e 's/\.png$//')
      echo "$docname"
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo "$url"
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
      curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
    done
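
The two sed steps in the script derive a document name from each file path and then build the attachment URL from it. As a hedged illustration in JavaScript (the helper flagUrl and the local base URL are assumptions; the script itself does this in bash), the same derivation looks like this:

```javascript
const path = require("path");

// Hypothetical helper mirroring the two sed steps of the bash script.
function flagUrl(baseUrl, filepath) {
  // basename with an extension argument drops both the folder and ".png".
  const docname = path.basename(filepath, ".png");
  // Attachment URLs have the form /database/document/attachment.
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

console.log(flagUrl("http://127.0.0.1:5984", "countries/png/ie.png"));
```

Because the document name comes straight from the file name, each two-letter country code becomes the document id under which its flag is stored.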
- Introduction
-
- Motivation
- Does `one size still `fit all
- Organisation
-
- Background
-
- The Relational Model
- Relational Database Management Systems
- Normalisation
- SQL
- Transactional Guarantees
- The `NoSql Movement
- `NoSql databases at large scale
- Brewers `CAP Theorem
- Reducing the Impedance Mismatch
- Benefits of `NoSql
- Ad-hoc Querying - the `Achilles Heel of `NoSql
- The Activity Mapping Project
- Raising Awareness of British Council Impact in the UK
- Using CouchDB to Serve Mapping Data
-
- Introduction to CouchDB
-
- `Of the Web
- Some History
- Document-Oriented
- Erlang
- How Ubuntu uses CouchDB
- Desktop Couch Python and Quickly
-
- Using CouchDB
-
- Motivation
- Hosted CouchDB Service Providers
- Installing CouchDB
- Inserting a Document into a CouchDB Database
- Deleting a Document
- Updating a Document
- Adding Attachments
- Replication
- Querying a CouchDB Database using Map-Reduce
-
- Serving HTML from CouchDB
-
- Bulk upload of JSON documents
- A CouchDB Design Document
- A more complex `show function
- Views and Lists
- Lessons Learned
-
- Serving GeoRSS using CouchApp
-
- Introduction to CouchApp
- Sofa - a Blogging Application
- Developing with CouchApp
- Deployment
- Using templates
-
- Critical Assessment and Conclusion
-
- Why might a developer choose CouchDB
- CouchDBs challenges
- Conclusion
-
- Design Document d1
- GeoRSS on Sofa
- georssjs
- georsshtml
- Bash Script to Upload Country Flag Files
-
APPENDIX A DESIGN DOCUMENT D1 51
=no gtltstyle type = textcssgt html
height 100 body height 100
margin 0px padding 0px map_canvas
height 100 ltstyle gtltscript type = text
javascript src= http mapsgoogle
commapsapijssensor=falsegtltscript gtlt
script type = textjavascript gt function
initialize () var latlng = new google
mapsLatLng(rsquo + doclatitude + rsquo rsquo + doc
longitude + rsquo) var myOptions =
zoom 14 center latlng
mapTypeId googlemapsMapTypeIdROADMAP
var map = new googlemapsMap(
documentgetElementById ( map_canvas )
myOptions) var marker = new google
mapsMarker ( position latlng title rsquo
+ doc_id + rsquo) var infowindow = new
googlemapsInfoWindow ( content rsquo +
doc_id + rsquo ) googlemapsevent
addListener(marker click function ()
infowindowopen(map marker) ) marker
setMap(map) ltscript gtlthead gtltbody onload
= initialize ()gt ltdiv id= map_canvas
style = width 100 height 100 gt ltdiv
gtltbody gtlthtml gtrsquo
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to 3 files from J ChrisAndersonrsquos Sofa CouchApp project in order to enable it to serve GeoRSSfeeds
The archive of the relevant discussion on the couchdb-user public mail-ing list is available here httpmail-archivesapacheorgmod_mbox
couchdb-user201007mbox3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_
ikjzR4Smailgmailcom3EThe changes below refer to files originally found in the Sofa source
code httpgithubcomjchrissofa I have indicated my changes withdouble-forward-slash comments ()
At the end of vendorcouchapplibatomjs
exportsheader = function(data)
var f = ltfeed xmlns=httpwwww3org2005Atomgt
var f = ltfeed xmlns=httpwwww3org2005Atom
xmlnsgeorss=httpwwwgeorssorggeorssgt
ftitle = datatitle
fid = datafeed_id
flinkhref = datafeed_link
flinkrel = self
fgenerator = CouchApp on CouchDB
fupdated = rfc3339(dataupdated)
return ftoXMLString()replace(ltfeedgtrsquorsquo)
exportsentry = function(data)
var entry = ltentrygt
entryid = dataentry_id
entrytitle = datatitle
52
APPENDIX B GEORSS ON SOFA 53
entrycontent = datacontent
entrycontenttype = (datacontent_type || rsquohtmlrsquo)
entryupdated = rfc3339(dataupdated)
entryauthor = ltauthorgtltnamegtdataauthorltnamegtltauthorgt
entrylinkhref = dataalternate
entrylinkrel = alternate
entrypoint = datapoint
return entry
At the end of listsindexjs
alternate pathabsolute(pathshow(rsquopostrsquo rowid))
point rowvalueloc[1] + + rowvalueloc[0]
point rowvaluelatitude + + rowvaluelongitude
)
send the entry to client
send(feedEntry)
while (row = getRow())
close the loop after all rows are rendered
return ltfeedgt
)
I also made the following rudimentary changes to templatesedithtml
lt-- form to create a post --gt
ltform id=new-post action=newhtml method=postgt
lth1gtpageTitlelth1gt
lt-- amended for geosofa --gt
ltpgtltlabelgtPlace Nameltlabelgt
ltinput type=text size=50 name=title value=gtltpgt
ltpgtltlabelgtLatitudeltlabelgt
ltinput type=text size=50 name=latitude value=gtltpgt
ltpgtltlabelgtLongitudeltlabelgt
ltinput type=text size=50 name=longitude value=gtltpgt
lt-- --gt
this is further down in templatesedithtml
APPENDIX B GEORSS ON SOFA 54
apply docForm at login
$(account)evently(
loggedIn function(er)
var userCtx = ruserCtx
postForm = appdocForm(formnew-post
id docid
fields [title body tags]
fields [title latitude longitude body tags]
template
type post
format markdown
author userCtxname
Appendix C
georssjs
As can be seen by the signature function(headreq) this is a CouchDBList function It takes as input a set of documents returned by a View - inthe simplest case the View all which simply returns all documents
What is interesting about this code Well it gathers together the amodel of the data in the variable stash then uses the Mustacheto_html
function to send the lsquostashrsquo to a Mustache template file georsshtml Thecontents of this file are provided in Appendix D
This provides a great deal of templating flexibility
function(head req)
var ddoc = this
var Mustache = require (libmustache )
var List = require ( vendorcouchappliblist)
provides (html function ()
var key =
var stash =
institutions ListwithRows(function(row)
var institution = rowvalue
key = rowkey
return
id institution_id
rev institution_rev
institutionID institution
institutionID
postcode institution
postcode
latitude institutionlatitude
longitude institutionlongitude
constituency institutionconstituency
localAuthority institution
55
APPENDIX C GEORSSJS 56
localAuthority
region institutionregion
has_programmes institutionprogrammes
true false
programmes institutionprogrammes
institutionprogrammesmap(
function(programme)
return
name programmename
url programmeurl
has_countries programmecountries
true false
countries programmecountries
programmecountriesmap(function(
country)
return
country
country
) [] return
nothing if no
countries
) [] return nothing if no
programmes
)
return Mustacheto_html(ddoctemplatesgeorss
stash)
)
Appendix D
georsshtml
The georsshtml code uses the Mustache templating system to transformthe data from a CouchDB document into a valid GeoRSS-format file
The code below goes hand-in-hand with the georssjs file described inAppendix C
Note in particular the iteration code the section between countries
and countries will execute once for each country listed in a particularprogramme
Here is an example of a typical document which is transformed by thetemplate code
_id University of St Andrews
_rev 1-7a29159d52cd71b9f6759ad6d3884945
institutionID 15161
postcode KY16 9AJ
latitude 563412139443169
longitude -279301175308608
constituency North East Fife
localAuthority Fife Council
region Scotland
programmes [
name Chevening Programme
url httpwwwcheveningcom
countries [
idmztj
]
name Commonwealth Scholarship and Fellowship Plan (CSFP)
url httpwwwcsfp-onlineorg
57
APPENDIX D GEORSSHTML 58
countries [
bdza
]
]
Listed below is the code for georsshtml Even though it is a html
file the output it produces is in fact GeoRSS XML format The html fileextension is simply there as a convention for Mustache
ltfeed xmlns =http wwww3org 2005 Atom xmlns
georss =http wwwgeorssorggeorssgt
lttitle gtBritish Council Activity Mapping lttitle gt
ltid gthttp mickcouchonecomuniversities_design
bc_listgeorssall ltidgt
ltlink href=http mickcouchonecomuniversities
_designbc_listgeorssall rel=selfgt
ltgenerator gtCouchApp on CouchDB ltgenerator gt
ltupdated gt2010 -09 -12 lt updated gt
institutions
ltentry gt
ltid gt id ltidgt
lttitle gt id lttitle gt
ltcontent type=htmlgt
has_programmes
programmes
ampltpampgt name
has_countries
countries
ampltimg src=http mickcouchonecom
countries country flag alt =
country title = country ampgt
countries
has_countries
ampltpampgt
programmes
has_programmes
ltcontent gt
ltpoint gt latitude longitude ltpoint gt
ltentry gt
institutions
ltfeed gt
Appendix E
Bash Script to UploadCountry Flag Files
This is a Linux bash script used to upload png files from a folder so thatthe image files become attachments to a CouchDB document
An example of an uploaded flag file is httpmickcouchonecom
countriesdeflag the flag of Germany
binbash
png country flag files from http wwwfamfamfam
comlabiconsflags
png country flag files are copied to countries
png folder
(beneath the current folder)
FILES = countriespng
create countries database in CouchDB
curl -X PUT http usernamepasswordmickcouchone
comcountries
for filepath in $FILES
do
echo $filepath
get the file name from the file path
filename=$(echo $filepath | sed -e rsquos countries
png grsquo)
echo $filename
docname=$(echo $filename | sed -e rsquospnggrsquo)
echo $docname
url=http usernamepasswordmickcouchonecom
countries$docname flag
59
APPENDIX E BASH SCRIPT TO UPLOAD COUNTRY FLAG FILES60
echo $url
put the attachment into CouchDB
this command creates the lsquoiersquo record and puts
the png as an attachment
under countriesieflag
curl -H Content -Typeimagepng
-X PUT http 1270015984 countriesieflag --
data -binary iepng
echo curl -H Content -Typeimagepng -X PUT $url
--data -binary $filepath
curl -H Content -Typeimagepng -X PUT $url --
data -binary $filepath
done
Bibliography
[1] A NoSql Summer A seasonal worldwide reading club for databasesdistributed systems and NoSql-related scientific papers http
nosqlsummerorg
[2] British Council httpwwwbritishcouncilorg
[3] django The Web framework for perfectionists with deadlines http
wwwdjangoprojectcom
[4] GeoRSS httpwwwgeorssorg
[5] How desktopcouch works httpwwwfreedesktoporg
wikiSpecificationsdesktopcouchDocumentationHow_
Desktopcouch_Works
[6] Introducing JSON httpjsonorg
[7] KML httpcodegooglecomapiskmldocumentation
[8] Quickly httpswikiubuntucomQuickly
[9] Relational Persistence for Java and NET httpwwwhibernate
org
[10] The CouchDB Project httpcouchdbapacheorg
[11] J Chris Anderson CouchDB Implements a Fundamental Algo-rithm httpjchrisanetdrl5Fdesignsofa5Fshowpost
CouchDB-Implements-a-Fundamental-Algorithm
[12] J Chris Anderson Jan Lehnardt and Noah Slater Design Documentshttpguidecouchdborgeditions1endesignhtml
[13] J Chris Anderson Jan Lehnardt and Noah Slater CouchDB TheDefinitive Guide OrsquoReilly Media 2010
[14] Joe Armstrong Programming Erlang Software for a ConcurrentWorld Pragmatic Bookshelf 2007
61
BIBLIOGRAPHY 62
[15] E F Codd A Relational Model of Data for Large Shared Data BanksIn Communications of the ACM 1970
[16] Brian F Cooper Raghu Ramakrishnan and Utkarsh Srivastava CloudStorage Design in a PNUTShell volume Beautiful Data The StoriesBehind Elegant Data Solutions chapter 4 OrsquoReilly Media 2009
[17] CouchOne BBC A Case Study httpwwwcouchonecom
case-study-bbc
[18] CouchOne CERN A Case Study httpwwwcouchonecom
case-study-cern
[19] CouchOne Why A Mobile Database httpwwwcouchonecom
pagewhy-mobile
[20] British Council Annual Report 2009-10 httpwww
britishcouncilorgnewGlobalBC20Annual20Report
202009-10_reuploadpdf
[21] British Council Erasmus httpwwwbritishcouncilorg
erasmushtm
[22] C J Date An Introduction to Database Systems Pearson EducationInc 8th edition 2004
[23] Jeffrey Dean and Sanjay Ghemawat MapReduce Simplified Data Pro-cessing on Large Clusters In OSDIrsquo04 Sixth Symposium on OperatingSystem Design and Implementation December 2004
[24] Dr Lawrie Brown School of Computer Science Australian DefenceForce Academy Canberra Australia ErlangmdashAn Open Source Lan-guage for Robust Distributed Applications httpwwwunswadfa
eduau~lpbpapersauug99-erlhtml
[25] Paul McJones (ed) The 1995 SQL Reunion People Projects andPolitics 1997
[26] A Fox and EA Brewer Harvest yield and scalable tolerant systemsIn Hot Topics in Operating Systems 1999 Proceedings of the SeventhWorkshop on
[27] Seth Gilbert and Nancy Lynch Brewerrsquos conjecture and the feasibil-ity of consistent available partition-tolerant web services In In ACMSIGACT News 2002
[28] Vivek Gite Bash Shell Loop Over Set of Files httpwww
cybercitibizfaqbash-loop-over-file
BIBLIOGRAPHY 63
[29] github CouchApp Commit History httpgithubcomcouchapp
couchappcommitsmaster
[30] Ricky Ho Couchdb implementation httphorickyblogspotcom200810couchdb-implementationhtml
[31] Jacob Kaplan-Moss Of the Web httpjacobianorgwriting
of-the-web
[32] Damien Katz CouchDB and Me httpwwwinfoqcom
presentationskatz-couchdb-and-me
[33] Damien Katz CouchDB Architecture httpdamienkatznet
200504couchdb5Farchitehtml
[34] Stuart Langridge Firefox bookmarks in CouchDBhttpwwwkryogenixorgdays20090706
firefox-bookmarks-in-couchdb
[35] Jan Lehnardt mustachejs httpgithubcomjanlmustachejs
[36] Vance Lucas NoSql First Impressions Object DatabasesMissed the Boat httpwwwvancelucascomblog
nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin Transaction Management Lecture Notes MSc ComputerScience Birkbeck College
[38] Dwight Merriman Comparing Mongo DB and Couch DB httpwwwmongodborgdisplayDOCSComparing+Mongo+DB+and+Couch+DB
[39] Microsoft Entity Framework Design httpblogsmsdncomb
efdesign
[40] Ted Nedward The Vietnam of Computer Science http
blogstednewardcom20060626The+Vietnam+Of+Computer+
Scienceaspx
[41] Ruby on Rails Class ActiveRecordBase httpapirubyonrailsorgclassesActiveRecordBasehtml
[42] Ed Parcell Tutorial Using JQuery and CouchDb to build asimple AJAX web application httpedparcellposterouscom
using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul Code tutorial make your application sync with UbuntuOne httparstechnicacomopen-sourceguides200912
code-tutorial-make-your-application-sync-with-ubuntu-one
ars
BIBLIOGRAPHY 64
[44] Daniel Stenberg and contributors curl1 the man page httpcurlhaxxsedocsmanpagehtml
[45] M Stonebraker and U Cetintemel ldquoOne Size Fits Allrdquo An Idea WhoseTime Has Come and Gone In PROCEEDINGS OF THE INTER-NATIONAL CONFERENCE ON DATA ENGINEERING pages 2ndash11IEEE Computer Society Press 2005
[46] Stonebraker M Wong E Kreps P and Held G The Design andImplementation of INGRES ACM TODS 1 3 (September 1976)
[47] Klaus Trainer CouchDB 10 Retrospectives httpmambofulani
couchonecomblog_designsofa_listpostpost-page
startkey=[22CouchDB-1-0-Retrospectives22]
[48] Rik van der Sanden Accessing Azure Masterrsquos thesis Delft Univer-sity the Netherlands 2009 httpswerltudelftnltwikipub
MainPastAndCurrentMScProjectsThesis5FRik5Fvan5Fder
5FSandenpdf
[49] Peter Vogel [Weblog Comment] Is Microsoft Feeling the lsquoNoSQLrsquo HeathttpreddevnewscomBlogsData-Driver200912NoSQL-Heat
[50] Wikipedia ACID httpenwikipediaorgwikiACID
[51] Wikipedia cURL httpenwikipediaorgwikiCURL
[52] Wikipedia Database Normalization httpenwikipediaorg
wikiOpen5FDatabase5FConnectivity
[53] Wikipedia JSON httpenwikipediaorgwikiJSON
[54] Wikipedia Relational model httpenwikipediaorgwiki
Relational5Fmodel
[55] Interview with Damien Katz by Werner Schuster DamienKatz Relaxing on CouchDB httpwwwinfoqcominterviews
CouchDB-Damien-Katz
Appendix B
GeoRSS on Sofa
In this appendix I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.
The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E
The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).
At the end of vendor/couchapp/lib/atom.js:
    exports.header = function(data) {
      // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
      var f = <feed xmlns="http://www.w3.org/2005/Atom"
                    xmlns:georss="http://www.georss.org/georss"/>;
      f.title = data.title;
      f.id = data.feed_id;
      f.link.@href = data.feed_link;
      f.link.@rel = "self";
      f.generator = "CouchApp on CouchDB";
      f.updated = rfc3339(data.updated);
      return f.toXMLString().replace(/\<\/feed\>/,'');
    };

    exports.entry = function(data) {
      var entry = <entry/>;
      entry.id = data.entry_id;
      entry.title = data.title;
      entry.content = data.content;
      entry.content.@type = (data.content_type || 'html');
      entry.updated = rfc3339(data.updated);
      entry.author = <author><name>{data.author}</name></author>;
      entry.link.@href = data.alternate;
      entry.link.@rel = "alternate";
      entry.point = data.point;
      return entry;
    }
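The listing above relies on E4X (XML literals embedded in JavaScript), which is no longer available in current JavaScript engines. As a rough sketch of what the amended entry function produces, the following builds a comparable entry with plain string concatenation so that it runs under Node.js. The names escapeXml and buildEntry are hypothetical, the sample values are invented, and the georss:point element name follows the GeoRSS-Simple convention rather than the listing above, which assigns to entry.point.

```javascript
// Hypothetical helper: escape the characters that are special in XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical stand-in for exports.entry: builds one Atom entry as a string.
// `data` uses the same property names as the E4X listing.
function buildEntry(data) {
  var lines = [
    "<entry>",
    "  <id>" + escapeXml(data.entry_id) + "</id>",
    "  <title>" + escapeXml(data.title) + "</title>",
    '  <content type="' + escapeXml(data.content_type || "html") + '">' +
      escapeXml(data.content) + "</content>",
    "  <updated>" + data.updated.toISOString() + "</updated>",
    "  <author><name>" + escapeXml(data.author) + "</name></author>",
    '  <link href="' + escapeXml(data.alternate) + '" rel="alternate"/>'
  ];
  // Only emit a point when coordinates were supplied ("latitude longitude").
  if (data.point) {
    lines.push("  <georss:point>" + escapeXml(data.point) + "</georss:point>");
  }
  lines.push("</entry>");
  return lines.join("\n");
}

// Invented sample data, for illustration only.
var sampleEntry = buildEntry({
  entry_id: "http://example.org/blog/st-andrews",
  title: "St Andrews",
  content: "<p>Hello</p>",
  updated: new Date(Date.UTC(2010, 8, 12)),
  author: "mick",
  alternate: "http://example.org/blog/st-andrews",
  point: "56.34 -2.79"
});
console.log(sampleEntry);
```

Escaping the content by hand plays the role that E4X performed implicitly when a string was assigned to an element.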
At the end of lists/index.js:
              alternate : path.absolute(path.show('post', row.id)),
              // point : row.value.loc[1] + " " + row.value.loc[0]
              point : row.value.latitude + " " + row.value.longitude
            });
            // send the entry to client
            send(feedEntry);
          } while (row = getRow());
        }

        // close the loop after all rows are rendered
        return "</feed>";
      });
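To see the streaming pattern in isolation, the sketch below mimics the do ... while loop outside CouchDB. Inside a list function, getRow and send are supplied by CouchDB; here they are replaced by hypothetical stand-ins (makeGetRow, and a local send that collects chunks) so that the code runs under Node.js. The sample rows are invented, and the entry markup is reduced to a title and a point.

```javascript
// Hypothetical stand-in for CouchDB's getRow(): yields each row, then null.
function makeGetRow(rows) {
  var i = 0;
  return function () {
    return i < rows.length ? rows[i++] : null;
  };
}

var chunks = [];
// Hypothetical stand-in for CouchDB's send(): collects output chunks.
function send(chunk) {
  chunks.push(chunk);
}

// Invented sample rows shaped like the view rows used in the listing.
var getRow = makeGetRow([
  { id: "a", value: { title: "Place A", latitude: 56.34, longitude: -2.79 } },
  { id: "b", value: { title: "Place B", latitude: 51.52, longitude: -0.13 } }
]);

send("<feed>");
var row = getRow();
if (row) {
  do {
    // A GeoRSS-Simple point is written as "latitude longitude".
    var point = row.value.latitude + " " + row.value.longitude;
    send("<entry><title>" + row.value.title + "</title>" +
         "<point>" + point + "</point></entry>");
  } while ((row = getRow()));
}
send("</feed>");

var feedXml = chunks.join("\n");
console.log(feedXml);
```

Each row is sent as soon as it is rendered, so the whole feed never has to be held in memory by the list function itself.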
I also made the following rudimentary changes to templates/edit.html:
    <!-- form to create a post -->
    <form id="new-post" action="new.html" method="post">
      <h1>{{pageTitle}}</h1>
      <!-- amended for geosofa -->
      <p><label>Place Name</label>
      <input type="text" size="50" name="title" value=""></p>
      <p><label>Latitude</label>
      <input type="text" size="50" name="latitude" value=""></p>
      <p><label>Longitude</label>
      <input type="text" size="50" name="longitude" value=""></p>
      <!-- ... -->
This is further down in templates/edit.html:
    // apply docForm at login
    $("#account").evently({
      loggedIn : function(e, r) {
        var userCtx = r.userCtx;
        postForm = app.docForm("form#new-post", {
          id : docid,
          // fields : ["title", "body", "tags"],
          fields : ["title", "latitude", "longitude", "body", "tags"],
          template : {
            type : "post",
            format : "markdown",
            author : userCtx.name
          },
          // ...
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case this is the View all, which simply returns all documents.
What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the stash to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
Keeping the data model separate from the markup in this way provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions : List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id : institution._id,
              rev : institution._rev,
              institutionID : institution.institutionID,
              postcode : institution.postcode,
              latitude : institution.latitude,
              longitude : institution.longitude,
              constituency : institution.constituency,
              localAuthority : institution.localAuthority,
              region : institution.region,
              has_programmes : institution.programmes ? true : false,
              programmes : institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name : programme.name,
                    url : programme.url,
                    has_countries : programme.countries ? true : false,
                    countries : programme.countries ?
                      programme.countries.map(function(country) {
                        return { country : country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
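The shape of the stash is easier to see when the mapping is applied to a single document outside CouchDB. The sketch below is a cut-down, hypothetical version of the function passed to List.withRows (the name toStashEntry and the sample documents are assumptions made for illustration). It uses length checks rather than the truthiness tests of the listing, so an empty programmes array also yields has_programmes: false.

```javascript
// Hypothetical helper mirroring the per-row mapping of the list function.
function toStashEntry(institution) {
  var programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      var countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so the template can refer to it
        // by name as {{country}} inside the countries section.
        countries: countries.map(function (country) {
          return { country: country };
        })
      };
    })
  };
}

// Invented sample documents: one with programmes, one without.
var full = toStashEntry({
  _id: "University of St Andrews",
  latitude: 56.34,
  longitude: -2.79,
  programmes: [
    { name: "Chevening Programme",
      url: "http://www.chevening.com",
      countries: ["id", "mz", "tj"] }
  ]
});
var bare = toStashEntry({ _id: "No programmes", latitude: 0, longitude: 0 });

console.log(JSON.stringify(full, null, 2));
console.log(bare.has_programmes, bare.programmes.length);
```

The has_programmes and has_countries flags exist only for the template's benefit: they let it skip the surrounding markup when a list is empty.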
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand-in-hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
Here is an example of a typical document which is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
            {{#has_countries}}
            {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
            {{/countries}}
            {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
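The way Mustache sections repeat or vanish can be illustrated with a deliberately tiny renderer. The sketch below is not mustache.js: render is a hypothetical function that understands only {{name}} variables and {{#name}}...{{/name}} sections, does not escape HTML, does not look up names in outer contexts, and cannot nest a section inside another section of the same name. The template and data are invented for illustration.

```javascript
// Hypothetical miniature renderer, for illustration only.
function render(template, view) {
  // Expand sections first; the backreference \1 pairs each opening tag
  // with the closing tag of the same name.
  var withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    function (match, name, inner) {
      var value = view[name];
      if (Array.isArray(value)) {
        // Arrays: render the inner template once per item.
        return value.map(function (item) {
          return render(inner, item);
        }).join("");
      }
      // Other values: render once if truthy, otherwise drop the section.
      return value ? render(inner, view) : "";
    }
  );
  // Then substitute the plain variables that remain.
  return withSections.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return view[name] === undefined ? "" : String(view[name]);
  });
}

var template =
  "{{#programmes}}<p>{{name}}:" +
  "{{#countries}} [{{country}}]{{/countries}}</p>\n{{/programmes}}";

// Invented sample data in the same shape as the stash.
var view = {
  programmes: [
    { name: "Chevening Programme",
      countries: [{ country: "id" }, { country: "mz" }] },
    { name: "Example Programme", countries: [] }
  ]
};

var output = render(template, view);
console.log(output);
```

The second programme has an empty countries array, so its countries section contributes nothing, which is the same behaviour the template relies on for institutions without programmes.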
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that each image file becomes an attachment to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # .png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # .png country flag files are copied to the countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*

    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries

    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
      echo "$filename"
      # strip the extension to get the document name, e.g. ie.png -> ie
      docname=$(echo "$filename" | sed -e 's/\.png//g')
      echo "$docname"
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo "$url"
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
      curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
    done
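The two sed steps in the script derive a document name from a file path and then build the attachment URL from it. As a small sketch of that derivation in JavaScript (the helper name attachmentUrl and the local base URL are assumptions made for illustration, not part of the original script):

```javascript
var path = require("path");

// Hypothetical helper: map "countries/png/de.png" to its attachment URL.
function attachmentUrl(baseUrl, filepath) {
  // basename with an extension argument strips the directory and ".png".
  var docname = path.basename(filepath, ".png");
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

var flagUrl = attachmentUrl("http://127.0.0.1:5984", "countries/png/de.png");
console.log(flagUrl);
```

The resulting URL has the form database/document/attachment, which is why a single PUT with the image as the request body is enough to create the document and attach the flag to it.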
- Introduction
-
- Motivation
- Does `one size still `fit all
- Organisation
-
- Background
-
- The Relational Model
- Relational Database Management Systems
- Normalisation
- SQL
- Transactional Guarantees
- The `NoSql Movement
- `NoSql databases at large scale
- Brewers `CAP Theorem
- Reducing the Impedance Mismatch
- Benefits of `NoSql
- Ad-hoc Querying - the `Achilles Heel of `NoSql
- The Activity Mapping Project
- Raising Awareness of British Council Impact in the UK
- Using CouchDB to Serve Mapping Data
-
- Introduction to CouchDB
-
- `Of the Web
- Some History
- Document-Oriented
- Erlang
- How Ubuntu uses CouchDB
- Desktop Couch Python and Quickly
-
- Using CouchDB
-
- Motivation
- Hosted CouchDB Service Providers
- Installing CouchDB
- Inserting a Document into a CouchDB Database
- Deleting a Document
- Updating a Document
- Adding Attachments
- Replication
- Querying a CouchDB Database using Map-Reduce
-
- Serving HTML from CouchDB
-
- Bulk upload of JSON documents
- A CouchDB Design Document
- A more complex `show function
- Views and Lists
- Lessons Learned
-
- Serving GeoRSS using CouchApp
-
- Introduction to CouchApp
- Sofa - a Blogging Application
- Developing with CouchApp
- Deployment
- Using templates
-
- Critical Assessment and Conclusion
-
- Why might a developer choose CouchDB
- CouchDBs challenges
- Conclusion
-
- Design Document d1
- GeoRSS on Sofa
- georssjs
- georsshtml
- Bash Script to Upload Country Flag Files
-
APPENDIX B GEORSS ON SOFA 53
entrycontent = datacontent
entrycontenttype = (datacontent_type || rsquohtmlrsquo)
entryupdated = rfc3339(dataupdated)
entryauthor = ltauthorgtltnamegtdataauthorltnamegtltauthorgt
entrylinkhref = dataalternate
entrylinkrel = alternate
entrypoint = datapoint
return entry
At the end of listsindexjs
alternate pathabsolute(pathshow(rsquopostrsquo rowid))
point rowvalueloc[1] + + rowvalueloc[0]
point rowvaluelatitude + + rowvaluelongitude
)
send the entry to client
send(feedEntry)
while (row = getRow())
close the loop after all rows are rendered
return ltfeedgt
)
I also made the following rudimentary changes to templatesedithtml
lt-- form to create a post --gt
ltform id=new-post action=newhtml method=postgt
lth1gtpageTitlelth1gt
lt-- amended for geosofa --gt
ltpgtltlabelgtPlace Nameltlabelgt
ltinput type=text size=50 name=title value=gtltpgt
ltpgtltlabelgtLatitudeltlabelgt
ltinput type=text size=50 name=latitude value=gtltpgt
ltpgtltlabelgtLongitudeltlabelgt
ltinput type=text size=50 name=longitude value=gtltpgt
lt-- --gt
this is further down in templatesedithtml
APPENDIX B GEORSS ON SOFA 54
apply docForm at login
$(account)evently(
loggedIn function(er)
var userCtx = ruserCtx
postForm = appdocForm(formnew-post
id docid
fields [title body tags]
fields [title latitude longitude body tags]
template
type post
format markdown
author userCtxname
Appendix C
georssjs
As can be seen by the signature function(headreq) this is a CouchDBList function It takes as input a set of documents returned by a View - inthe simplest case the View all which simply returns all documents
What is interesting about this code Well it gathers together the amodel of the data in the variable stash then uses the Mustacheto_html
function to send the lsquostashrsquo to a Mustache template file georsshtml Thecontents of this file are provided in Appendix D
This provides a great deal of templating flexibility
function(head req)
var ddoc = this
var Mustache = require (libmustache )
var List = require ( vendorcouchappliblist)
provides (html function ()
var key =
var stash =
institutions ListwithRows(function(row)
var institution = rowvalue
key = rowkey
return
id institution_id
rev institution_rev
institutionID institution
institutionID
postcode institution
postcode
latitude institutionlatitude
longitude institutionlongitude
constituency institutionconstituency
localAuthority institution
55
APPENDIX C GEORSSJS 56
localAuthority
region institutionregion
has_programmes institutionprogrammes
true false
programmes institutionprogrammes
institutionprogrammesmap(
function(programme)
return
name programmename
url programmeurl
has_countries programmecountries
true false
countries programmecountries
programmecountriesmap(function(
country)
return
country
country
) [] return
nothing if no
countries
) [] return nothing if no
programmes
)
return Mustacheto_html(ddoctemplatesgeorss
stash)
)
Appendix D
georsshtml
The georsshtml code uses the Mustache templating system to transformthe data from a CouchDB document into a valid GeoRSS-format file
The code below goes hand-in-hand with the georssjs file described inAppendix C
Note in particular the iteration code the section between countries
and countries will execute once for each country listed in a particularprogramme
Here is an example of a typical document which is transformed by thetemplate code
_id University of St Andrews
_rev 1-7a29159d52cd71b9f6759ad6d3884945
institutionID 15161
postcode KY16 9AJ
latitude 563412139443169
longitude -279301175308608
constituency North East Fife
localAuthority Fife Council
region Scotland
programmes [
name Chevening Programme
url httpwwwcheveningcom
countries [
idmztj
]
name Commonwealth Scholarship and Fellowship Plan (CSFP)
url httpwwwcsfp-onlineorg
57
APPENDIX D GEORSSHTML 58
countries [
bdza
]
]
Listed below is the code for georsshtml Even though it is a html
file the output it produces is in fact GeoRSS XML format The html fileextension is simply there as a convention for Mustache
ltfeed xmlns =http wwww3org 2005 Atom xmlns
georss =http wwwgeorssorggeorssgt
lttitle gtBritish Council Activity Mapping lttitle gt
ltid gthttp mickcouchonecomuniversities_design
bc_listgeorssall ltidgt
ltlink href=http mickcouchonecomuniversities
_designbc_listgeorssall rel=selfgt
ltgenerator gtCouchApp on CouchDB ltgenerator gt
ltupdated gt2010 -09 -12 lt updated gt
institutions
ltentry gt
ltid gt id ltidgt
lttitle gt id lttitle gt
ltcontent type=htmlgt
has_programmes
programmes
ampltpampgt name
has_countries
countries
ampltimg src=http mickcouchonecom
countries country flag alt =
country title = country ampgt
countries
has_countries
ampltpampgt
programmes
has_programmes
ltcontent gt
ltpoint gt latitude longitude ltpoint gt
ltentry gt
institutions
ltfeed gt
Appendix E
Bash Script to UploadCountry Flag Files
This is a Linux bash script used to upload png files from a folder so thatthe image files become attachments to a CouchDB document
An example of an uploaded flag file is httpmickcouchonecom
countriesdeflag the flag of Germany
binbash
png country flag files from http wwwfamfamfam
comlabiconsflags
png country flag files are copied to countries
png folder
(beneath the current folder)
FILES = countriespng
create countries database in CouchDB
curl -X PUT http usernamepasswordmickcouchone
comcountries
for filepath in $FILES
do
echo $filepath
get the file name from the file path
filename=$(echo $filepath | sed -e rsquos countries
png grsquo)
echo $filename
docname=$(echo $filename | sed -e rsquospnggrsquo)
echo $docname
url=http usernamepasswordmickcouchonecom
countries$docname flag
59
APPENDIX E BASH SCRIPT TO UPLOAD COUNTRY FLAG FILES60
echo $url
put the attachment into CouchDB
this command creates the lsquoiersquo record and puts
the png as an attachment
under countriesieflag
curl -H Content -Typeimagepng
-X PUT http 1270015984 countriesieflag --
data -binary iepng
echo curl -H Content -Typeimagepng -X PUT $url
--data -binary $filepath
curl -H Content -Typeimagepng -X PUT $url --
data -binary $filepath
done
Bibliography
[1] A NoSql Summer A seasonal worldwide reading club for databasesdistributed systems and NoSql-related scientific papers http
nosqlsummerorg
[2] British Council httpwwwbritishcouncilorg
[3] django The Web framework for perfectionists with deadlines http
wwwdjangoprojectcom
[4] GeoRSS httpwwwgeorssorg
[5] How desktopcouch works httpwwwfreedesktoporg
wikiSpecificationsdesktopcouchDocumentationHow_
Desktopcouch_Works
[6] Introducing JSON httpjsonorg
[7] KML httpcodegooglecomapiskmldocumentation
[8] Quickly httpswikiubuntucomQuickly
[9] Relational Persistence for Java and NET httpwwwhibernate
org
[10] The CouchDB Project httpcouchdbapacheorg
[11] J Chris Anderson CouchDB Implements a Fundamental Algo-rithm httpjchrisanetdrl5Fdesignsofa5Fshowpost
CouchDB-Implements-a-Fundamental-Algorithm
[12] J Chris Anderson Jan Lehnardt and Noah Slater Design Documentshttpguidecouchdborgeditions1endesignhtml
[13] J Chris Anderson Jan Lehnardt and Noah Slater CouchDB TheDefinitive Guide OrsquoReilly Media 2010
[14] Joe Armstrong Programming Erlang Software for a ConcurrentWorld Pragmatic Bookshelf 2007
61
BIBLIOGRAPHY 62
[15] E F Codd A Relational Model of Data for Large Shared Data BanksIn Communications of the ACM 1970
[16] Brian F Cooper Raghu Ramakrishnan and Utkarsh Srivastava CloudStorage Design in a PNUTShell volume Beautiful Data The StoriesBehind Elegant Data Solutions chapter 4 OrsquoReilly Media 2009
[17] CouchOne BBC A Case Study httpwwwcouchonecom
case-study-bbc
[18] CouchOne CERN A Case Study httpwwwcouchonecom
case-study-cern
[19] CouchOne Why A Mobile Database httpwwwcouchonecom
pagewhy-mobile
[20] British Council Annual Report 2009-10 httpwww
britishcouncilorgnewGlobalBC20Annual20Report
202009-10_reuploadpdf
[21] British Council Erasmus httpwwwbritishcouncilorg
erasmushtm
[22] C J Date An Introduction to Database Systems Pearson EducationInc 8th edition 2004
[23] Jeffrey Dean and Sanjay Ghemawat MapReduce Simplified Data Pro-cessing on Large Clusters In OSDIrsquo04 Sixth Symposium on OperatingSystem Design and Implementation December 2004
[24] Dr Lawrie Brown School of Computer Science Australian DefenceForce Academy Canberra Australia ErlangmdashAn Open Source Lan-guage for Robust Distributed Applications httpwwwunswadfa
eduau~lpbpapersauug99-erlhtml
[25] Paul McJones (ed) The 1995 SQL Reunion People Projects andPolitics 1997
[26] A Fox and EA Brewer Harvest yield and scalable tolerant systemsIn Hot Topics in Operating Systems 1999 Proceedings of the SeventhWorkshop on
[27] Seth Gilbert and Nancy Lynch Brewerrsquos conjecture and the feasibil-ity of consistent available partition-tolerant web services In In ACMSIGACT News 2002
[28] Vivek Gite Bash Shell Loop Over Set of Files httpwww
cybercitibizfaqbash-loop-over-file
BIBLIOGRAPHY 63
[29] github CouchApp Commit History httpgithubcomcouchapp
couchappcommitsmaster
[30] Ricky Ho Couchdb implementation httphorickyblogspotcom200810couchdb-implementationhtml
[31] Jacob Kaplan-Moss Of the Web httpjacobianorgwriting
of-the-web
[32] Damien Katz CouchDB and Me httpwwwinfoqcom
presentationskatz-couchdb-and-me
[33] Damien Katz CouchDB Architecture httpdamienkatznet
200504couchdb5Farchitehtml
[34] Stuart Langridge Firefox bookmarks in CouchDBhttpwwwkryogenixorgdays20090706
firefox-bookmarks-in-couchdb
[35] Jan Lehnardt mustachejs httpgithubcomjanlmustachejs
[36] Vance Lucas NoSql First Impressions Object DatabasesMissed the Boat httpwwwvancelucascomblog
nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin Transaction Management Lecture Notes MSc ComputerScience Birkbeck College
[38] Dwight Merriman Comparing Mongo DB and Couch DB httpwwwmongodborgdisplayDOCSComparing+Mongo+DB+and+Couch+DB
[39] Microsoft Entity Framework Design httpblogsmsdncomb
efdesign
[40] Ted Nedward The Vietnam of Computer Science http
blogstednewardcom20060626The+Vietnam+Of+Computer+
Scienceaspx
[41] Ruby on Rails Class ActiveRecordBase httpapirubyonrailsorgclassesActiveRecordBasehtml
[42] Ed Parcell Tutorial Using JQuery and CouchDb to build asimple AJAX web application httpedparcellposterouscom
using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul Code tutorial make your application sync with UbuntuOne httparstechnicacomopen-sourceguides200912
code-tutorial-make-your-application-sync-with-ubuntu-one
ars
BIBLIOGRAPHY 64
[44] Daniel Stenberg and contributors curl1 the man page httpcurlhaxxsedocsmanpagehtml
[45] M Stonebraker and U Cetintemel ldquoOne Size Fits Allrdquo An Idea WhoseTime Has Come and Gone In PROCEEDINGS OF THE INTER-NATIONAL CONFERENCE ON DATA ENGINEERING pages 2ndash11IEEE Computer Society Press 2005
[46] Stonebraker M Wong E Kreps P and Held G The Design andImplementation of INGRES ACM TODS 1 3 (September 1976)
[47] Klaus Trainer CouchDB 10 Retrospectives httpmambofulani
couchonecomblog_designsofa_listpostpost-page
startkey=[22CouchDB-1-0-Retrospectives22]
[48] Rik van der Sanden Accessing Azure Masterrsquos thesis Delft Univer-sity the Netherlands 2009 httpswerltudelftnltwikipub
MainPastAndCurrentMScProjectsThesis5FRik5Fvan5Fder
5FSandenpdf
[49] Peter Vogel [Weblog Comment] Is Microsoft Feeling the lsquoNoSQLrsquo HeathttpreddevnewscomBlogsData-Driver200912NoSQL-Heat
[50] Wikipedia ACID httpenwikipediaorgwikiACID
[51] Wikipedia cURL httpenwikipediaorgwikiCURL
[52] Wikipedia Database Normalization httpenwikipediaorg
wikiOpen5FDatabase5FConnectivity
[53] Wikipedia JSON httpenwikipediaorgwikiJSON
[54] Wikipedia Relational model httpenwikipediaorgwiki
Relational5Fmodel
[55] Interview with Damien Katz by Werner Schuster DamienKatz Relaxing on CouchDB httpwwwinfoqcominterviews
CouchDB-Damien-Katz
- Introduction
-
- Motivation
- Does `one size still `fit all
- Organisation
-
- Background
-
- The Relational Model
- Relational Database Management Systems
- Normalisation
- SQL
- Transactional Guarantees
- The `NoSql Movement
- `NoSql databases at large scale
- Brewers `CAP Theorem
- Reducing the Impedance Mismatch
- Benefits of `NoSql
- Ad-hoc Querying - the `Achilles Heel of `NoSql
- The Activity Mapping Project
- Raising Awareness of British Council Impact in the UK
- Using CouchDB to Serve Mapping Data
-
- Introduction to CouchDB
-
- `Of the Web
- Some History
- Document-Oriented
- Erlang
- How Ubuntu uses CouchDB
- Desktop Couch Python and Quickly
-
- Using CouchDB
-
- Motivation
- Hosted CouchDB Service Providers
- Installing CouchDB
- Inserting a Document into a CouchDB Database
- Deleting a Document
- Updating a Document
- Adding Attachments
- Replication
- Querying a CouchDB Database using Map-Reduce
-
- Serving HTML from CouchDB
-
- Bulk upload of JSON documents
- A CouchDB Design Document
- A more complex `show function
- Views and Lists
- Lessons Learned
-
- Serving GeoRSS using CouchApp
-
- Introduction to CouchApp
- Sofa - a Blogging Application
- Developing with CouchApp
- Deployment
- Using templates
-
- Critical Assessment and Conclusion
-
- Why might a developer choose CouchDB
- CouchDBs challenges
- Conclusion
-
- Design Document d1
- GeoRSS on Sofa
- georssjs
- georsshtml
- Bash Script to Upload Country Flag Files
-
APPENDIX B GEORSS ON SOFA 54
apply docForm at login
$(account)evently(
loggedIn function(er)
var userCtx = ruserCtx
postForm = appdocForm(formnew-post
id docid
fields [title body tags]
fields [title latitude longitude body tags]
template
type post
format markdown
author userCtxname
Appendix C
georssjs
As can be seen by the signature function(headreq) this is a CouchDBList function It takes as input a set of documents returned by a View - inthe simplest case the View all which simply returns all documents
What is interesting about this code Well it gathers together the amodel of the data in the variable stash then uses the Mustacheto_html
function to send the lsquostashrsquo to a Mustache template file georsshtml Thecontents of this file are provided in Appendix D
This provides a great deal of templating flexibility
function(head req)
var ddoc = this
var Mustache = require (libmustache )
var List = require ( vendorcouchappliblist)
provides (html function ()
var key =
var stash =
institutions ListwithRows(function(row)
var institution = rowvalue
key = rowkey
return
id institution_id
rev institution_rev
institutionID institution
institutionID
postcode institution
postcode
latitude institutionlatitude
longitude institutionlongitude
constituency institutionconstituency
localAuthority institution
55
APPENDIX C GEORSSJS 56
localAuthority
region institutionregion
has_programmes institutionprogrammes
true false
programmes institutionprogrammes
institutionprogrammesmap(
function(programme)
return
name programmename
url programmeurl
has_countries programmecountries
true false
countries programmecountries
programmecountriesmap(function(
country)
return
country
country
) [] return
nothing if no
countries
) [] return nothing if no
programmes
)
return Mustacheto_html(ddoctemplatesgeorss
stash)
)
Appendix D
georsshtml
The georsshtml code uses the Mustache templating system to transformthe data from a CouchDB document into a valid GeoRSS-format file
Appendix C
georss.js
As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case this is the View all, which simply returns all documents.
What is interesting about this code? It gathers a model of the data in the variable stash, then uses the Mustache.to_html function to pass the stash to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.
Keeping the data model separate from the template in this way provides a great deal of templating flexibility.
    function(head, req) {
      var ddoc = this;
      var Mustache = require("lib/mustache");
      var List = require("vendor/couchapp/lib/list");
      provides("html", function() {
        var key = "";
        var stash = {
          institutions: List.withRows(function(row) {
            var institution = row.value;
            key = row.key;
            return {
              id: institution._id,
              rev: institution._rev,
              institutionID: institution.institutionID,
              postcode: institution.postcode,
              latitude: institution.latitude,
              longitude: institution.longitude,
              constituency: institution.constituency,
              localAuthority: institution.localAuthority,
              region: institution.region,
              has_programmes: institution.programmes ? true : false,
              programmes: institution.programmes ?
                institution.programmes.map(function(programme) {
                  return {
                    name: programme.name,
                    url: programme.url,
                    has_countries: programme.countries ? true : false,
                    countries: programme.countries ?
                      programme.countries.map(function(country) {
                        return { country: country };
                      }) : [] // return nothing if no countries
                  };
                }) : [] // return nothing if no programmes
            };
          })
        };
        return Mustache.to_html(ddoc.templates.georss, stash);
      });
    }
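The mapping from a view row to a stash entry can be tried outside CouchDB. What follows is a minimal sketch in plain JavaScript, assuming a document shaped like the example in Appendix D. The function name toStashEntry and the second sample document are hypothetical and are not part of the CouchApp code. One deliberate difference from the list function above: this sketch also treats an empty array as "no programmes", whereas the original only tests whether the property is present.

```javascript
// Hypothetical helper (not in the original georss.js): converts one
// institution document into the object the Mustache template expects.
function toStashEntry(institution) {
  var programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      var countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so the template can refer to {{country}}.
        countries: countries.map(function (country) {
          return { country: country };
        })
      };
    })
  };
}

var withProgramme = toStashEntry({
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    {
      name: "Chevening Programme",
      url: "http://www.chevening.com",
      countries: ["id", "mz", "tj"]
    }
  ]
});

// Assumed sample: a document that has no programmes property at all.
var bare = toStashEntry({ _id: "Example Institution" });

console.log(JSON.stringify(withProgramme.programmes[0].countries));
console.log(bare.has_programmes, bare.programmes.length);
```

The boolean flags exist because the template needs something to switch on: a section such as has_programmes lets the template omit the surrounding markup entirely when there is nothing to list.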
Appendix D
georss.html
The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.
The code below goes hand in hand with the georss.js file described in Appendix C.
Note in particular the iteration code: the section between {{#countries}} and {{/countries}} is rendered once for each country listed in a particular programme.
Here is an example of a typical document that is transformed by the template code:
    {
      "_id": "University of St Andrews",
      "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
      "institutionID": 15161,
      "postcode": "KY16 9AJ",
      "latitude": 56.3412139443169,
      "longitude": -2.79301175308608,
      "constituency": "North East Fife",
      "localAuthority": "Fife Council",
      "region": "Scotland",
      "programmes": [
        {
          "name": "Chevening Programme",
          "url": "http://www.chevening.com",
          "countries": ["id", "mz", "tj"]
        },
        {
          "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
          "url": "http://www.csfp-online.org",
          "countries": ["bd", "za"]
        }
      ]
    }
Listed below is the code for georss.html. Even though it is an .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
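The way sections behave in the template can be illustrated with a small stand-alone sketch. The render function below is a simplified stand-in written for this illustration; it is not the mustache.js library used by the CouchApp, it supports only {{#name}}...{{/name}} sections and {{name}} variables, and it performs no HTML escaping. The template string and the second programme in the stash are assumptions made for the example.

```javascript
// Simplified stand-in for a Mustache renderer, for illustration only.
// It handles {{#name}}...{{/name}} sections and {{name}} variables.
function render(template, context) {
  var sections = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  var expanded = template.replace(sections, function (match, name, body) {
    var value = context[name];
    if (Array.isArray(value)) {
      // A list renders the section body once per item.
      return value.map(function (item) {
        return render(body, Object.assign({}, context, item));
      }).join("");
    }
    // Any other value simply shows or hides the body.
    return value ? render(body, context) : "";
  });
  return expanded.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return context[name] === undefined ? "" : String(context[name]);
  });
}

// Cut-down template in the spirit of georss.html (assumed for this example).
var template =
  "{{#programmes}}<p>{{name}}:" +
  "{{#has_countries}}{{#countries}} [{{country}}]{{/countries}}{{/has_countries}}" +
  "</p>{{/programmes}}";

var stash = {
  programmes: [
    {
      name: "Chevening Programme",
      has_countries: true,
      countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }]
    },
    // Hypothetical programme with no countries, to show a hidden section.
    { name: "Example Programme", has_countries: false, countries: [] }
  ]
};

var output = render(template, stash);
console.log(output);
```

The countries section is rendered three times for the first programme and not at all for the second, which is the behaviour the template relies on when it emits one flag image per country.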
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that each image file becomes an attachment to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags
    # png country flag files are copied to countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo $filepath
      # get the file name from the file path
      filename=$(echo $filepath | sed -e 's/countries\/png\///g')
      echo $filename
      docname=$(echo $filename | sed -e 's/.png//g')
      echo $docname
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo $url
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
      curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
    done
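The two sed steps in the script reduce a path such as countries/png/ie.png to the document name ie, which then becomes part of the attachment URL. The same derivation can be sketched in JavaScript, assuming Node.js and its standard path module. The function name flagUrl is hypothetical and the base URL is the local CouchDB address used in the script's own comment.

```javascript
var path = require("path");

// Hypothetical helper mirroring the two sed steps in the script:
// strip the folder, strip the .png extension, then build the URL.
function flagUrl(baseUrl, filepath) {
  var docname = path.basename(filepath, ".png");
  return baseUrl + "/countries/" + docname + "/flag";
}

var ieUrl = flagUrl("http://127.0.0.1:5984", "countries/png/ie.png");
console.log(ieUrl);
```

Because the document name is taken directly from the file name, each two-letter flag file ends up under its own country document, which is what allows the template in Appendix D to build the image address from the country code alone.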
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP' Theorem
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
Appendix D
georsshtml
The georsshtml code uses the Mustache templating system to transformthe data from a CouchDB document into a valid GeoRSS-format file
The code below goes hand-in-hand with the georssjs file described inAppendix C
Note in particular the iteration code the section between countries
and countries will execute once for each country listed in a particularprogramme
Here is an example of a typical document which is transformed by thetemplate code
_id University of St Andrews
_rev 1-7a29159d52cd71b9f6759ad6d3884945
institutionID 15161
postcode KY16 9AJ
latitude 563412139443169
longitude -279301175308608
constituency North East Fife
localAuthority Fife Council
region Scotland
programmes [
name Chevening Programme
url httpwwwcheveningcom
countries [
idmztj
]
name Commonwealth Scholarship and Fellowship Plan (CSFP)
url httpwwwcsfp-onlineorg
57
APPENDIX D GEORSSHTML 58
countries [
bdza
]
]
Listed below is the code for georsshtml Even though it is a html
file the output it produces is in fact GeoRSS XML format The html fileextension is simply there as a convention for Mustache
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:georss="http://www.georss.org/georss">
      <title>British Council Activity Mapping</title>
      <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
      <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
      <generator>CouchApp on CouchDB</generator>
      <updated>2010-09-12</updated>
      {{#institutions}}
      <entry>
        <id>{{id}}</id>
        <title>{{id}}</title>
        <content type="html">
          {{#has_programmes}}
          {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
          {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"/&gt;
          {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
          {{/programmes}}
          {{/has_programmes}}
        </content>
        <point>{{latitude}} {{longitude}}</point>
      </entry>
      {{/institutions}}
    </feed>
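The iteration behaviour of a Mustache section can be illustrated with a small stand-alone sketch. The `render` function below is a deliberately simplified stand-in written for this illustration; it is not mustache.js and supports only `{{#name}}...{{/name}}` sections and `{{name}}` variables, without HTML escaping. The shortened template and the wrapping of each country code in an object with a `country` key are assumptions made so that the example matches the `{{country}}` tag used in georss.html; the actual preparation of the data is done by georss.js in Appendix C.

```javascript
// Simplified stand-in for Mustache section handling (NOT mustache.js itself).
// Supports only {{#name}}...{{/name}} sections and {{name}} variables.
function render(template, view) {
  // Expand sections first: an array repeats the body once per item,
  // any other truthy value renders the body once, a falsy value omits it.
  const sectionPattern = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const expanded = template.replace(sectionPattern, (match, name, body) => {
    const value = view[name];
    if (Array.isArray(value)) {
      // Each item's fields are layered on top of the enclosing view.
      return value
        .map((item) => render(body, Object.assign({}, view, item)))
        .join("");
    }
    return value ? render(body, view) : "";
  });
  // Then substitute plain variables; unknown names become empty strings.
  return expanded.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    view[name] === undefined ? "" : String(view[name])
  );
}

// Hypothetical, shortened template in the spirit of georss.html.
const template =
  "{{#programmes}}<p>{{name}}: {{#countries}}[{{country}}]{{/countries}}</p>\n{{/programmes}}";

// Assumption: country codes are wrapped as { country: code } before rendering.
const wrap = (codes) => codes.map((code) => ({ country: code }));
const view = {
  programmes: [
    { name: "Chevening Programme", countries: wrap(["id", "mz", "tj"]) },
    { name: "CSFP", countries: wrap(["bd", "za"]) },
  ],
};

const output = render(template, view);
console.log(output);
```

The inner section body is rendered three times for the first programme and twice for the second, which is the "once for each country" behaviour described at the start of this appendix.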
Appendix E
Bash Script to Upload Country Flag Files
This is a Linux bash script used to upload .png files from a folder, so that each image file becomes an attachment to a CouchDB document.
An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.
    #!/bin/bash
    # png country flag files from http://www.famfamfam.com/lab/icons/flags/
    # png country flag files are copied to the countries/png folder
    # (beneath the current folder)
    FILES=countries/png/*
    # create countries database in CouchDB
    curl -X PUT http://username:password@mick.couchone.com/countries
    for filepath in $FILES
    do
      echo "$filepath"
      # get the file name from the file path
      filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
      echo "$filename"
      # strip the .png extension to get the document name
      docname=$(echo "$filename" | sed -e 's/\.png//g')
      echo "$docname"
      url=http://username:password@mick.couchone.com/countries/$docname/flag
      echo "$url"
      # put the attachment into CouchDB
      # this command creates the 'ie' record and puts the png as an attachment
      # under countries/ie/flag:
      # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
      echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
      curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
    done
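The two sed substitutions in the script derive a document name from each file path and then build the attachment URL from it. The same derivation can be sketched as a pure function, shown here in JavaScript. The function name `attachmentUrl` and its parameters are hypothetical and exist only for this illustration; the sketch performs no upload.

```javascript
// Hypothetical helper mirroring the script's two sed steps.
function attachmentUrl(baseUrl, filepath) {
  // "countries/png/de.png" -> "de.png" (drop the folder part of the path)
  const filename = filepath.split("/").pop();
  // "de.png" -> "de" (drop the .png extension)
  const docname = filename.replace(/\.png$/, "");
  // The attachment is stored under the name "flag" on the country document.
  return baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

const flagUrl = attachmentUrl("http://mick.couchone.com", "countries/png/de.png");
console.log(flagUrl); // http://mick.couchone.com/countries/de/flag
```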
- Introduction
  - Motivation
  - Does 'one size' still 'fit all'?
  - Organisation
- Background
  - The Relational Model
  - Relational Database Management Systems
  - Normalisation
  - SQL
  - Transactional Guarantees
  - The 'NoSql' Movement
  - 'NoSql' databases at large scale
  - Brewer's 'CAP Theorem'
  - Reducing the Impedance Mismatch
  - Benefits of 'NoSql'
  - Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  - The Activity Mapping Project
  - Raising Awareness of British Council Impact in the UK
  - Using CouchDB to Serve Mapping Data
- Introduction to CouchDB
  - 'Of the Web'
  - Some History
  - Document-Oriented
  - Erlang
  - How Ubuntu uses CouchDB
  - Desktop Couch, Python and Quickly
- Using CouchDB
  - Motivation
  - Hosted CouchDB Service Providers
  - Installing CouchDB
  - Inserting a Document into a CouchDB Database
  - Deleting a Document
  - Updating a Document
  - Adding Attachments
  - Replication
  - Querying a CouchDB Database using Map-Reduce
- Serving HTML from CouchDB
  - Bulk upload of JSON documents
  - A CouchDB Design Document
  - A more complex 'show' function
  - Views and Lists
  - Lessons Learned
- Serving GeoRSS using CouchApp
  - Introduction to CouchApp
  - Sofa - a Blogging Application
  - Developing with CouchApp
  - Deployment
  - Using templates
- Critical Assessment and Conclusion
  - Why might a developer choose CouchDB?
  - CouchDB's challenges
  - Conclusion
- Design Document d1
- GeoRSS on Sofa
- georss.js
- georss.html
- Bash Script to Upload Country Flag Files
Bibliography
[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org
[2] British Council. http://www.britishcouncil.org
[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com
[4] GeoRSS. http://www.georss.org
[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works
[6] Introducing JSON. http://json.org
[7] KML. http://code.google.com/apis/kml/documentation
[8] Quickly. https://wiki.ubuntu.com/Quickly
[9] Relational Persistence for Java and .NET. http://www.hibernate.org
[10] The CouchDB Project. http://couchdb.apache.org
[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm
[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html
[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.
[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.
[16] Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell, volume Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.
[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc
[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern
[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile
[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf
[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm
[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html
[25] Paul McJones (ed). The 1995 SQL Reunion: People, Projects, and Politics, 1997.
[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.
[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.
[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file
[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master
[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html
[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web
[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me
[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html
[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb
[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js
[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat
[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.
[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign
[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html
[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we
[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars
[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html
[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.
[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).
[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]
[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf
[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat
[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID
[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL
[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity
[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON
[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel
[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz