integrating version control into owncloud · the kde owncloud project is an open-source cloud...



INTEGRATING VERSION CONTROL INTO OWNCLOUD

Main Project Report - CS39440

student : Craig Roberts

supervisor: Richard Shipman

A dissertation submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Open Source Computing

Department of Computer Science

Version 1.0, 26th April 2012 (Release)

15,726 words


DECLARATION OF ORIGINALITY

In signing below, I confirm that:

This submission is my own work, except where clearly indicated.

I understand that there are severe penalties for plagiarism and other unfair practice, which can lead to loss of marks or even the withholding of a degree.

I have read the sections on unfair practice in the Students’ Examinations Handbook and the relevant sections of the current Student Handbook of the Department of Computer Science.

I understand and agree to abide by the University’s regulations governing these issues.

Date / / Signature ______________________

Craig Roberts


ACKNOWLEDGEMENTS

None of this would have been possible without my supervisor, Mr. Richard Shipman, for his help and support during the time I have spent working on this project; his feedback, suggestions and criticisms were invaluable.

Also, without Mr. Frank Karlitschek of ownCloud, this project would never have achieved the progress it has made to date, and I must especially thank him for accepting the final version into ownCloud for improvement by the community.

I am particularly grateful to my friends, James, Alex, Tom, and Louise for providing me with combinations of shelter, companionship and unforgettable experiences during my time in Aberystwyth.

Above all, I must thank my parents, though words fall short, for their unending patience, unwavering support and uncompromising faith in me.

—Craig Roberts


ABSTRACT

The KDE ownCloud project is an open-source cloud storage project written exclusively in PHP. The ownCloud software supports third-party extensions through the use of ‘apps’, with a variety of apps supporting sharing, syncing and more included as standard.

Git is a decentralised version control system, with a uniquely simple, yet robust repository format. This dissertation discusses the design and implementation of a Git-based versioning implementation into ownCloud. The result is a prototype implementation that provides version history to a top-level ‘Backup’ folder, complete with roll-back and file viewing support. The versioning implementation is independent of the ownCloud modification, suitable for use in future or unrelated projects.


CONTENTS

1 Introduction
1.1 Document Outline
2 Version Control Software
2.1 Introduction
2.1.1 Delta Compression
2.1.2 Snapshots
2.1.3 Weaves
2.1.4 Terminology
2.2 A Brief Review
2.3 Bazaar
2.4 Mercurial
2.5 Version Control Summary
3 The ownCloud Project
3.1 Request Flow
3.2 OC_Filesystem
3.3 OC_Filestorage
3.4 OC_FileCache
3.5 Extending the Filesystem
3.6 Summary
4 Git
4.1 Loose Files
4.2 Packfiles and Packfile Indexes
4.3 Git References
4.4 Summary
5 Requirements and Development Process
5.1 Functional Requirements
5.2 Technical Challenges
5.3 Development Process
5.3.1 Test-Driven Development
5.3.2 Scrum and Evolutionary Prototyping
5.3.3 Open-Source Development
6 Design
6.1 Granite Porcelain
6.2 Granite Plumbing
6.3 OC_Filestorage_Versioned
6.4 OC_VersionStreamwrapper
6.5 The ownCloud Admin Panel
7 Implementation
7.1 Granite Porcelain
7.2 Granite Plumbing
7.3 The OC_Filestorage_Versioned class
7.4 The OC_VersionStreamwrapper class
7.5 The ownCloud Admin Panel
8 Testing
8.1 Granite Testing
8.2 ownCloud Testing
9 Summary
10 Future Work
11 Conclusion
11.1 Planning and Development
11.2 Progress and Implementation
11.3 Final Words
A OC_Filestorage Listing
B PHP StreamWrapper Class
Annotated Bibliography


LIST OF FIGURES

Figure 3.1 Sequence diagram of the ownCloud file server
Figure 3.2 Partial class diagram of the ownCloud filesystem
Figure 3.3 The structure of the ownCloud Sharing app
Figure 4.1 Git packfile indexes, from the Git Community Book (GPL) [9]
Figure 4.2 Git packfile format, from the Git Community Book (GPL) [9]
Figure 5.1 Scrum-style task sheet for the ownCloud implementation
Figure 5.2 Gantt chart displaying predicted and actual development timescales
Figure 6.1 Class diagram of Granite
Figure 6.2 Object diagram of Granite for a simple repository
Figure 6.3 Class diagram of the Granite plumbing
Figure 7.1 File layout for the files_versioning ownCloud app
Figure 7.2 OC_Filestorage_Versioned class diagram
Figure 7.3 OC_VersionStreamwrapper class diagram
Figure 7.4 ownCloud Admin Panel – Versioning and Backup
Figure 8.1 Code coverage of Granite’s release tags
Figure 8.2 Final code coverage statistics for Granite

CODE LISTINGS

Listing 3.1 A code snippet from the OC_Filestorage_Local class in ownCloud
Listing 4.1 Output of git show --pretty=RAW HEAD
Listing 7.1 Initial commit implementation
Listing 7.2 Example use of OC_VersionStreamwrapper by OC_Filestorage_Versioned
Listing 8.1 Todo list for ownCloud versioning app, by Sam Tuke
Listing A.1 ownCloud OC_Filestorage abstract class
Listing B.1 PHP StreamWrapper class prototype from the PHP manual

ACRONYMS

WebDAV Web Distributed Authoring and Versioning

VCS Version Control System

RCS Revision Control System

SCCS Source Code Control System

SVN Apache Subversion

CVS Concurrent Versions System

HTTP HyperText Transfer Protocol

CalDAV Calendaring Extensions to Web Distributed Authoring and Versioning (WebDAV)

CardDAV vCard Extensions to WebDAV

LDAP Lightweight Directory Access Protocol

TDD Test-Driven Development


1 INTRODUCTION

The ownCloud project develops cloud storage software, written primarily in PHP and JavaScript, with client support on a variety of devices. It is developed under the KDE umbrella, led by Frank Karlitschek, and its recent success has led to the formation of ownCloud, Inc. for commercial licensing and support services. The software provides a number of useful features unrivalled by other open-source software, including:

• File storage, accessible through the web interface and via WebDAV. WebDAV is supported by Microsoft Windows®, OS X® and Linux.

• Calendar syncing, using the Calendaring Extensions to WebDAV (CalDAV)

• Contact syncing, using the vCard Extensions to WebDAV (CardDAV)

• A music streaming server using the Ampache protocol

• User and group administration, with both OpenID and Lightweight Directory Access Protocol (LDAP) integration available

• File and folder sharing across users, either privately or publicly

• Third-party extensions and modifications through the use of ownCloud “apps”

The ownCloud community edition has over 400,000 users, and the development roadmap contains a number of exciting features, such as encryption, collaborative document editing, and version control, amongst other things. As an open-source computing student, I proposed to develop the beginnings of a version control system, with the project submission coinciding with the release of ownCloud 4. Following discussion on the ownCloud mailing list, this proposal was accepted.

In this dissertation I investigate how Git repositories can be used to implement a simple backup system in ownCloud, then design and implement such a system using a custom PHP Git library and an ownCloud app. Finally, I submit the system for inclusion within ownCloud itself.

Implementing a Version Control System (VCS) in ownCloud provides a number of advantages and possibilities for future direction. Briefly, the benefits of this implementation include:

• Automatic backup of files, with a commit-on-write model

• Universal versioning through any of ownCloud’s clients

• User benefit in the form of recovery features for old file versions

• A repository format compatible with the official Git client

• No additional dependencies, maintaining ownCloud’s low-dependency footprint

• Long-term possibilities include synchronisation and desktop-client integration

1.1 Document Outline

This dissertation is structured as follows:

• Chapter 2 - A short description of the most significant version control systems, with attention to their suitability to ownCloud.

• Chapter 3 - A brief overview of the ownCloud software, particularly concerning the filesystem and file caching layers.

• Chapter 4 - A description of Git and its underlying repository format, with considerations for integration issues for ownCloud.

• Chapter 5 - An analysis of the functional requirements and development methodology selected, discussing influences and considerations found useful when developing a methodology, as well as a short overview of the challenges encountered.

• Chapter 6 - Provides a high-level overview of both Granite and the ownCloud app.

• Chapter 7 - Describes how the ownCloud app is implemented, and describes the implementation details for more complex cases.

• Chapter 8 - Details the test results and conclusions from unit testing the Granite library, with a brief discussion on testing possibilities for the ownCloud app.

• Chapter 9 - Summarises achievements and requirements met over the course of the project.

• Chapter 10 - Discusses work still to be implemented, the next steps to be taken after the project deadline, and ideas and recommendations for these next steps.

• Chapter 11 - A conclusion and critical evaluation discussing what went wrong and how to improve.


2 VERSION CONTROL SOFTWARE

2.1 Introduction

The simplest form of version control is of course well-known: multiple timestamped copies of the original file. The disadvantages of this method are also well-known: the manual process of organising the backups is time-consuming and error-prone, and the storage mechanism is particularly inefficient.

The first system described as a ‘version control system’, designed to solve this problem, was the Source Code Control System (SCCS), developed in 1972 by Marc J. Rochkind. The delta chaining method, described in his seminal 1975 paper [20], is still an accurate description of the storage techniques of modern version control software. It was later considered to be “research that had immediate and long-lasting impact” [10]. SCCS operated on single files, keeping a corresponding file which stored version history in a weave format. This is covered in more detail in Sections 2.1.1 and 2.1.3.

The next contender, the Revision Control System (RCS), was written in 1985 by Walter F. Tichy. RCS also operated on single files, but used a backward-chaining delta algorithm instead of the forward-chaining method used by SCCS: RCS stores the latest version of a file in full and constructs backward deltas for the file changes. This makes RCS significantly quicker at retrieving the latest version of a file. RCS is notable because the Concurrent Versions System (CVS), the first VCS to handle project-wide sets of files, began as a set of scripts for the RCS file format.

RCS largely predated the advent of computer networking, while CVS was designed specifically with a client-server architecture [19]. CVS was originally a set of scripts designed to bring project-wide awareness to RCS, a concept it called the repository of files [11]. Each developer was given their own copy of the repository, from which they could check out files and commit newly-modified versions to the server. The original CVS paper by Dick Grune [11] uses recognisable terminology also used throughout the rest of this dissertation, and in most version control documentation and literature.


CVS was largely dominant for the next decade of software development, until what became the Apache Subversion (SVN) project began in the early 2000s. Subversion was largely compatible with CVS commands [12], but almost completely different in its internal operation. SVN versions an entire repository, creating a new repository ‘snapshot’ whenever a file is changed. SVN extends the merging functionality in CVS to the repository level, rather than file level. Commits to the repository are also atomic, with a transaction beginning before the commit, and ending on success. These transactions ensure a consistent state for the repository at all times.

Finally, what Eric S. Raymond terms third-generation version control systems appeared in the forms of BitKeeper and GNU Arch. BitKeeper became the VCS associated with the Linux Kernel, while GNU Arch pioneered the concepts of merging and changesets in the open-source community. BitKeeper eventually fell out of favour in the open-source community (see footnote 1), and GNU Arch had a reputation of being difficult to use [19].

Following the BitKeeper fiasco, Linus Torvalds began work on Git, while Matt Mackall began Mercurial. Decentralised systems such as Git and Mercurial do not have an authoritative repository: there is no ‘official’ version of the source code. Every developer keeps a personal copy of the repository on his or her machine, working asynchronously to other developers. Instead of checking out code and committing it to the main repository, users pull and push their changes, respectively, to other developer repositories.

2.1.1 Delta Compression

Delta compression is concerned with storing the difference between two objects, producing a delta. This delta is applied to a base object to generate the new object, resulting in much more efficient storage.

Traditional delta algorithms, which can be traced to Rochkind’s SCCS paper, utilise line-based algorithms. The operations are modelled as insertions and deletions. While this does not provide changes down to the character, Rochkind found it suitable enough with empirical testing to reconstruct all versions of a file [20].

In a simplified view, retrieving a file means starting at the beginning of the history file and reading the delta blocks sequentially, applying each one to the base object held in memory. Revision information (such as version, file modification time, etc.) is stored in the header of each delta block. The operation stops once the desired version is reached.

Footnote 1: Following allegations of reverse-engineering, the BitKeeper community edition was withdrawn in 2005.
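The forward-delta retrieval described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the SCCS or RCS file format: the function names (make_delta, apply_delta, checkout) are hypothetical, and the standard-library difflib module stands in for whatever line-matching algorithm a real system uses. Each delta is a list of line-level edits, covering insertions, deletions, or both.

```python
from difflib import SequenceMatcher


def make_delta(base, target):
    """Describe `target` as line-level edits against `base`.

    Each edit is (start, end, new_lines): replace base[start:end] with
    new_lines. A pure insertion has start == end; a pure deletion has
    an empty new_lines list.
    """
    matcher = SequenceMatcher(a=base, b=target, autojunk=False)
    return [
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(base, delta):
    """Apply one delta to a list of lines and return the new list."""
    result = list(base)
    # Apply from the end so that earlier line indexes stay valid.
    for start, end, new_lines in reversed(delta):
        result[start:end] = new_lines
    return result


def checkout(base, deltas, version):
    """Rebuild `version` (0 is the base) by applying deltas in order,
    stopping once the desired version is reached."""
    lines = list(base)
    for delta in deltas[:version]:
        lines = apply_delta(lines, delta)
    return lines
```

A backward-chaining scheme in the style of RCS would store the newest version in full and keep deltas that lead back to older versions, so the same two helper functions would apply with the direction reversed.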

2.1.2 Snapshots

In snapshot-based systems, each version of a file is saved individually, independently of other versions. This has a few consequences: the repository size quickly grows very large, so compression is typically also used to reduce this. This storage disadvantage is traded for a single disk access for any version (since they are all stored individually), making access to versions very quick. Git and Bazaar both use snapshot-based systems, with some modifications.
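As a rough illustration of the trade-off, the following Python sketch stores every version as an independent, compressed snapshot. The SnapshotStore class is hypothetical and exists only for this example; it is not how Git or Bazaar lay out data on disk. Reading any version is a single lookup followed by decompression, at the cost of storing every version in full.

```python
import zlib


class SnapshotStore:
    """Toy snapshot store: one compressed blob per version of a file."""

    def __init__(self):
        self._versions = []  # index in this list is the version number

    def commit(self, content: bytes) -> int:
        """Save a full copy of the content and return its version number."""
        self._versions.append(zlib.compress(content))
        return len(self._versions) - 1

    def read(self, version: int) -> bytes:
        """Fetch any version with one lookup; no delta chain to replay."""
        return zlib.decompress(self._versions[version])

    def stored_size(self) -> int:
        """Total bytes held, after compression."""
        return sum(len(blob) for blob in self._versions)
```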

2.1.3 Weaves

All version information is stored in a single file in interleaved blocks. Metadata is added to each interleaved block, indicating the version it belongs to along with some other information. Any version of a file can be reconstructed with a single sequential read, although this takes longer the more interleaved blocks there are (i.e. the more history a file has). The trade-off is having to rewrite the file each time a new version is added. Bazaar used to utilise this method (although it is unclear what it uses now) and the original Source Code Control System had a similar mechanism. Mercurial uses a similar mechanism with its Revlog storage format, discussed in Section 2.4.
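A much-simplified weave can be sketched in Python as a list of (versions, line) pairs, where each line records the set of versions it belongs to. This is an assumption-laden illustration: the representation and the names extract and add_version are invented for this example and do not match the on-disk weave format of SCCS or Bazaar. It does show both halves of the trade-off: extraction is one sequential pass, while adding a version rewrites the whole structure.

```python
from difflib import SequenceMatcher


def extract(weave, version):
    """Rebuild one version with a single sequential read of the weave."""
    return [line for versions, line in weave if version in versions]


def add_version(weave, parent, new_version, new_lines):
    """Return a rewritten weave that also contains `new_version`.

    Lines shared with `parent` gain the new version number; new lines
    are interleaved at the position where they appear.
    """
    positions = [i for i, (versions, _) in enumerate(weave) if parent in versions]
    parent_lines = [weave[i][1] for i in positions]
    keep = set()   # weave indexes that also belong to the new version
    inserts = {}   # weave index -> new lines to place just before it
    matcher = SequenceMatcher(a=parent_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            keep.update(positions[i1:i2])
        elif tag in ("insert", "replace"):
            at = positions[i1] if i1 < len(positions) else len(weave)
            inserts[at] = new_lines[j1:j2]
    only_new = frozenset({new_version})
    rewritten = []
    for index, (versions, line) in enumerate(weave):
        rewritten.extend((only_new, new) for new in inserts.get(index, []))
        updated = versions | only_new if index in keep else versions
        rewritten.append((updated, line))
    rewritten.extend((only_new, new) for new in inserts.get(len(weave), []))
    return rewritten
```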

2.1.4 Terminology

This document generally uses centralised terminology such as check in, check out, commit, etc. The following description of terms is drawn from a variety of sources, largely an analysis of VCSs by Daniel Knittl-Frank in [13]:

• Commit: A commit (also revision, or version) is a snapshot of a file at a particular point in time. Different systems have different terms, but they all refer to essentially the same thing. To commit a new or changed file means to save something into a repository.

• Working Directory: The local copy of the repository, where actual modifications and changes happen before they are committed.

• Checkout: To load a particular version of a project or file into the working directory. In SVN this term also describes the working directory (e.g. your checked out copy).

• Repository: The database and underlying storage where the version history is stored.

• Clone: To obtain a complete copy of a repository. Clones are also implicitly directory branches in distributed systems.

• Merge: To combine the changes made in two separate branches into one of the branches. This often results in the creation of a merge commit to represent the merge.

• Diff: The difference between two files or directories in a project. Also the name of a UNIX utility for producing this output.

• Patch: A file describing the difference between two files; essentially the encoded form of a diff.

• Parent: Each commit has a parent commit, which traces back to the initial commit. Merges have two or more parents: the relevant commits of each branch.

• Tip: Also known as the HEAD when discussing the master or trunk branch, the tip is the latest version in a branch.

2.2 A Brief Review

Storage mechanisms for modern systems vary, with varying amounts of documentation. Git’s storage and object models are very well described around the web and Mercurial has excellent documentation on the same subject. Bazaar is more difficult to describe, partly because it takes a different approach to the other systems and partly due to a lack of documentation.

The following sections review Bazaar and Mercurial for suitability in ownCloud. Git was also considered in the same manner, and is described in detail in Chapter 4. Justification for the choice of Git is given in Section 2.5.

2.3 Bazaar

Bazaar is a free distributed version control system licensed under the GPL and part of the GNU project [18]. The main focus is a simple and easy to use interface, with commands largely similar to CVS. Implemented in Python, Bazaar is cross-platform software.

The following description has mostly been drawn from the official Bazaar documentation [1] [5].


Bazaar is designed to be used with a variety of workflows, with the general concepts being either centralised, distributed, or gatekeeper. A centralised workflow operates in a client-server manner, also known as lock-step development [4]. The distributed and gatekeeper workflows are often seen in open-source projects, either with truly distributed development, or with an official repository to which only the gatekeeper has commit access (e.g. Linux).

Bazaar is unique amongst the systems considered for its approach to the centralised workflow, involving a special concept called bound branches. On a bound branch, Bazaar takes two steps when the user commits:

• Checks that the entire repository tree is up to date

• Performs the commit on the central server, before making the commit locally

By checking the entire repository is up to date with the central copy, Bazaar is actually stricter than SVN [8], which checks only that changed files being checked in are up to date.

The commit to the central server before committing locally permits “true lockstep development”, according to the Bazaar documentation. With distributed systems, if a team of developers are all developing from the same branch, each commit to the repository must be followed by a pull command, by every other developer. Not pulling changes while making commits very quickly leads to conflicts.

Bazaar solves this with bound branches: the repository is updated before the commit, alerting the user to any conflicts that may arise. Secondly, by performing the commit on the central server first, Bazaar ensures that other developers pulling changes won’t fall “out-of-step” by pulling changes shortly before another developer pushes new commits.
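The ordering of a bound-branch commit can be modelled with a short Python sketch. Everything here is hypothetical (the Branch class, bound_commit, update and OutOfDateError are names made up for this illustration, not part of Bazaar's API), and revisions are reduced to plain strings. The point is only the sequence: refuse the commit if the local branch is behind, then commit centrally, then commit locally.

```python
class OutOfDateError(Exception):
    """Raised when a bound branch is behind the central branch."""


class Branch:
    """Toy branch: an ordered list of revision identifiers."""

    def __init__(self, revisions=None):
        self.revisions = list(revisions or [])

    def tip(self):
        return self.revisions[-1] if self.revisions else None


def bound_commit(local, central, revision):
    """Commit in lock-step: check, commit centrally, then commit locally."""
    # Step 1: the whole tree must be up to date with the central copy.
    if local.revisions != central.revisions:
        raise OutOfDateError("local branch is out of date; update first")
    # Step 2: the central commit happens before the local one.
    central.revisions.append(revision)
    local.revisions.append(revision)


def update(local, central):
    """Bring the local branch up to date with the central branch."""
    local.revisions = list(central.revisions)
```

In this model a second developer who has not updated since the first developer's commit is stopped at step 1, which is the behaviour the text above describes as stricter than SVN.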

In a decentralised workflow, Bazaar operates in much the same manner as CVS and SVN: developers commit locally and use bzr push to push their changes, or bzr update to pull new updates. The main distributed workflows, according to the documentation [4], are shared mainline, human gatekeeper, and automatic gatekeeper. With the shared mainline method, every developer has read and write access to the main branch. With a human gatekeeper, the gatekeeper reviews each commit before merging it into the main branch, while an automatic gatekeeper runs an automated test suite before merging the code.

Bazaar offers any combination of these workflows, as well as custom modifications to them. The workflow concerns are largely separate to the data storage of Bazaar, which is covered in more detail below.


Storage Model

• Transports: Transports deal with access to local and remote directories. They essentially abstract the file I/O operations for a repository, allowing access over a variety of protocols including (S)FTP, SSH and HTTP (read-only) [3].

• Branches: Branches are represented as first-class objects [1]. A branch stores the current revision ID for the branch (the tip, or HEAD) and references a repository.

• Repository: Repositories contain four “stores”, or collections of data, which consist of key-value mappings: a revision store, an inventory store, a texts store and a signature store. These stores may be stored persistently on disk in any way, since Bazaar is more of an API to a repository rather than a strict specification [1].

The revision store stores revision objects. A revision describes a snapshot of a tree of files, along with some metadata. A revision also points to an inventory store.

The inventory store contains inventory objects, which are themselves a set of inventory entries. An inventory entry has a set of metadata associated with it: a file ID, a revision ID, a kind (file, directory, symlink or tree-reference), a parent ID, a name and whether the executable bit is set. Depending on the value of kind, a file size, directory children, symlink targets or a tree reference ID may be added as well.

The texts store simply stores the contents of individual files, referenced with a key in the format of: (fileid, revision-id). All kinds have an entry in the texts store, but generally only files have actual contents.

The signature store is simply a store of cryptographic signatures with identical keys to the revision store.
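The four stores can be pictured as plain key-value mappings. The Python sketch below is a hypothetical model written for this discussion, not the bzrlib API: the class and field names are assumptions, and it further assumes that inventories are keyed by revision ID, which the description above does not state explicitly. It shows how a file's contents are located through an inventory entry and the (file ID, revision ID) key of the texts store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class InventoryEntry:
    file_id: str
    revision_id: str          # revision in which this entry last changed
    kind: str                 # "file", "directory", "symlink" or "tree-reference"
    parent_id: Optional[str]  # file ID of the containing directory
    name: str
    executable: bool = False


@dataclass
class Repository:
    revisions: dict = field(default_factory=dict)    # revision_id -> metadata
    inventories: dict = field(default_factory=dict)  # revision_id -> [InventoryEntry]
    texts: dict = field(default_factory=dict)        # (file_id, revision_id) -> bytes
    signatures: dict = field(default_factory=dict)   # revision_id -> signature

    def file_text(self, revision_id: str, name: str) -> bytes:
        """Look up a file's contents as of the given revision."""
        for entry in self.inventories[revision_id]:
            if entry.kind == "file" and entry.name == name:
                return self.texts[(entry.file_id, entry.revision_id)]
        raise KeyError(name)
```

Because an unchanged file keeps the same (file ID, revision ID) key in later inventories, its text is stored once and shared between revisions.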

Object Model

• Revision: A revision is a snapshot of a tree. The tree consists of files and directories, including the content of those files. A revision also includes associated metadata, such as the committer, a timestamp, a short commit message and a reference to the parent commit. Bazaar also supports third-party applications storing additional information with revision properties [2].

• Working Tree: Also known as the working directory, the working tree is the version-controlled directory containing the repository and the files. A working tree is associated with a branch. As described in Section 2.1.4, the user makes changes to files in the working tree and then commits them.

• Branch: A branch is an ordered series of revisions, with the latest revision called the tip. Branches can be split and merged back together again, forming a directed acyclic graph (DAG) of revisions.

• Repository: A repository is a collection of revisions. Each branch is considered to have its own repository; repositories are sometimes shared to optimise disk usage when it makes sense to do so.

Bazaar Summary

Bazaar is more of an API than a concrete format, and the storage decisions are largely up to the implementation. It offers the most support for a variety of workflows, in contrast to the simpler Git and Mercurial. Due to the time constraints for this project, Bazaar is not considered suitable, with a clear investment necessary to understand the underlying structures and approaches.

2.4 Mercurial

Like Bazaar, Mercurial is written primarily in Python, providing excellent portability between operating systems (including Microsoft Windows®). However, Mercurial shares more in common with Git than Bazaar, as discussed below.

Git uses a combination of objects stored as loose files, and multiple objects compressed in packfiles. Mercurial, in contrast, uses a Revlog structure, essentially a series of delta chunks with occasional snapshots.

One of the main points is that Mercurial places strict controls on project history: the Revlog structure is append-only, rendering project history immutable. Git packfiles may use deltas with any other object as their base, while Revlogs use the previous delta as the base, with occasional full snapshots.

Object Model

• Nodeid: A unique ID that represents the contents of a file and its position in the project history.

Nodeids are computed using the SHA-1 algorithm, based on the file contents at the time and the existing file history. The nodeids of each parent in the history are used to create the SHA-1 hash. From the Mercurial nodeid page:

If you modify a file, commit the change, and then modify it to restore the original contents, the contents are the same but the history is different, so the file will get a new nodeid [17].

This also ensures an immutable history, because even reverting a file results in a new commit, which itself forms part of the history of the next commit. Git achieves this as well, with the SHA hashing of the file content and the commit data, but Git provides built-in support for rebuilding the history with git rebase.

Then there is the concept of the nullid, which is a nodeid that consists entirely of zeros. This serves as the empty root revision.

• Revlog: A Revlog stores all the previous and current versions of a file in a single structure consisting of two parts: an index file and a data file [14].

The index file contains a series of bookmarks into the data file, identified by a revision number. These revision numbers are incremental, providing a simple method of checking the version of a file in a similar manner to SVN. As a side note, revision numbers are local only and may differ in cloned repositories, as opposed to Git, where SHA-1 ids are universal identifiers.

The index file bookmarks can be used to seek to a position in the data file, which stores a series of compressed hunks, one per revision. These hunks are either snapshots of the file contents, or delta compressed using the previous hunk as a base. Delta hunks store a pointer to the location of the first snapshot, allowing clients to begin at that snapshot and successively apply the deltas that follow. This provides O(1) lookup when locating and retrieving any given revision.

Since Revlog files are appended to, rather than rewritten, writing is also an O(1) operation. Since the index entries are written after the data hunks, there is no requirement to lock the file from reading while writes are taking place, reducing lock contention.
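The snapshot-plus-deltas idea can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: the array layout, the function names and the [start, end, replacement] delta format are all hypothetical, and Mercurial's real on-disk encoding is binary and compressed.

```php
<?php
// Apply one delta: a list of [start, end, replacement] edits against $text.
// Edits are assumed to be sorted by start offset and non-overlapping.
function applyDelta($text, array $delta)
{
    $out = '';
    $pos = 0;
    foreach ($delta as $edit) {
        list($start, $end, $data) = $edit;
        $out .= substr($text, $pos, $start - $pos) . $data;
        $pos = $end;
    }
    return $out . substr($text, $pos);
}

// Rebuild revision $rev: start at its base snapshot, then apply each delta in turn.
function readRevision(array $revlog, $rev)
{
    $base = $revlog[$rev]['base'];
    $text = $revlog[$base]['data'];
    for ($i = $base + 1; $i <= $rev; $i++) {
        $text = applyDelta($text, $revlog[$i]['data']);
    }
    return $text;
}

// Hypothetical revlog: revision 0 is a snapshot, revisions 1 and 2 are deltas.
$revlog = array(
    0 => array('base' => 0, 'data' => "one\ntwo\nthree\n"),
    1 => array('base' => 0, 'data' => array(array(4, 8, "TWO\n"))),
    2 => array('base' => 0, 'data' => array(array(0, 0, "zero\n"))),
);

echo readRevision($revlog, 2);
```

Each delta is expressed against the revision immediately before it, so reading revision 2 means applying the deltas for revisions 1 and 2 to the snapshot stored at revision 0.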

• Changeset: A changeset stores a collection of changed files in a repository, which is uniquely identified by a changeset ID. Creating a changeset involves committing or checking in to the repository [16]. This is used along with the manifest to record the difference between one commit and the next.


• Manifest: The manifest stores the state of the repository at a particular changeset ID. This involves listing each file along with its nodeid. Manifests provide the conceptual equivalent of a directory tree, stored in a Revlog with an associated index. The history of this manifest provides the repository history, showing changes over time.

Fetching the entire repository for a particular revision is now simple: retrieve the relevant changeset, retrieve the associated manifest (which provides a list of files and revisions) and finally walk the manifest, retrieving each file version.
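The three retrieval steps above can be sketched against a toy in-memory repository. Every identifier and array layout below is hypothetical and chosen only for illustration; real changesets, manifests and file revisions are each stored in their own Revlog.

```php
<?php
// Hypothetical in-memory repository.
$changesets = array(
    'c1' => array('manifest' => 'm1', 'message' => 'Initial commit'),
);
$manifests = array(
    'm1' => array('README' => 'n-readme-1', 'src/main.txt' => 'n-main-1'),
);
$fileRevisions = array(
    'n-readme-1' => "Hello\n",
    'n-main-1'   => "main\n",
);

// Return [path => contents] for every file recorded at the given changeset.
function checkout($changesetId, array $changesets, array $manifests, array $fileRevisions)
{
    $manifestId = $changesets[$changesetId]['manifest'];   // 1. changeset -> manifest
    $tree = array();
    foreach ($manifests[$manifestId] as $path => $nodeid) { // 2. walk the manifest
        $tree[$path] = $fileRevisions[$nodeid];             // 3. fetch each file version
    }
    return $tree;
}

print_r(checkout('c1', $changesets, $manifests, $fileRevisions));
```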

• Branches: Branches generally refer to two separate concepts in Mercurial [15]. The first concept is typically referred to as a branch in other distributed systems: it indicates a diverged line of development, branching from normal development. These branches in Mercurial are manifest as a list of consecutive changesets, providing multiple lines of development within a single repository.

The second concept is more similar to tags: branches can have a name, which is stored along with the changeset. By default, the branch name is set to default, but this can be changed for any purpose the user may require, such as release tagging.

Mercurial Summary

Mercurial has a simple and robust repository format, although Revlogs use a delta encoding format roughly based on Python’s difflib (see footnote 2). There do not appear to be any re-implementations of this in PHP, or any suggestions for reading Mercurial repositories with PHP.

2.5 version control summary

Bazaar is the most complex, robust and flexible of all the version control systems considered. The addition of bound branches allows for some interesting workflow styles that may be problematic with other distributed systems, although the underlying concepts of branches, repositories and trees are broadly similar to those of its competitors.

The most interesting aspect of Bazaar is an almost complete lack of implementation detail: provided the client API behaves as expected and returns appropriate data, the layout of Bazaar repository data on the disk is almost completely irrelevant. The bzrlib Python library API documentation shows that the Transport implementations handle actually saving a file to disk, but it is unclear how the files are encoded and decoded.

Footnote 2: From the bdiff.c comments in the Mercurial source code.

Due to the lack of concrete file formats, and the variety of supported workflows, Bazaar would appear to be too complex considering the length of this project.

Mercurial is broadly similar to Git. Functionally, the two are almost identical, despite having different design goals [21]. The major difference lies in how they store their data and the reasoning behind it: Mercurial’s focus on high scalability resulted in the Revlog format, with O(1) insertion and retrieval. Git’s emphasis on simplicity and speed resulted in a repository structure built from just four basic objects (with tags not strictly necessary).

Git and Mercurial share a number of features: both track revisions using SHA-1 ids, and the Git packfile format is similar in nature to Mercurial Revlogs, albeit with some important differences. They are generally simpler in nature than their competitors, reflecting the saying ‘smart data structures, dumb code’.

There is little reason to choose one implementation over the other in terms of performance or documentation. The choice of Git is simply due to my familiarity with it, and the general availability of reference implementations. The repository format is extremely simple, and the PHP zlib module should provide read/write support for the compression used. Due to the similarity between Mercurial and Git, a few modifications could add dual support to Granite, but this is out of the scope of this project. Git is explored in more detail in Chapter 4.

3 THE OWNCLOUD PROJECT

The ownCloud project contains PHP-only implementations of several standardised protocols. The application uses a Model-View-Controller (MVC) architecture along with a series of application ‘hooks’, which can be used to provide further functionality. This analysis is largely concerned with the filesystem; very little of the template code, or other sections of the ownCloud code-base, affect this area of operation.

The relevant classes involved with the filesystem include:

• OC_Files - Handles file server access for the web and WebDAV clients.

• OC_FileCache - Provides filesystem metadata caching.

• OC_Filesystem - Provides a high-level abstraction for all filesystem operations in ownCloud, including WebDAV and the web interface.

• OC_Filestorage - Provides an abstraction for the underlying filesystem calls, including those added by ownCloud apps.

3.1 request flow

The initial request is made to the file/index.php template file, which uses the OC_Files class to fetch directory information. Within the OC_Files class, each folder is checked against the last cache entry to see if it has been modified. The cached folders are fetched with OC_FileCache::getFolderContent().

If a directory has been modified, the cache is updated with OC_FileCache::updateFolder(), which fetches an instance of OC_FilesystemView representing this directory. The OC_FilesystemView class returns a directory handle, which is used to iterate over the directory and update the cache.

An example is shown in Figure 3.1; the list of files and directories returned by OC_Files::getDirectoryContent() is used to build the filesystem display in the web interface. If the call to OC_FileCache::isUpdated() fails, the displayed files and directories will not match the true content of the filesystem.

Figure 3.1: Sequence diagram of the ownCloud file server

3.2 OC_FILESYSTEM

The dependencies between the various classes can be seen in Figure 3.2, with the inclusion of the final OC_Filestorage_Versioned and OC_VersionStreamwrapper classes for reference.

Figure 3.2: Partial class diagram of the ownCloud filesystem

The filesystem class in ownCloud is the main interface to the storage for ownCloud code, including third-party apps and WebDAV access. There is a corresponding OC_FilesystemView class, which passes calls to the storage layer responsible for each mounted directory. As the class comments from lib/filesystem.php state:

“This class won’t call any filesystem functions for itself but but will pass them to the correct OC_Filestorage object. This class should also handle all the file permission related stuff”

— Frank Karlitschek

Mounting a directory is simply achieved using OC_Filesystem::mount(), requiring a class name, an array of arguments, and the mount location. At this point most of the functionality continues in the OC_Filestorage class.

3.3 OC_FILESTORAGE

The OC_Filestorage class was the major focus of development for the ownCloud integration. The root class is simply an abstract interface, defining methods analogous to the native PHP filesystem calls. These methods return resource handles, which are then passed to ordinary filesystem functions in PHP, providing native read/write support with PHP streams while still abstracting the storage provider.

A concrete implementation is provided by OC_Filestorage_Local, which handles local file access. Listing 3.1 displays a code snippet demonstrating the use of native PHP functions to return resource handles, as well as more ordinary methods such as is_dir(). The @ symbol is used here to suppress error messages.

However, this implementation does not demonstrate the recommended approach for more complex functionality, or for situations where returning a resource handle is not as simple. As Granite does not operate on local files in the ordinary sense, a secondary implementation is required that can return resource handles as expected by the rest of PHP and ownCloud. The solution, a PHP stream wrapper implementation, is described further in Chapter 7.
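The general shape of a PHP stream wrapper can be sketched as follows. This is not the Granite or ownCloud implementation: the class name MemoryStreamWrapper and the memdemo:// scheme are hypothetical, and the sketch simply keeps file contents in a static array to show how ordinary PHP file functions can be served by a custom storage back end.

```php
<?php
// Hypothetical wrapper: stores "files" in a static array instead of on disk.
class MemoryStreamWrapper
{
    public $context;                 // populated by PHP when a context is supplied
    private static $files = array(); // path => contents
    private $path;
    private $position = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $this->path = $path;
        $this->position = 0;
        if ($mode[0] === 'w') {
            self::$files[$path] = '';       // create or truncate
        }
        return isset(self::$files[$path]);  // reading a missing file fails
    }

    public function stream_read($count)
    {
        $chunk = (string) substr(self::$files[$this->path], $this->position, $count);
        $this->position += strlen($chunk);
        return $chunk;
    }

    public function stream_write($data)
    {
        self::$files[$this->path] .= $data;
        $this->position += strlen($data);
        return strlen($data);
    }

    public function stream_eof()
    {
        return $this->position >= strlen(self::$files[$this->path]);
    }

    public function stream_stat()
    {
        return array('size' => strlen(self::$files[$this->path]));
    }
}

stream_wrapper_register('memdemo', 'MemoryStreamWrapper');

// Ordinary PHP file functions now work against the custom storage.
file_put_contents('memdemo://notes.txt', "versioned content\n");
echo file_get_contents('memdemo://notes.txt');
```

A versioned storage class could follow the same pattern, with stream_write() producing a new commit rather than appending to an array, although that design is only an assumption here.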


Listing 3.1: A code snippet from the OC_Filestorage_Local class in ownCloud

<?php

class OC_Filestorage_Local {
    // ...
    public function mkdir($path){
        // @ suppresses PHP warnings if the call fails
        return @mkdir($this->datadir.$path);
    }
    public function rmdir($path){
        return @rmdir($this->datadir.$path);
    }
    public function opendir($path){
        // returns a native directory resource handle
        return opendir($this->datadir.$path);
    }
    public function is_dir($path){
        // strip a trailing slash before testing the path
        if(substr($path,-1)=='/'){
            $path=substr($path,0,-1);
        }
        return is_dir($this->datadir.$path);
    }
    // ...
}

3.4 OC_FILECACHE

The OC_FileCache class was introduced with ownCloud 3, providing a database table with cached filesystem information such as filename, mimetype, etc. and a set of boolean flags for encrypted, versioned and writable folders.

The file cache is the main provider of filesystem information to the rest of ownCloud. As only the metadata is cached, read and write operations still pass through the OC_Filesystem layer. Every time the file cache is queried, it checks if the cache is out of date. If necessary, it updates the cached information, then returns the information from the database.

3.5 extending the filesystem

The architecture of ownCloud provides for third-party “apps”, which can include their own classes and asset files, then use ownCloud hooks to integrate with the rest of the system. A good example is the file sharing app by Michael Gapczynski.

Figure 3.3 shows the layout of the sharing app, with the app configuration files appinfo/app.php and appinfo/info.xml. The XML file details the app name, author, license, version number, and required version of ownCloud. The appinfo/app.php file performs the loading of necessary classes and attaches them to the hooks provided by ownCloud.


Figure 3.3: The structure of the ownCloud Sharing app

The versioning application will use the appinfo/app.php file to load the Granite library, stored separately in the 3rdparty folder, the version storage implementation, and the stream wrapper implementation.

The application can then mount the included file storage class to a directory in the ownCloud filesystem. This is accessible via WebDAV and through the web interface, providing transparent versioning support in ownCloud.

3.6 summary

Extending ownCloud has clearly been a significant focus of development; the current class implementations and provided hooks make it almost trivial to modify the filesystem. While more work needs to be performed on the operation of the file cache, it is safe to suggest that the most complicated aspect of this project is actually interacting with a repository on disk and providing access through the OC_Filestorage class.

In order to interface with the ownCloud filesystem, just three files are necessary: the two appinfo configuration files and an OC_Filestorage implementation. Chapter 6 covers the design of the OC_Filestorage implementation, with implementation details in Chapter 7.

4 GIT

In terms of data storage, Git is often described as a content-addressable filesystem, whereas typical storage is described as location-addressable. In a location-addressable system, the storage medium places new content in any available free space, and stores a location reference. Retrieval involves looking up the data at the specified location. This model works well for data that changes frequently.

In contrast, content-addressable storage uses the content itself to generate a content address, uniquely and permanently tied to the content. Requests to the storage consist of the content address, which the system uses to locate the data. For data that does not change often, this method can be extremely efficient, as the storage location is already known.

Git uses the SHA-1 id of a piece of data, along with its metadata, to provide the content address for each object type. The four main object types are:

• Commits - These contain committer and author tags, a timestamp value, a commit message and a reference to a tree object. They may store no parent reference (initial commit), one, or many (merge commit).

• Tags - These provide named references to commits (e.g. v1.1.3b).

• Trees - These provide a simple tree structure of other trees and blob objects, forming a directory tree attached to the commit.

• Blobs - These are raw chunks of data, relating to files. They have a few simple pieces of metadata including type, size and permissions (executable bit only).

The advantages of this choice are that no piece of content gets stored twice, and since an old version, by definition, will never change, Git becomes very efficient for large repositories of small files.
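Content addressing can be demonstrated directly: Git derives an object's id from a short header (object type and size) followed by the content. The sketch below assumes that layout; the function name gitObjectId() is hypothetical.

```php
<?php
// Compute the id Git assigns to an object:
// SHA-1 over "<type> <size in bytes>\0<content>".
function gitObjectId($type, $content)
{
    return sha1($type . ' ' . strlen($content) . "\0" . $content);
}

echo gitObjectId('blob', "hello world\n"), "\n";
// 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

// Identical content always maps to the same address, so it is stored only once.
var_dump(gitObjectId('blob', "hello world\n") === gitObjectId('blob', "hello world\n"));
// bool(true)
```

The first value matches what git hash-object reports for a file containing the same twelve bytes, which gives a convenient way of checking an implementation against Git itself.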

All version control systems encounter problems with storing large binary files, because of the unsuitability of traditional differencing algorithms. Traditional algorithms are based on the insertion or deletion of lines, as described in the section on SCCS in Chapter 2. While there are various workarounds, most are unsatisfactory in some way, and large binary files have not been considered for the purposes of this project.

Further, since certain objects such as trees and commits store references to their children or parents, respectively, each commit is cryptographically hashed against previous commits. Attacks against SHA-1 notwithstanding, this is a simple method of verifying the integrity of a repository. For more security, Git provides PGP signing of tag objects.
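The chaining effect can be shown with a short sketch. It assumes the plain-text commit layout (tree, parent, author, committer, blank line, message) and the usual "type size NUL content" hashing; the function name commitId(), the identity string and the placeholder tree id are all hypothetical.

```php
<?php
// Hypothetical helper: build the text of a commit object and hash it.
function commitId($tree, $parent, $message)
{
    $who  = 'A. Developer <dev@example.com> 1334160940 +0100'; // illustrative identity
    $text = "tree $tree\n"
          . ($parent === null ? '' : "parent $parent\n")
          . "author $who\ncommitter $who\n\n$message\n";
    return sha1('commit ' . strlen($text) . "\0" . $text);
}

$tree = str_repeat('a', 40); // placeholder tree id

$root = commitId($tree, null, 'first');
$next = commitId($tree, $root, 'second');

// Altering the root commit changes its id, and with it the id of every descendant.
$alteredRoot = commitId($tree, null, 'first (edited)');
$alteredNext = commitId($tree, $alteredRoot, 'second');

var_dump($next === $alteredNext); // bool(false)
```

Since the parent id is part of the hashed text, checking a single branch tip is enough to detect a change anywhere in its history.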

Listing 4.1: Output of git show --pretty=RAW HEAD

commit ba1b1af8512560b5d17fbd90d446acd741fd3d99
tree 1b02b92f8d26e8ca7775473e602ca074adf11929
parent c111a49d26861b49d0ec1ef23fedff13043765fd
author Craig Roberts <[email protected]> 1334160940 +0100
committer Craig Roberts <[email protected]> 1334160940 +0100

    craig committed at Wed Apr 11 17:15:40 BST 2012

When decoded, objects in Git are represented as simple plain text, with an example in Listing 4.1. Git uses two storage formats for these objects, ‘loose’ and ‘packed’.

4.1 loose files

Loose files are simply the text content of an object, as in Listing 4.1, zlib-compressed and stored in a file on disk. The SHA-1 id determines the location of a file; in a typical Git repository this would be .git/objects/ba/1b1af ...8a7cf.

Loose files have the advantage of O(1) read and write performance, but the obvious disadvantage of increased storage usage, due to the lack of delta compression. Packfiles are the solution to this problem, but come with additional complexity of their own.
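A minimal sketch of reading and writing loose objects with the PHP zlib module follows. It is not Granite's code: the function names are hypothetical, and a temporary directory stands in for the .git folder of a real repository.

```php
<?php
// Write $content as a loose object under $gitDir and return its SHA-1 id.
function writeLooseObject($gitDir, $type, $content)
{
    $raw = $type . ' ' . strlen($content) . "\0" . $content;
    $sha = sha1($raw);
    $dir = $gitDir . '/objects/' . substr($sha, 0, 2); // first two hex digits
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    file_put_contents($dir . '/' . substr($sha, 2), gzcompress($raw));
    return $sha;
}

// Read a loose object back and split the header from the content.
function readLooseObject($gitDir, $sha)
{
    $path = $gitDir . '/objects/' . substr($sha, 0, 2) . '/' . substr($sha, 2);
    $raw  = gzuncompress(file_get_contents($path));
    list($header, $content) = explode("\0", $raw, 2);
    list($type, $size) = explode(' ', $header);
    return array('type' => $type, 'size' => (int) $size, 'content' => $content);
}

$gitDir = sys_get_temp_dir() . '/loose-demo-' . uniqid(); // stand-in for .git
$sha    = writeLooseObject($gitDir, 'blob', "hello world\n");
print_r(readLooseObject($gitDir, $sha));
```

The path is computed entirely from the id, which is why both operations involve no searching.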

4.2 packfiles and packfile indexes

Git packfiles are files on disk, usually stored as .git/objects/pack/pack-{sha}.pack. Each packfile has an associated index, which is used to navigate the packfile.

Packfile indexes come in two versions, as of Git 1.6 (current version 1.7.5.4). They are largely similar except for their arrangement of the offsets and the SHA-1 ids; the larger space for offset integers allows version 2 indexes to accommodate packfiles larger than 2GB.


Figure 4.1: Git packfile indexes, from the Git Community Book (GPL)[9]

Figure 4.1 displays the packfile index visualisation from the Git Community Book [9]. The first four bytes are a magic value, followed by a four-byte value indicating the index version.

The SHA-1 listing is a sorted list of SHA-1 ids; the position of a requested SHA-1 id in this list is therefore the position to read, further down the index, in the offset section.

The offset section is stored in the same order as the SHA-1 ids above, and stores a 4-byte offset into the packfile. This offset is the start of the requested object in the associated packfile.

The fanout table allows even quicker access to the object offset, providing an offset into the SHA-1 and offset tables described above; the fanout table provides offsets for each possible first byte of the SHA-1 ids. This allows us to skip 8 iterations of the binary search.
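The lookup described above can be sketched with in-memory arrays standing in for the index sections. The function names and the sample ids and offsets are hypothetical; a real index stores these tables in binary form.

```php
<?php
// Build a 256-entry fanout table: entry $b holds the number of ids whose
// first byte is <= $b. $ids must be sorted 40-character hex SHA-1 strings.
function buildFanout(array $ids)
{
    $fanout = array_fill(0, 256, 0);
    foreach ($ids as $id) {
        $fanout[hexdec(substr($id, 0, 2))]++;
    }
    for ($b = 1; $b < 256; $b++) {
        $fanout[$b] += $fanout[$b - 1];
    }
    return $fanout;
}

// Return the packfile offset for $sha, or null if it is not in the index.
function findOffset($sha, array $ids, array $offsets, array $fanout)
{
    $first = hexdec(substr($sha, 0, 2));
    $lo = $first > 0 ? $fanout[$first - 1] : 0; // first candidate position
    $hi = $fanout[$first];                      // one past the last candidate
    while ($lo < $hi) {                         // binary search within the bucket
        $mid = (int) (($lo + $hi) / 2);
        $cmp = strcmp($ids[$mid], $sha);
        if ($cmp === 0) {
            return $offsets[$mid];
        }
        if ($cmp < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return null;
}

// Hypothetical index contents.
$ids = array(
    '0a' . str_repeat('1', 38),
    'ba' . str_repeat('2', 38),
    'ba' . str_repeat('3', 38),
    'ff' . str_repeat('4', 38),
);
$offsets = array(12, 345, 678, 910);
$fanout  = buildFanout($ids);

var_dump(findOffset('ba' . str_repeat('3', 38), $ids, $offsets, $fanout)); // int(678)
var_dump(findOffset('ba' . str_repeat('9', 38), $ids, $offsets, $fanout)); // NULL
```

The fanout table narrows the search to ids sharing the same first byte, so the binary search only runs over that bucket.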

A point worth stressing is that packfile indexes are only necessary for random access to a packfile; without an index, the packfile would have to be read sequentially, unpacking each object and comparing its SHA-1 id.

The packfile itself is very simple: there is again a 4-byte magic value, a version number, and an integer representing the number of entries in the pack. Following this header are the objects.


Figure 4.2: Git packfile format, from the Git Community Book (GPL)[9]

The objects stored in a packfile, as displayed in Figure 4.2, have a variable-length object header, followed by their zlib-compressed data.

The object header is a series of one or more 1-byte (8-bit) ‘hunks’ [9]. The first bit of each byte specifies whether to read the next byte (1), or whether the next byte is the start of the data (0). The remaining seven bits are used to specify the object type and data size when expanded.

The object type is a simple binary value of 3 bits (0-7). 0 is currently undefined, and 5 (101) is not yet in use. Offset deltas and reference deltas are explained below.

Object type     Binary value
Commit          001
Tree            010
Blob            011
Tag             100
Offset Delta    110
Ref. Delta      111

The object size is read using the remainder of the first byte (i.e. the last 4 bits) and subsequent bytes (their 7-bit values). The first, 4-bit value is the least significant part, with subsequent bytes representing the more significant parts. As an example, take the following two-byte header:

1001 0000  0001 0010

The first four bits represent the read-next-byte flag and the object type. The remaining four bits are used as the least significant part of the size, with the following seven-bit value prepended. This results in a size value of 0010010 0000, or 288 bytes.
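The worked example above can be checked with a small decoder. This is a sketch of the header layout as described in this section; the function name decodeObjectHeader() is hypothetical.

```php
<?php
// Decode a packfile object header from $bytes, starting at $pos.
// Returns array(type, size, position of the first byte after the header).
function decodeObjectHeader($bytes, $pos = 0)
{
    $byte  = ord($bytes[$pos++]);
    $type  = ($byte >> 4) & 0x07; // three type bits after the continuation flag
    $size  = $byte & 0x0f;        // low four bits: least significant part of the size
    $shift = 4;
    while ($byte & 0x80) {        // top bit set: another header byte follows
        $byte  = ord($bytes[$pos++]);
        $size |= ($byte & 0x7f) << $shift;
        $shift += 7;
    }
    return array($type, $size, $pos);
}

// The two-byte header 1001 0000  0001 0010 from the text.
list($type, $size, $next) = decodeObjectHeader("\x90\x12");
echo "type=$type size=$size data starts at byte $next\n";
// type=1 size=288 data starts at byte 2
```

Type 1 (001) is a commit, and the decoded size of 288 bytes agrees with the manual calculation.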

Offset deltas store an offset within the same packfile to their base object. The delta content is then applied to this base object to yield the requested object.

Reference deltas store a 20-byte SHA-1 hash of the base object, which is also in the same packfile. This is a constraint designed to make packfiles independent. The final constraint is that a base object must be of the same object type as the delta object (i.e. a commit must be based on a commit, etc.).

4.3 git references

References form the basis of the repository, by providing pointers to the HEAD commit, and to the tips of branches and tags. Names in Git such as master are actually shorthand for reference names, in this case .git/refs/heads/master. Here, the file simply contains the SHA-1 id of the commit.

References may also be symbolic, in which case the file contains the string ref: followed by a reference name. References are stored in .git/refs/ with subdirectories for heads (branches), remotes (remote branches or repositories) and tags.

They may also be stored in .git/packed-refs. Documentation on this format was sparse, but it is essentially a newline-delimited file in which each line holds an SHA-1 id, followed by a space, followed by the reference name. There are two more operations referenced in the man page for git check-ref-format, "postfix nth-parent" and "peel onion", but I could not find any reference to decoding these.
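Reference resolution can be sketched as follows, working on in-memory data rather than a real repository. The sketch assumes that each packed-refs entry stores the SHA-1 id first and the reference name second, and it skips header lines (starting with #) and peeled-tag lines (starting with ^); both function names are hypothetical.

```php
<?php
// Parse the text of a packed-refs file into an array of refname => sha.
function parsePackedRefs($text)
{
    $refs = array();
    foreach (explode("\n", $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#' || $line[0] === '^') {
            continue; // blank line, header, or peeled tag target
        }
        list($sha, $name) = explode(' ', $line, 2);
        $refs[$name] = $sha;
    }
    return $refs;
}

// Resolve a reference name to an SHA-1 id. $loose maps ref names to the raw
// contents of their files; loose references take precedence over packed ones.
function resolveRef($name, array $loose, array $packed)
{
    for ($depth = 0; $depth < 5; $depth++) { // guard against symbolic loops
        if (isset($loose[$name])) {
            $value = trim($loose[$name]);
        } elseif (isset($packed[$name])) {
            return $packed[$name];
        } else {
            return null;
        }
        if (strpos($value, 'ref: ') !== 0) {
            return $value;                   // a plain SHA-1 id
        }
        $name = substr($value, 5);           // follow the symbolic reference
    }
    return null;
}

$loose  = array('HEAD' => "ref: refs/heads/master\n");
$packed = parsePackedRefs(
    "# pack-refs with: peeled\n"
    . "ba1b1af8512560b5d17fbd90d446acd741fd3d99 refs/heads/master\n"
);

echo resolveRef('HEAD', $loose, $packed), "\n";
// ba1b1af8512560b5d17fbd90d446acd741fd3d99
```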

4.4 summary

Git uses a simple object model, with two possible storage formats. All objects are compressed and stored according to the same criteria, therefore implementing the decoding of any one object implies being able to decode them all. Similarly, because the raw Git objects are simply text strings with compressed data, writing a parser eliminates half the work required for write support.

The PHP zlib module can provide support for reading and writing the compression format Git uses, through the gzcompress() and gzuncompress() functions. The existence of Glip and Grit, PHP and Ruby projects respectively, shows it should be possible to interact with Git repositories in PHP with no extra dependencies.

5 REQUIREMENTS AND DEVELOPMENT PROCESS

This project was undertaken to fulfill two main objectives: to provide a deeper understanding of version control systems and their implementation, and to use that knowledge to provide a useful addition to the ownCloud filesystem.

5.1 functional requirements

The most useful implementation is a transparent extension to the ownCloud filesystem which utilises ownCloud’s abstraction and hook mechanisms to provide commit-on-write functionality (i.e. each write to disk results in a new commit). This can be improved upon in later versions with a maximum commit frequency.

Based on the filesystem functionality and the ownCloud requirements, the following functional requirements were drawn up for the Granite library. Features marked with a “✓” were completed at the time of submission; those marked with a “✗” were not.

1. Reading Git repositories

✓ Reading loose objects
The system must be able to decompress and read ‘loose objects’ (commits, tags, trees and blobs) from a Git repository.

✓ Reading packed objects
The system must be able to extract and read deltified, compressed ‘packed objects’ in order to be fully compatible with existing Git repositories.

✓ Reading loose references
The system should be able to read loose references from the .git/refs/ directory. These references must take precedence over packed references.

✓ Reading packed references
The system should be able to read references from the .git/packed-refs file if the ‘loose’ tag cannot be found.

✓ Reading from branches besides master
The system should be able to support multiple branches, with reading supported from any commit SHA-1 id, not just master.

✗ Reading a submodule commit SHA-1 id
The system should be able to recognise a submodule in a Git repository, making available the relevant commit SHA-1 id.

✓ Reading the repository description
The system should be able to read the .git/description file.

2. Writing Git repositories

✓ Writing loose objects
The system must be able to write compressed loose objects (commits, tags, trees and blobs) to a Git repository, and update the relevant branch tip on success. These objects should have the same SHA-1 id as if they were generated by Git, but this may not always be sensible.

✗ Writing packed objects
The system should be able to store compressed and deltified objects in packfiles. Since packfiles are immutable, this involves generating a new packfile and packfile index.

✗ Writing packed references
The system should be able to write references to the .git/packed-refs file.

✓ Writing loose references
The system should be able to write loose references to disk, including tags and branches.

✓ Initialising Git repositories
The system should be able to create a repository on disk in an empty directory. The repositories may be ‘bare’, or inside a .git folder within the requested directory. The system should also create an initial commit to initialise the repository.

✓ Writing submodule updates
The system should be able to modify the SHA-1 id reference to a submodule.

✓ Writing the repository description
The system should be able to write to the .git/description file.

3. ownCloud Implementation

✓ Reading Git repositories
The system must be able to display the directory structure and file layout of a Git repository, through the existing ownCloud file viewer. The system should provide information similar to that provided by the UNIX stat command (e.g. file modification date, filename, file size, mimetype, etc.).

✓ Commit-on-write when filesystem write hooks fire (i.e. WebDAV and web editor)
The system must write a new commit to the Git repository when the ownCloud filesystem performs a ‘write event’ (i.e. calls to OC_Filesystem::fopen() etc.).

✓ Roll-back in the configuration menu
The ownCloud admin panel must include an additional section, providing a list of commit messages. Selecting a commit message must ‘roll-back’ the view of the filesystem provided to ownCloud.

✓ Read-only flag on “rolled-back” files *
The system should provide a read-only flag, which prevents either the text editor or WebDAV clients from modifying previous revisions. In the event of a write call to a previous revision, the system must commit against the HEAD of the current branch, in effect performing a ‘roll-back’ and write in a single step.

✓ File download support as usual
The system must provide download facilities for both current and ‘rolled-back’ files.

✗ Folder download support as zip file

✓ PHP-only implementation

* Implemented for the web interface; currently ignored by WebDAV clients. Bugs are covered in Chapter 8.

5.2 technical challenges

The implementation language and environment were not choices in this project; they were constraints. The ownCloud project is developed in PHP to provide the widest possible range of supported installations, across multiple platforms. The web interface has similar cross-platform aspirations, being available on any desktop or mobile device. The low-dependency policy of ownCloud ruled out the use of a version control binary on the server. This in turn significantly restricted the choice of libraries available in PHP, hence the development of Granite.

Overall, the most challenging aspects were reading compressed streams from the packfile with PHP, maintaining the minimum-dependency footprint, and an issue concerning the commit frequency.

The use of zlib by Git necessitated the use of the PHP zlib module, which provides gzuncompress() and related functions. This is the only additional server dependency, apart from a copy of Granite. Granite uses namespaces and some functions requiring PHP 5.3+; however, ownCloud’s system requirements also specify PHP 5.3.

An unanticipated challenge mid-way through development was the inclusion of a new file caching mechanism from other ownCloud contributors. This new feature, released with ownCloud 3 in January 2012, required some refactoring to work as intended and consumed a significant amount of development time. Several bugs in which the versioning implementation appeared to be broken were actually the result of an out-of-date file cache.

A final challenge is still unresolved, regarding a ‘maximum commit frequency’ for the ownCloud app. The issue stems from the fact that the current implementation of Granite does not provide a working directory for files being modified; read and write calls are performed directly against the repository. A short example shows the issue more clearly:

1. User opens a file

2. User clicks save after 30 seconds, a new commit is generated

3. User clicks save after a further 10 minutes. This is greaterthan the threshold of 5 minutes, therefore a new commit isgenerated.

4. User clicks save after a further 30 seconds. At less than thethreshold of 5 minutes, no new commit is generated.

5. User exits the application. The previous 30 seconds are lost.
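The scenario above can be reproduced with a short simulation. This is only a sketch of a hypothetical throttling policy (the function name simulateSaves() and the policy itself are assumptions, not part of Granite): a save is committed only if at least the threshold has elapsed since the previous commit, and the first save always commits.

```php
<?php
// Return an array of save time (seconds) => whether that save produced a commit.
function simulateSaves(array $saveTimes, $threshold)
{
    $lastCommit = null;
    $log = array();
    foreach ($saveTimes as $t) {
        $commit = $lastCommit === null || ($t - $lastCommit) >= $threshold;
        if ($commit) {
            $lastCommit = $t;
        }
        $log[$t] = $commit;
    }
    return $log;
}

// Saves at 30 seconds, 10 minutes later, and a further 30 seconds later,
// with a threshold of 5 minutes.
foreach (simulateSaves(array(30, 630, 660), 300) as $t => $committed) {
    echo $t, 's: ', $committed ? 'committed' : 'skipped (lost if the user exits now)', "\n";
}
// 30s: committed
// 630s: committed
// 660s: skipped (lost if the user exits now)
```

Because writes go directly to the repository, a skipped save has nowhere else to be stored, which is why the final 30 seconds of work are lost.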

There are several possible solutions. Rebasing involves ’squash-ing’ several commits together into a single commit, by rewritingthe repository history from the chosen commit. This has the effectof modifying the history, which can be problematic when push-ing or pulling to other repository. This is not currently relevant,as Granite doe not perform push or pull operations, however itmay be a future problem.

A better solution may be to keep a working copy of the files, by checking them out of the repository back into the filesystem. Filesystem calls would operate as normal, against the local filesystem. Committing would then be a separate operation. This allows multiple save operations with a single commit. User input could then be accepted for the commit message.

Discussion of these areas with the ownCloud community is ongoing, and the feedback will shape future development decisions in a manner more consistent with open-source development.

5.3 development process

Development consisted of a combination of Test-Driven Development (TDD) and iterative prototyping. Rather than attempt to keep to a regular schedule during University assignments and academic holidays, I deliberately chose to keep development more flexible with a hybrid approach. The iterative nature of development allowed for varying amounts of effort depending on the complexity of features, as the requirements were not fully formulated for several weeks.

Figure 5.2 displays some examples of the actual development timescale, compared to the proposed plan. Actual timescales that did not match expectations are explicitly prefixed with ‘actual ...’. Granite proceeded largely on schedule, while the ownCloud implementation took significantly longer than expected.

5.3.1 Test-Driven Development

TDD consists of a simple three-step process described in the introduction to Kent Beck’s “Test-Driven Development by Example” [6]:

1. Red – Write a test that fails

2. Green – Write the minimum code necessary to make the test pass

3. Refactor – Refactor the code written during step 2

This process encourages simple designs and, in the words of Kent Beck, “inspires confidence”. A comprehensive set of unit tests for Granite was created. At the time of submission, they covered 81.30% of Granite’s functionality. Development consisted largely of a series of prototypes, each fulfilling an individual area of functionality, which were then collated into a single library.

Granite was the primary candidate for unit testing: it offered verifiable results with SHA-1 comparisons, and its stability would inevitably affect the ownCloud implementation. Tests were written before the functionality was implemented, according to the design goals described in Section 5.1. The PHPUnit unit testing library was used to provide a standard framework for running tests.

Figure 5.1: Scrum-style task sheet for the ownCloud implementation

In order to gain the most reliable test results and ensure an implementation as close to Git as possible, an early decision was to use the SHA-1 ids of objects generated by the official Git client as verification of correct reading and writing. This has the advantage of being a quick and easy check, and of providing compatibility with existing Git clients or implementations.

The focus of testing was to ensure reliable output given a known input. For example, any SHA-1 id that cannot be found should always throw an InvalidArgumentException. Object requests should always return an equivalent logical object, i.e. a tree SHA-1 id should return a Tree object.
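The SHA-1 comparison used for verification can be sketched as follows. This is not the Granite test suite: the function name is hypothetical, and the example relies only on the way Git hashes a blob (the string "blob", the content length and a NUL byte, followed by the content). The reference ids are those reported by `git hash-object` for the same content.

```php
<?php
// Compute the SHA-1 id Git assigns to a blob with the given content.
function gitBlobSha($content)
{
    return sha1('blob ' . strlen($content) . "\0" . $content);
}

// Known ids produced by the official Git client for this content.
$expected = array(
    ''               => 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391',
    "test content\n" => 'd670460b4b4aece5915caf5c68d12f560a9fe3e4',
);

foreach ($expected as $content => $sha) {
    echo gitBlobSha($content) === $sha ? "match\n" : "MISMATCH\n";
}
```

A unit test built on this idea only needs to assert equality between the two strings, which is what makes the check quick to write and to run.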

5.3.2 Scrum and Evolutionary Prototyping

I utilised a scrum-style task board consisting of four categories: Planning, Started, Finished and Tested. The task sheet can be seen in Figure 5.1. Development consisted of several sprints, each resulting in a prototype; see Figure 5.2 for a project plan. The prototyping phase consisted of a number of gradually more complex implementations, with the final alpha release forming the submission to ownCloud. Unlike traditional Scrum, timeboxing was not utilised due to the unexpected changes and updates that come with open-source development.

Figure 5.2: Gantt chart displaying predicted and actual development timescales

The task sheet provided a visual indicator of development throughout the project, showing which areas still had to be completed and which areas were fully tested. A corresponding chart for OC_Filestorage tracked which functionality of the ownCloud filesystem interface had been implemented. The chart provided feedback when considering which features to shorten or cut from development, and which to extend and focus more resources on.

Each development “sprint” for the final few weeks of the project focused on a small set of features and bug-fixes for ownCloud. Each sprint resulted in a working prototype that was a noticeable improvement on the last iteration. The final product delivered to the ownCloud maintainer is still considered a prototype release. Several of the releases were delayed due to persistent bugs, or because the features planned for that sprint were not fully implemented.

5.3.3 Open-Source Development

This project was developed alone, in the midst of a larger open-source software development project. While I would hesitate to describe my project as fully open-source, there were some similarities. The source code was developed in public, with a repository for both the ownCloud implementation and the Granite library available on GitHub¹.

A project blog was created at http://craig0990.wordpress.com with a category for ownCloud-related posts. Users contributed two comments, including an idea for a maximum commit frequency, and two bug reports. The bug reports provided valuable feedback, and the bugs were fixed prior to submission.

The final implementation was committed to the ownCloud repository on Gitorious². The number of lines of code written, according to the diff output, was around 3,000.

Current work is focusing on finishing the versioning implementation in time for ownCloud 4. Accordingly, the release has been moved back to May 22nd, with a release candidate available on May 11th³.

1 See https://github.com/craig0990/ownCloud and https://github.com/craig0990/Granite

2 Commit id of b0894b314845a52e979539cbf1e6bd94444dca54, viewable at https://gitorious.org/owncloud/owncloud/commit/b0894/diffs

3 See http://mail.kde.org/pipermail/owncloud/2012-April/002848.html for the relevant mailing list post


6 DESIGN

The design of the ownCloud implementation was split into two main components: Granite and the file_versioning app for ownCloud. The ownCloud app can be further decomposed into an OC_Filestorage implementation, providing an interface to the rest of ownCloud, and a PHP stream wrapper implementation, providing support for native filesystem access to Git repositories.

Granite provides access to Git repositories through a set of classes forming a layer often described in Git as the porcelain. This is what the ‘user’ of the library utilises to get the job done. The plumbing then performs the actual work based on the input from the porcelain layer.

6.1 granite porcelain

The high-level interface of Granite consists of just six classes, with only the first five currently implemented: Repository, Commit, Tree, Node, Blob, and Tag. The implemented classes are discussed in more detail in the following sections.

The Repository Class

The Repository class is the main point of access to a Git repository, mainly through the methods head(), factory() and init(). Additional helper methods, branches(), tags() and path(), return extra information: a list of branches, a list of tags, and the location of the repository, respectively. A Repository object is initialised with the location of a Git repository, which is subsequently passed to other objects for their reading and writing operations.

The HEAD of any given branch can be read with head($branch, $value = NULL), passing the branch name as a single string argument. If $value is not provided, the method returns a Commit object representing the most recent commit for further manipulation. If $value is set to a valid SHA-1 id, the HEAD of the relevant branch is updated to match the SHA-1 id.

Figure 6.1: Class diagram of Granite

Other types of objects can be retrieved through the factory($type, $sha = NULL) method, by providing a string type of either commit, tree, blob, or tag and an SHA-1 id. If the SHA-1 id is not provided, an empty object is returned. The tag object is a special case: the $sha parameter is interpreted as the name of the tag, rather than the actual SHA-1 id. The SHA-1 id is then fetched by reading the tag. The SHA-1 id is passed to the object constructors for each type.

The init() method provides support for creating new Git repositories, simply by creating .git/objects and .git/refs directories, along with a .git/HEAD file. This repository can then be manipulated by either Granite or the official Git client.
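A minimal sketch of such an initialisation is shown below. It is not the Granite implementation: the function name is hypothetical, the refs/heads subdirectory is created as a convenience, and the HEAD file is assumed to contain a symbolic reference to a master branch, which is the format the official Git client writes.

```php
<?php
// Hypothetical stand-in for Repository::init(): create the minimal layout
// inside $dir and return the path of the new .git directory.
function initRepository($dir)
{
    $git = rtrim($dir, '/') . '/.git';
    foreach (array($git . '/objects', $git . '/refs/heads') as $path) {
        if (!is_dir($path) && !mkdir($path, 0777, true)) {
            throw new RuntimeException("Unable to create $path");
        }
    }
    // HEAD names the branch that will receive the first commit.
    file_put_contents($git . '/HEAD', "ref: refs/heads/master\n");
    return $git;
}

$dir = sys_get_temp_dir() . '/granite-init-' . uniqid();
$git = initRepository($dir);
echo file_get_contents($git . '/HEAD');
```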

The Repository class also provides convenience methods for writing to a repository: add(), rm() and commit(). These are rough analogues of the commands git add, git rm and git commit.

The add() method takes a file path and either a Blob or a Tree object. The rm() method takes the file path provided to add(). These methods modify an in-memory Tree object to be written later with the commit() method. Finally, the commit() method simply takes a message and an author name, and then generates an appropriate commit.

The Commit class

The Commit class provides access to the history of a repository, requiring a path to the repository and an optional SHA-1 id of a commit. The most relevant methods are parents() and tree(). The parents() method returns an array of SHA-1 ids, which can then be fetched through Repository::factory(), permitting traversal of the repository history. If the $parents parameter is provided, the parents of the commit are modified appropriately.

The tree() method returns a Tree object, which enables traversal of the directory tree associated with the commit. As with parents(), if a $tree parameter is provided, the tree object associated with the commit is updated accordingly.

The message(), committer(), and author() methods return the string value of their respective fields, also updating the values if a parameter is passed.

The sha() method returns the SHA-1 id of the current contents of the Commit object; it does not return the original SHA-1 id. The write() method uses the output of sha() to determine the storage location for the object, and then writes the compressed object data to disk.

The combination of read and write functionality in a single method call is used across Granite, presenting a simple, easy-to-use API with fewer method names to remember.

The Tree class

The Tree class takes a path to the repository and an optional SHA-1 id. One of the simplest objects, the Tree class provides write() and sha() methods similar to Commit. The only other relevant method is nodes(), which returns an array of Node objects representing the leaves of the tree. An array of Node objects can optionally be passed to nodes(), allowing the tree contents to be updated.

Storing the Tree contents in lightweight Node objects implements a form of ‘lazy-loading’, providing a cheaper method of traversing directory trees than fetching all the sub-trees at once.


As a side-note, in order to match the SHA-1 id generated by Git, this class must use the same sorting method for directory and file names when writing them to disk. This is the only purpose of the private _sort() method.

The Node class

The Node class represents one of either a file, a directory, or a Git submodule.

The name() and type() methods perform obvious functions, with type() returning a value of either tree, commit or blob. The mode() method returns the octal value of the file permissions, as stored by Git (i.e. only the executable bit is tracked).

The sha() method for Node does not return the SHA-1 id of its contents, but the SHA-1 id of the Tree or Blob object it represents. This can then be fetched as usual through Repository::factory().

Lastly, the convenience methods isDirectory() and isSubmodule() simply check the output of mode() and return a boolean value.

The Blob class

The Blob object represents a file, with the same sha(), size() and write() methods as the Commit class. The remaining methods are content(), which returns the file contents as a binary string, and mimetype(). The mimetype() method attempts to guess the mimetype of the blob content, e.g. text/plain.

Summary

Through the use of these objects, a Git repository can be accessed with read and write support. An example repository is displayed in Figure 6.2, showing the relationship between the various porcelain objects.

Repository history can be traversed either from the HEAD, or from any arbitrarily selected commit. Directory trees can be traversed from each commit object, and file contents can be retrieved from the repository once found in the tree.

Write support is implemented in this layer, using the object content from the plumbing layer, described below in more detail. The decision whether to write a loose or a packed object is abstracted from the user, with only loose objects currently supported due to time constraints.


Figure 6.2: Object diagram of Granite for a simple repository

6.2 granite plumbing

The Raw and Index classes form the top level of the plumbing layer, with Loose and Packed extending Raw for more specialised purposes. The porcelain layer requests a raw Git object by providing the repository path and an SHA-1 id to the Raw::factory() method. The factory() method returns an instance of either Loose or Packed. The Packed class utilises the Index class to provide searching of the repository packfiles. Figure 6.3 displays the relevant classes for reference.

The Raw class

Representing the raw content of a Git object, the Raw class simply provides common methods to the Loose and Packed classes, and provides a single point of access to the storage formats used by Git.

The content() method returns the raw content of the Git object, which is then interpreted by the relevant higher-level object. The size() and type() methods implement obvious functions, used to determine the decompressed size and type of the object.

The Loose and Packed classes

With the methods inherited from the Raw class, the Loose class only needs to locate the file based on the SHA-1 id, decompress it, and set its internal class properties (type, size and content) appropriately.

Figure 6.3: Class diagram of the Granite plumbing

In contrast to the Loose class, the Packed class is more complex, as it involves reading deltified object representations from the binary packfile. However, it implements the same basic interface as the Loose class, providing a common interface regardless of the underlying storage.

The Packed class locates and decodes Git objects stored in packfiles, using the Index class for search. Objects stored as deltas are applied to their base object to yield a ‘full’ object. This constructed object is conceptually identical to a Loose object, and can be used in the same manner.

The Index class

The Index class is instantiated with a path and the name of a packfile. It provides a single public find() method, taking an SHA-1 id, which returns the offset of the object in the relevant packfile. This is then used by Packed to read the object and construct the Git representation.

Packfile indexes transitioned to version 2 as of Git 1.6.6 (released December 2009). Version 1 indexes are therefore not supported, as no version 1 packfile indexes were available for testing purposes.

6.3 OC_FILESTORAGE_VERSIONED

The OC_Filestorage_Versioned implementation provides a common point of access to all versioned repositories for ownCloud. By installing and enabling the ownCloud app, the storage class is registered and mounted in the ownCloud filesystem. Once mounted, the OC_Filestorage_Versioned class acts largely as a proxy to the OC_VersionStreamwrapper implementation.

The OC_Filestorage calls are made by OC_Filesystem, which expects PHP resources to be returned (the PHP manual describes resources as “... a special variable, holding a reference to an external resource.”). In order to return the resource types that the rest of ownCloud expects, the OC_Filestorage implementation must use native filesystem calls to access the repository. This necessitates an extra layer of abstraction with a PHP stream wrapper implementation, described below.

An examination of the class specification shown in Appendix A and the OC_Filestorage_Local class shows that the only methods which modify the repository are mkdir(), rmdir(), unlink(), rename(), and file_put_contents().


These calls will save content to the repository using native filesystem calls to versioned:// URLs, provided by OC_VersionStreamwrapper. In order to keep the ownCloud file view up to date, a persistent preference value is needed representing the current HEAD of the ownCloud view. If we were not concerned with ‘rolling back’ the state of a directory, this would not be necessary; instead, we could simply display the HEAD at all times.

To allow for this, a HEAD ‘pointer’ is stored in OC_Preferences for the file_versioning app. A special value of “HEAD” is provided when the app is first enabled, representing the most recent commit. This HEAD pointer is used to determine which view of the repository is presented to ownCloud at any given moment.

The remaining methods in the OC_Filestorage class will simply pass through the return value of native filesystem functions. For example, the is_dir() method can be implemented by calling the filesystem function is_dir() on a versioned:// URL. The operation of other methods will be similar, and is not covered in detail here.

Methods that are expected to deviate from this proxy behaviour include is_writable(), filetype() and getMimeType(). The filetype() method is expected to return a string value of either file or dir, while getMimeType() determines the mimetype by using the finfo functionality of PHP 5.3.

The is_writable() method will return true by default, unless a check of OC_Preferences and the current repository HEAD commit indicates a ‘roll-back’ has been applied. In this event, the method returns false, making previous history read-only. An important note is that if this read-only flag is ignored, the save operation must still be applied to the current repository HEAD in order to prevent the loss of data and version history. (As a technical note, saving a new commit on top of a previous version will result in a branch; however, branches are not accessible through the stream wrapper.)

Finally, filemtime() and its associated methods will return the timestamp of the last commit in which the file was modified. As Git keeps an entire directory snapshot with each commit, the tree history must be walked in order to determine which file changed at which point.

6.4 OC_VERSIONSTREAMWRAPPER

The OC_VersionStreamwrapper class provides an implementation of the stream wrapper interface described in the PHP manual. A stream wrapper is registered with the stream_wrapper_register() function, taking a URL protocol and a class name string as its arguments. In this case, our URL protocol is versioned, and will be used to route calls to repositories through the OC_VersionStreamwrapper class.

Although the stream wrapper implementation has been designed and included within the ownCloud app, there is every possibility this class could be refactored into the Granite library, or a separate library altogether.

The method names of stream wrapper implementations mirror those of native filesystem calls. For example, OC_VersionStreamwrapper::dir_opendir() returns boolean true or false in response to a direct opendir('versioned://...') call. If the method returns true, PHP creates and returns a resource handle to the calling code.
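The relationship between native calls and wrapper methods can be sketched with a small read-only directory wrapper. This is an illustration only: the class name, the "sketch" scheme and the hard-coded listing are invented here and bear no relation to the ownCloud code, which reads its listing from a Git tree.

```php
<?php
// Minimal directory wrapper; each dir_* method answers the native call of
// the corresponding name (opendir, readdir, rewinddir, closedir).
class SketchDirWrapper
{
    public $context;              // populated by PHP when a context is passed
    private $entries = array();
    private $position = 0;

    // Returning true makes PHP hand a directory resource to the caller.
    public function dir_opendir($path, $options)
    {
        $listing = array('sketch:///docs' => array('a.txt', 'b.txt'));
        if (!isset($listing[$path])) {
            return false;
        }
        $this->entries = $listing[$path];
        $this->position = 0;
        return true;
    }

    // Returns the next entry name, or false once the listing is exhausted.
    public function dir_readdir()
    {
        if ($this->position >= count($this->entries)) {
            return false;
        }
        return $this->entries[$this->position++];
    }

    public function dir_rewinddir()
    {
        $this->position = 0;
        return true;
    }

    public function dir_closedir()
    {
        $this->entries = array();
        return true;
    }
}

stream_wrapper_register('sketch', 'SketchDirWrapper');

$names = array();
$handle = opendir('sketch:///docs');
while (($name = readdir($handle)) !== false) {
    $names[] = $name;
}
closedir($handle);
print_r($names);
```

The calling code never sees the wrapper object, only the resource returned by opendir(), which is why the rest of ownCloud can remain unaware of the underlying repository.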

URL structure

The URL structure used by the stream wrapper is very simple, using the existing format of a scheme, a file path and a fragment identifier. For example: versioned:///path/to/repository/.git/some/file.txt#687ca.... This allows the encoding of the repository location, the file path requested and the commit SHA-1 id to fetch the file from. In this case, the file path is parsed to determine both the location of the repository, and the file requested, with the .git part used as a delimiter.
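The parsing described above might look like the following sketch. The function name and the keys of the returned array are hypothetical, and the commit id in the usage line is a shortened placeholder. Plain string functions are used rather than parse_url(), on the assumption that parse_url() cannot be relied upon for a custom scheme with an empty host.

```php
<?php
// Split a versioned:// URL into repository path, file path and commit id.
function parseVersionedUrl($url)
{
    $prefix = 'versioned://';
    if (strpos($url, $prefix) !== 0) {
        throw new InvalidArgumentException('Not a versioned:// URL');
    }
    $rest = substr($url, strlen($prefix));

    // The fragment, if present, names the commit to read from.
    $commit = null;
    $hash = strpos($rest, '#');
    if ($hash !== false) {
        $commit = substr($rest, $hash + 1);
        $rest = substr($rest, 0, $hash);
    }

    // ".git" separates the repository location from the file inside it.
    $marker = '/.git/';
    $split = strpos($rest, $marker);
    if ($split === false) {
        throw new InvalidArgumentException('No .git component in URL');
    }

    return array(
        'repository' => substr($rest, 0, $split) . '/.git',
        'file'       => substr($rest, $split + strlen($marker)),
        'commit'     => $commit,
    );
}

$parts = parseVersionedUrl('versioned:///srv/repo/.git/notes/todo.txt#687ca');
print_r($parts);
```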

There are some limitations in this approach. First and foremost, the URL has no concept of branches. Second, the file path is assumed to be on the local server, which may present future problems with remote repositories.

Section 2.2 of RFC 1738 (“Uniform Resource Locators (URL)”) discusses reserved characters in URL schemes. The most prominent example is the definition of a query string in HyperText Transfer Protocol (HTTP), which reserves the question mark as an initial delimiter, followed by key-value pairs separated by ampersands. Given the flexibility and generic syntax of a URL, it should be possible to create a scheme which identifies both the branch requested, and the true location of the repository (local or remote).

Empty Directories

Since Git cannot track empty directories, a side-effect of its implementation as content-addressable storage, an alternative must be found. The most logical and common choice in the Git community is to place a dot-file in the new directory and continue as normal. OC_Filestorage_Versioned will provide a .empty file in mkdir() calls to enforce the tracking of empty directories.

Metadata

Similarly, Git doesn’t track metadata besides the executable bit. While it may be worthwhile storing metadata in a dot-file in the versioned directory, this has not been explored in this project. Files are assumed to have -rw-r--r-- permissions, unless the executable bit is set.

6.5 the owncloud admin panel

The HEAD pointer mentioned in Section 6.3 can be updated via the admin panel in ownCloud, allowing the roll-back of versioned directories and, consequently, file recovery and download.

The existing architecture in ownCloud provides assets in directories under the app folder. The directories ajax, css, js and templates provide the ability to insert new HTML code into, and modify, any ownCloud page, and are attached through a series of ‘hooks’.

Through the use of existing hooks, the ownCloud app attaches its stylesheets and JavaScript file to the ownCloud template for the admin panel. A selection box is provided, enhanced by the JavaScript Chosen library included in ownCloud¹, listing the repository commits in reverse chronological order.

When the admin panel is loaded, the select box is set according to the return value of an AJAX call, which simply returns the value stored in OC_Preferences. Selecting a commit updates the HEAD pointer stored in OC_Preferences by calling an AJAX script, which sets the OC_Preferences value to the select box value, namely the SHA-1 id of the chosen commit.

1 Chosen is a JavaScript plugin for jQuery and Prototype which provides user-friendly features such as searchable selection boxes, amongst other enhancements


7 IMPLEMENTATION

As discussed in Chapter 6, the porcelain in Granite refers to the top-level Repository, Commit, etc. classes, while the plumbing refers to the internal Raw and related classes.

The ownCloud app can be similarly decomposed into the file storage interface and the stream wrapper implementation. The following sections discuss the implementation details of these main areas, in a similar manner to Chapter 6.

7.1 granite porcelain

The Repository class

The Repository class interacts only with the Commit, Tree and Blob classes. The methods head() and factory() return these objects, while the other methods provide information about the repository.

In order to fetch the HEAD of a given branch, Granite must know the SHA-1 id for the most recent commit on that branch. Thankfully, Git makes this easy by storing a reference to this commit in .git/refs/heads/master, where master is the name of the branch. This SHA-1 id can be used to call the factory() method and return the Commit object to the user.

The factory() method simply instantiates a new object based on the type requested, passing the path and SHA-1 arguments through as constructor parameters. This new object then handles reading from the repository using the plumbing layer, described below in more detail.

The init() method initialises an empty Git repository (with no initial commit) in the requested directory. An empty Git repository is actually just two directories and a file referring to the current HEAD.

Finally, two private methods allow the reading of references in a Git repository: either the ‘loose’ references stored under .git/refs, or the ‘packed’ references stored in .git/packed-refs in a newline-delimited file.

Figure 7.1: File layout for the files_versioning ownCloud app
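Reading a branch reference from either location can be sketched as below. The function name is hypothetical and the two private Granite methods are not reproduced. The sketch assumes the standard packed-refs layout, in which each line holds an SHA-1 id, a space and the reference name, with comment lines starting with "#" and peeled tag lines starting with "^". The ids in the usage example are made up.

```php
<?php
// Resolve a branch name to a commit id: try the loose reference first,
// then fall back to .git/packed-refs. Returns false if neither has it.
function resolveBranch($gitDir, $branch)
{
    $ref = 'refs/heads/' . $branch;

    $loose = $gitDir . '/' . $ref;
    if (is_file($loose)) {
        return trim(file_get_contents($loose));
    }

    $packed = $gitDir . '/packed-refs';
    if (is_file($packed)) {
        foreach (file($packed, FILE_IGNORE_NEW_LINES) as $line) {
            // Skip blank lines, the header comment and peeled tag lines.
            if ($line === '' || $line[0] === '#' || $line[0] === '^') {
                continue;
            }
            $parts = explode(' ', $line, 2);
            if (count($parts) === 2 && $parts[1] === $ref) {
                return $parts[0];
            }
        }
    }
    return false;
}

// Build a throwaway repository layout with one loose and one packed ref.
$gitDir = sys_get_temp_dir() . '/granite-refs-' . uniqid();
mkdir($gitDir . '/refs/heads', 0777, true);
file_put_contents($gitDir . '/refs/heads/master', str_repeat('a', 40) . "\n");
file_put_contents(
    $gitDir . '/packed-refs',
    "# pack-refs with: peeled\n" . str_repeat('b', 40) . " refs/heads/stable\n"
);

var_dump(resolveBranch($gitDir, 'master'));
var_dump(resolveBranch($gitDir, 'stable'));
var_dump(resolveBranch($gitDir, 'missing'));
```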

The Commit class

The Commit class is instantiated with a repository path and an optional SHA-1 id. If the SHA-1 id is provided, a Raw object is instantiated and parsed with simple string matching to populate the relevant class properties. An associated top-level Tree object is instantiated and stored, rather than storing the SHA-1 id. This allows quicker retrieval of the directory tree of a commit, without loading the entire tree into memory. If the SHA-1 id is omitted (or NULL) then all fields are initialised to NULL.

The write() method simply reverses the parsing operation found in the constructor, generating an SHA-1 id and a string of content from class properties. This string representation is compressed with gzcompress() and saved to disk according to the SHA-1 id of the content.
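The final step of such a write can be sketched as follows. This is not the Granite write() method: the function name is hypothetical, and the sketch assumes the standard loose object layout, in which a header of the form "type size" and a NUL byte precede the content before hashing and compression.

```php
<?php
// Serialise and store a loose object. $type is "blob", "tree" or "commit";
// $body is the uncompressed object content. Returns the object's SHA-1 id.
function writeLooseObject($gitDir, $type, $body)
{
    $store = $type . ' ' . strlen($body) . "\0" . $body;
    $sha = sha1($store);

    // The first two hex characters name the directory, the rest the file.
    $dir = $gitDir . '/objects/' . substr($sha, 0, 2);
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    // gzcompress() emits a zlib stream, the format Git uses on disk.
    file_put_contents($dir . '/' . substr($sha, 2), gzcompress($store));
    return $sha;
}

$gitDir = sys_get_temp_dir() . '/granite-write-' . uniqid();
$sha = writeLooseObject($gitDir, 'blob', "test content\n");
echo $sha, "\n"; // d670460b4b4aece5915caf5c68d12f560a9fe3e4, as git hash-object reports
```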

The Tree and Node classes

The Tree class provides a simple tree structure, with Node instances representing the leaves. The Nodes are much more lightweight than storing entire sub-tree or Blob instances.


Trees are stored in Git as a sorted list of nodes, with each entry consisting of an octal integer representing the file mode, a space, the name of the entry, a NUL byte, and the binary SHA-1 id of the referenced object. The mode is used to determine whether the node is a tree, a blob, or a submodule.

Upon instantiation, the Tree class uses the SHA-1 id (if provided) to create a new Raw object. The content of this object is then parsed, entry by entry, with new Node instances created for each entry.

The Node class contains no real functionality, and is used as a simple structure for encapsulating each tree entry.
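Parsing a tree into entries can be sketched as below, using plain arrays in place of Node objects. The function name is hypothetical, well-formed input is assumed, and the entry layout follows Git's tree format: an octal mode, a space, the entry name, a NUL byte and a 20-byte binary SHA-1 id. The ids in the usage example are made up.

```php
<?php
// Parse the uncompressed body of a Git tree object into simple entries.
function parseTree($body)
{
    $entries = array();
    $pos = 0;
    $length = strlen($body);
    while ($pos < $length) {
        $space = strpos($body, ' ', $pos);
        $nul = strpos($body, "\0", $space);
        $mode = octdec(substr($body, $pos, $space - $pos));
        $name = substr($body, $space + 1, $nul - $space - 1);
        $sha = bin2hex(substr($body, $nul + 1, 20));

        // The upper bits of the mode identify the kind of object referenced.
        switch ($mode & 0170000) {
            case 0040000:
                $type = 'tree';
                break;
            case 0160000:
                $type = 'commit'; // a submodule
                break;
            default:
                $type = 'blob';
        }
        $entries[] = compact('mode', 'name', 'sha', 'type');
        $pos = $nul + 21; // skip the NUL byte and the 20-byte id
    }
    return $entries;
}

$body = "100644 README\0" . hex2bin(str_repeat('ab', 20))
      . "40000 src\0" . hex2bin(str_repeat('cd', 20));
$entries = parseTree($body);
foreach ($entries as $entry) {
    printf("%06o %s %s\n", $entry['mode'], $entry['type'], $entry['name']);
}
```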

The sorting algorithm used when writing Tree objects to disk was adapted from the original C source code of Git, which is largely similar to the strcmp algorithm with some adaptations for directories.
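The adaptation for directories can be illustrated with a short sketch. This is not the _sort() method from Granite; it relies on the observation that Git compares entry names byte by byte, treating a directory name as though it ended in a "/" character. The entry layout used here (name and isDir keys) is invented for the example.

```php
<?php
// Order tree entries the way Git does: byte-wise by name, with directory
// names compared as though they carried a trailing slash.
function sortTreeEntries(array $entries)
{
    usort($entries, function ($a, $b) {
        $left = $a['name'] . ($a['isDir'] ? '/' : '');
        $right = $b['name'] . ($b['isDir'] ? '/' : '');
        return strcmp($left, $right);
    });
    return $entries;
}

$sorted = sortTreeEntries(array(
    array('name' => 'lib', 'isDir' => true),
    array('name' => 'lib.php', 'isDir' => false),
    array('name' => 'lib-old', 'isDir' => false),
));
foreach ($sorted as $entry) {
    echo $entry['name'], "\n";
}
```

A plain strcmp on the bare names would place the lib directory first. With the implied trailing slash it sorts last, after lib-old and lib.php, and it is this ordering that must be reproduced for the tree's SHA-1 id to match the one Git generates.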

The Blob class

The Blob class is included here for completeness, although its functionality is no different to the classes described above. It uses a Raw object to fetch the binary content of a file and writes to a loose object file with gzcompress().

The only difference to previous classes is the use of the finfo object to facilitate mimetype detection. Issues with existing PHP functions were encountered because they expect an actual file to be present, which isn’t the case with a Git object. The buffer() method of finfo provides a way of detecting the mimetype of a binary string, providing a simple solution to this issue.

Summary

The porcelain layer is intended to provide a simple, robust interface to a Git repository without exposing the details of what goes on internally. The classes are tightly coupled; however, it is still possible to use them independently, provided the plumbing classes are also available.

The real heavy lifting in Granite comes from reading packfiles and their associated indexes, which is explored in more detail in the following section.


7.2 granite plumbing

In order to fetch a commit, or any other object, the SHA-1 id is used to determine the storage location. As discussed in Chapter 4, Git stores objects as loose files until it cleans them up into packfiles. We can use this knowledge to determine the location of the Git object.

Loose objects are stored under .git/objects, within a subdirectory named after the first two SHA-1 characters, in a file named after the remaining characters. For example, the object 8a7d6c... is stored under .git/objects/8a/7d6c.... If this file does not exist, we can assume it is stored in a packfile until proven wrong.

This is the approach taken by the Raw object, which provides the main point of access to the plumbing layer. A file path is generated based on the SHA-1 id and checked to see if it exists. If it does, a Loose object is returned; otherwise a Packed object is returned from the Raw::_findPackedObject() method.
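The loose-object half of this lookup can be sketched as follows. The function names are hypothetical and the fallback to packfiles is reduced to returning false. The sketch assumes the standard loose object layout of a "type size" header, a NUL byte and then the content, all zlib-compressed.

```php
<?php
// Map a SHA-1 id to its loose object path under the repository.
function looseObjectPath($gitDir, $sha)
{
    return $gitDir . '/objects/' . substr($sha, 0, 2) . '/' . substr($sha, 2);
}

// Read a loose object and split it into type, size and content.
// Returns false when no loose file exists, at which point a caller
// would go on to search the packfiles.
function readLooseObject($gitDir, $sha)
{
    $path = looseObjectPath($gitDir, $sha);
    if (!is_file($path)) {
        return false;
    }
    $raw = gzuncompress(file_get_contents($path));
    $nul = strpos($raw, "\0");
    list($type, $size) = explode(' ', substr($raw, 0, $nul));
    return array(
        'type'    => $type,
        'size'    => (int) $size,
        'content' => substr($raw, $nul + 1),
    );
}

// Store one object by hand in a throwaway directory, then read it back.
$gitDir = sys_get_temp_dir() . '/granite-read-' . uniqid();
$store = "blob 6\0hello\n";
$sha = sha1($store);
mkdir(dirname(looseObjectPath($gitDir, $sha)), 0777, true);
file_put_contents(looseObjectPath($gitDir, $sha), gzcompress($store));

$object = readLooseObject($gitDir, $sha);
print_r($object);
var_dump(readLooseObject($gitDir, str_repeat('0', 40)));
```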

The _findPackedObject() method iterates through the packfile indexes until it finds an offset for the requested SHA-1 id, through use of the Index::find() method. Once found, a new Packed object is instantiated and returned.

The Index class

The only public-facing functionality of Index is find($sha), which returns either an integer offset or boolean false. The structure of the packfile index is covered in more detail in Chapter 4; the most relevant aspects are the fanout table and the SHA-1 table.

The fanout table is used to quickly jump to a specific location in the SHA-1 table. Since the SHA-1 table is sorted, the fanout table can store offset pointers into the SHA-1 table, indexed by the value of the first byte of the SHA-1 id. As an example, to search for the SHA-1 id 8a6dc... we can look up fanout table entry number 138, the value of the first byte (0x8a).

This offset pointer allows us to seek directly to the relevant section of the SHA-1 table for the SHA-1 id we are looking for. The table is sorted to allow a binary search, and the fanout table avoids the first eight iterations of that binary search [9].

Once we have found the relevant section in the SHA-1 table, we can simply iterate over the records until one matches the requested SHA-1 id. The records are stored as 20-byte binary representations of the hexadecimal SHA-1 id, which can be decoded in PHP using the unpack() function with the H format code. Upon finding the requested SHA-1 id, our position in the SHA-1 table determines the location to seek to in the offset table.

Finally, the Index class reads the offset stored in the offset table and returns it. The calling code continues from there. If the SHA-1 id is not found, boolean false is returned and the calling code can decide what action to take.
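As a rough illustration of the lookup described above, the following sketch searches a version 2 pack index held in a string. It assumes the standard version 2 layout (8-byte header, 256-entry fanout table indexed by the first byte of the id, SHA-1 table, CRC32 table, 4-byte offset table) and ignores the 64-bit offset table used by very large packs. The function names findOffset() and buildIndex() are hypothetical and are not Granite's API; buildIndex() exists only to fabricate a tiny index for the demonstration.

```php
<?php
// Sketch: look up a SHA-1 id in a version 2 pack index held in memory.
function findOffset(string $idx, string $sha)
{
    if (substr($idx, 0, 8) !== "\377tOc\0\0\0\2") {
        throw new RuntimeException('Not a version 2 pack index');
    }
    $readU32 = function (int $pos) use ($idx): int {
        return unpack('N', substr($idx, $pos, 4))[1];
    };
    $fanoutAt = 8;
    $first    = hexdec(substr($sha, 0, 2));   // first byte of the id, 0..255
    // fanout[n] holds the number of objects whose first byte is <= n.
    $lo    = $first === 0 ? 0 : $readU32($fanoutAt + 4 * ($first - 1));
    $hi    = $readU32($fanoutAt + 4 * $first);
    $total = $readU32($fanoutAt + 4 * 255);
    $shaAt    = $fanoutAt + 4 * 256;
    $offsetAt = $shaAt + 20 * $total + 4 * $total; // skip SHA-1 and CRC32 tables
    for ($i = $lo; $i < $hi; $i++) {
        $candidate = unpack('H40', substr($idx, $shaAt + 20 * $i, 20))[1];
        if ($candidate === $sha) {
            return $readU32($offsetAt + 4 * $i);
        }
    }
    return false;
}

// Demonstration helper: fabricate a minimal index from id => offset pairs.
function buildIndex(array $offsetsBySha): string
{
    ksort($offsetsBySha, SORT_STRING);
    $fanout = array_fill(0, 256, 0);
    foreach (array_keys($offsetsBySha) as $sha) {
        for ($b = hexdec(substr($sha, 0, 2)); $b < 256; $b++) {
            $fanout[$b]++;
        }
    }
    $idx = "\377tOc\0\0\0\2" . pack('N*', ...$fanout);
    foreach (array_keys($offsetsBySha) as $sha) {
        $idx .= pack('H40', $sha);
    }
    $idx .= str_repeat("\0\0\0\0", count($offsetsBySha)); // CRC32 placeholders
    foreach ($offsetsBySha as $offset) {
        $idx .= pack('N', $offset);
    }
    return $idx;
}

// Made-up ids and offsets, for illustration only.
$shaA = '8a6dc' . str_repeat('0', 35);
$shaB = '8a7d6c' . str_repeat('1', 34);
$shaC = '0f' . str_repeat('a', 38);
$idx  = buildIndex([$shaA => 12, $shaB => 345, $shaC => 6789]);

var_dump(findOffset($idx, $shaB));
var_dump(findOffset($idx, 'ff' . str_repeat('0', 38)));
```

The linear scan inside the fanout bucket mirrors the iteration described in the text; a binary search over the same range would be a drop-in refinement.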

The Packed class

The Packed class provides the interface defined by Raw, with a set of private methods for reading Git packfiles. The Packed class also has methods to interpret the variable-length integers found in the delta headers, and the base-128 big-endian integers used to point to offset deltas. A Packed object is instantiated with the repository path and the name of a packfile within that repository.

The first step involves the _readPackedObject() method, which takes an offset as a parameter, and seeks to the desired offset in the given packfile. The object type and its decompressed size are determined by the _parseHeader() method.

The object type is then checked to determine if the raw content is a delta, or a snapshot of the object. In the event of a delta, the _unpackDeltified() method is called and execution of _readPackedObject() stops there. In the event of a snapshot, the object type, size and decompressed raw content are returned as an array.
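To make the header step concrete, here is a minimal sketch of decoding a packed object header, assuming the standard packfile entry layout: the first byte carries a continuation bit, a 3-bit type and the low 4 bits of the size, and each following byte contributes 7 more size bits. The function name parseObjectHeader() and the returned array keys are illustrative assumptions, not Granite's _parseHeader() implementation.

```php
<?php
// Sketch: decode the type and decompressed size of a packfile entry.
function parseObjectHeader(string $pack, int $offset): array
{
    $types = [1 => 'commit', 2 => 'tree', 3 => 'blob', 4 => 'tag',
              6 => 'ofs_delta', 7 => 'ref_delta'];

    $byte  = ord($pack[$offset++]);
    $type  = ($byte >> 4) & 0x07;   // bits 4-6: object type
    $size  = $byte & 0x0f;          // bits 0-3: lowest size bits
    $shift = 4;
    while ($byte & 0x80) {          // bit 7 set: another size byte follows
        $byte  = ord($pack[$offset++]);
        $size |= ($byte & 0x7f) << $shift;
        $shift += 7;
    }
    if (!isset($types[$type])) {
        throw new RuntimeException("Unknown object type $type");
    }
    // 'data_offset' is where the compressed data (or delta base reference) begins.
    return ['type' => $types[$type], 'size' => $size, 'data_offset' => $offset];
}

// A hand-built header for a blob of 300 bytes: 0xBC 0x12.
$header = parseObjectHeader("\xBC\x12", 0);
print_r($header);
```

Whether the type is a snapshot (commit, tree, blob, tag) or a delta (ofs_delta, ref_delta) then decides which branch described above is taken.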

The _unpackDeltified() method takes one of two actions depending on whether the requested object is an offset delta or a reference delta. The latter simply stores the SHA-1 id of the base object, requiring only a call to Raw::factory() to fetch it. An offset delta stores an offset within the same packfile using the base-128 big-endian encoding mentioned previously.

The format for this integer encoding appears to be Git-specific and is described by Raymond S. Brand in a document titled “Git Data Formats” [7]. The code in Granite has been adapted from the example C code provided in this document.
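For reference, the offset encoding can be sketched as follows. This is a minimal illustration of the encoding as used for offset deltas in Git packfiles, not the code adapted in Granite; the function name readOffsetDelta() and the example offsets are assumptions.

```php
<?php
// Sketch: decode the base-128 big-endian integer that precedes an offset delta.
// Each byte contributes 7 bits; a set high bit means another byte follows.
// Adding 1 before each shift is what makes the encoding Git-specific.
function readOffsetDelta(string $data, int &$pos): int
{
    $byte   = ord($data[$pos++]);
    $offset = $byte & 0x7f;
    while ($byte & 0x80) {
        $byte   = ord($data[$pos++]);
        $offset = (($offset + 1) << 7) | ($byte & 0x7f);
    }
    return $offset;
}

$pos      = 0;
$distance = readOffsetDelta("\x80\x00", $pos);

// The decoded value is a distance back from the delta object itself.
$deltaObjectOffset = 1000;                       // made-up position in the packfile
$baseObjectOffset  = $deltaObjectOffset - $distance;

echo "distance $distance, base object at offset $baseObjectOffset", PHP_EOL;
```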

In either case, the base object is retrieved and the _applyDelta() method is called, providing both the delta and the base object as parameters. The delta encoding appears esoteric and complex, and rather than re-invent the wheel, the code has been taken from the PHP Glip library with clear attribution. This is also the only place where the variable-length integer encoding appears to be used, as a 'sanity check' to ensure the patch was applied correctly by storing the expected size of the result.

The algorithm is virtually identical to that used in the Ruby Grit library, which is MIT-licensed. They have both been adapted from the original Git source code¹ and it is assumed that its inclusion in Granite is not a cause for concern.

After the delta has been applied to the base object, the _unpackDeltified() method returns the result as if it were a snapshot object (because it now is): an array containing the object type, size and decompressed content.
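The delta format itself is compact enough to sketch. The code below is an independent illustration written from the format as implemented in Git's patch-delta.c; it is not the Glip-derived code used in Granite, and the function names readVarInt() and applyDelta() are assumptions. The delta starts with two little-endian base-128 integers (the base size and the result size), followed by copy instructions (high bit set) and insert instructions (high bit clear).

```php
<?php
// Read a little-endian base-128 integer: 7 bits per byte, lowest bits first.
function readVarInt(string $delta, int &$pos): int
{
    $value = 0;
    $shift = 0;
    do {
        $byte   = ord($delta[$pos++]);
        $value |= ($byte & 0x7f) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    return $value;
}

// Sketch: rebuild an object from its base and a delta.
function applyDelta(string $base, string $delta): string
{
    $pos        = 0;
    $baseSize   = readVarInt($delta, $pos);
    $resultSize = readVarInt($delta, $pos);
    if ($baseSize !== strlen($base)) {
        throw new RuntimeException('Base object size mismatch');
    }
    $out = '';
    $len = strlen($delta);
    while ($pos < $len) {
        $cmd = ord($delta[$pos++]);
        if ($cmd & 0x80) {
            // Copy from base: bits 0-3 flag offset bytes, bits 4-6 flag size bytes.
            $offset = 0;
            $size   = 0;
            for ($i = 0; $i < 4; $i++) {
                if ($cmd & (1 << $i)) {
                    $offset |= ord($delta[$pos++]) << (8 * $i);
                }
            }
            for ($i = 0; $i < 3; $i++) {
                if ($cmd & (0x10 << $i)) {
                    $size |= ord($delta[$pos++]) << (8 * $i);
                }
            }
            if ($size === 0) {
                $size = 0x10000;
            }
            $out .= substr($base, $offset, $size);
        } elseif ($cmd > 0) {
            // Insert the next $cmd bytes of the delta verbatim.
            $out .= substr($delta, $pos, $cmd);
            $pos += $cmd;
        } else {
            throw new RuntimeException('Unexpected delta opcode 0');
        }
    }
    // The sanity check mentioned in the text: compare against the expected size.
    if (strlen($out) !== $resultSize) {
        throw new RuntimeException('Result size mismatch');
    }
    return $out;
}

// Hand-built delta: copy 7 bytes, insert "Git ", copy 5 bytes from offset 7.
$base   = 'Hello, world';
$delta  = "\x0c\x10" . "\x90\x07" . "\x04Git " . "\x91\x07\x05";
$result = applyDelta($base, $delta);
echo $result, PHP_EOL;
```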

The Loose class

The Loose class is trivial compared to its counterpart, Packed. The only implemented method is the constructor, with the rest inherited from Raw. The constructor uses the SHA-1 id provided to fetch the loose object file and decompress it. The class properties are set, and the object is returned as usual.
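A minimal sketch of what reading a loose object involves is given below, assuming the standard loose object layout: a zlib-compressed string of the form "type size", a NUL byte, then the content. The function name readLooseObject() and the returned array keys are hypothetical, not the Loose class's real interface, and PHP's zlib extension is assumed to be available.

```php
<?php
// Sketch: decompress a loose object file and split header from content.
function readLooseObject(string $file): array
{
    $compressed = file_get_contents($file);
    if ($compressed === false) {
        throw new RuntimeException("Cannot read $file");
    }
    $raw = gzuncompress($compressed);
    if ($raw === false) {
        throw new RuntimeException("Cannot decompress $file");
    }
    $nul = strpos($raw, "\0");
    if ($nul === false) {
        throw new RuntimeException('Missing object header');
    }
    [$type, $size] = explode(' ', substr($raw, 0, $nul), 2);
    $content = substr($raw, $nul + 1);
    if ((int) $size !== strlen($content)) {
        throw new RuntimeException('Object size mismatch');
    }
    // The object's id is the SHA-1 of the uncompressed header plus content.
    return ['type' => $type, 'size' => (int) $size,
            'content' => $content, 'sha' => sha1($raw)];
}

// Demonstration: write a blob to a temporary file and read it back.
$content = 'Your Backup directory is now ready for use';
$raw     = 'blob ' . strlen($content) . "\0" . $content;
$file    = tempnam(sys_get_temp_dir(), 'obj');
file_put_contents($file, gzcompress($raw));

$object = readLooseObject($file);
unlink($file);
echo $object['type'], ' ', $object['size'], ' ', $object['sha'], PHP_EOL;
```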

7.3 the OC_FILESTORAGE_VERSIONED class

Figure 7.2: OC_Filestorage_Versioned class diagram

1 Specifically the patch-delta.c file in the Git project repository, which decodes the output generated by diff-delta.c

The OC_Filestorage_Versioned class has been implemented according to the abstract specification of OC_Filestorage (see Appendix A for a source listing). It uses the PHP stream wrapper implementation described below to return filesystem resource handles to the calling code. This allows native integration with PHP's fopen() and related functions.

By providing ordinary resource handles, the calling code does not need to have any idea that a folder is versioned. Calls to the filesystem are made through OC_Filesystem, which in turn decides which OC_Filestorage implementation to execute.

The two most significant segments of code deal with creating an initial commit if one is not already present, and searching the Tree objects for a particular filepath.

Initial Commit

If the repository exists and is empty, an initial commit is created and written to disk. The current implementation of this code, shown in Listing 7.1, is a useful example of using Granite to interact with a repository.

Listing 7.1: Initial commit implementation

<?php
$this->repo = new Granite\Git\Repository($path);

// Create a README blob
$blob = new Granite\Git\Blob($this->repo->path());
$blob->content('Your Backup directory is now ready for use');

// Create a new tree and add the README file
$tree = $this->repo->factory('tree');
$tree_node = new Granite\Git\Tree\Node('README', '100644', $blob->sha());
$tree->nodes(array($tree_node->name() => $tree_node));

// Create an initial commit
$commit = new Granite\Git\Commit($this->repo->path());

// Generate user string with OC_User
$user_string = OC_User::getUser() . ' ' . time() . ' +0000';

// Update the commit
$commit->author($user_string);
$commit->committer($user_string);
$commit->message('Initial commit');
$commit->tree($tree);

// Write it all to disk
$blob->write();
$tree->write();
$commit->write();

// Update the HEAD for the 'master' branch
$this->repo->head('master', $commit->sha());

The add(), rm(), and commit() methods of the Repository class are not currently used for writing to the repository; their implementation is limited to adding or removing top-level files until development resumes.

Tree Search

The tree search, defined in OC_Filestorage_Versioned's tree_search() method, accepts a repository object, a tree object, and a filepath. The filepath is expanded into an array of directories, with the basename of the file at the tail of the array. Each Node in a tree is compared against the current path element, recursing into sub-directories as necessary.

Once the filepath has been found, there is no need to unwind the recursion as the SHA-1 id allows us to fetch the object with a single call to Repository::factory().
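The recursion can be sketched without Granite by standing in plain arrays for the Tree and Node objects. Everything named here (treeSearch(), searchParts(), the array keys, and the made-up ids such as 't-root') is a hypothetical stand-in for illustration; the real tree_search() works on Granite objects.

```php
<?php
// Sketch: $trees maps a tree id to its list of nodes; each node has a name,
// the id of the object it points at, and a flag marking sub-directories.
function treeSearch(array $trees, string $treeSha, string $path)
{
    // Expand the path into its elements, basename last.
    $parts = array_values(array_filter(explode('/', $path), 'strlen'));
    return searchParts($trees, $treeSha, $parts);
}

function searchParts(array $trees, string $treeSha, array $parts)
{
    $name = array_shift($parts);
    foreach ($trees[$treeSha] as $node) {
        if ($node['name'] !== $name) {
            continue;
        }
        if (count($parts) === 0) {
            return $node['sha'];   // found: the id is all the caller needs
        }
        // More path elements remain, so this node must be a sub-directory.
        return $node['is_dir'] ? searchParts($trees, $node['sha'], $parts) : false;
    }
    return false;
}

// Made-up repository contents for illustration.
$trees = [
    't-root' => [
        ['name' => 'README', 'sha' => 'b-readme', 'is_dir' => false],
        ['name' => 'docs',   'sha' => 't-docs',   'is_dir' => true],
    ],
    't-docs' => [
        ['name' => 'notes.txt', 'sha' => 'b-notes', 'is_dir' => false],
    ],
];

var_dump(treeSearch($trees, 't-root', 'docs/notes.txt'));
var_dump(treeSearch($trees, 't-root', 'docs/missing.txt'));
```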

7.4 the OC_VERSIONSTREAMWRAPPER class

The OC_VersionStreamwrapper class has been implemented according to the class definition given in the PHP manual, as shown in Figure 7.3. It allows the use of versioned:// URLs in conjunction with the usual PHP filesystem functions.

The stream wrapper handles the actual read and write operations to Git repositories in ownCloud, by calling the relevant Granite methods. The OC_Filestorage_Versioned class handles the registration for versioned:// URLs, allowing the calling code to decide which scheme (or schemes) to register.

Multiple methods are called across a single operation; for example, file_get_contents() will generate calls to stream_open(), stream_read() and stream_close(). This implementation requires the temporary 'caching' of objects in class properties, for use in subsequent method calls.

The tree search implementation described in the previous section is used extensively in the stream wrapper class, to find the tree and blob objects representing the file being requested. Once loaded, only two methods return strings: dir_readdir() and stream_read(). The rest of the methods, barring the stat methods, stream_tell(), and stream_write(), return boolean true or false indicating success or failure.

Figure 7.3: OC_VersionStreamwrapper class diagram

dir_opendir() sets an internal pointer to zero, which is used and incremented by dir_readdir() to return the filename or directory name of the next entry. The stream_read() method operates in a similar manner, using an internal pointer set by stream_open(). Calls to fread() include a $length argument, which specifies the number of bytes to read. stream_read() updates the internal pointer, which is used by the stream_seek(), stream_tell(), and stream_eof() methods to provide their functionality.
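The interplay between stream_open(), stream_read() and the internal pointer can be seen in a stripped-down, read-only wrapper. This is a sketch only: the demo:// scheme, the DemoStreamWrapper class and its in-memory file table are all hypothetical, standing in for the Granite-backed lookups that OC_VersionStreamwrapper performs.

```php
<?php
// Sketch: a read-only stream wrapper serving files from an in-memory table.
class DemoStreamWrapper
{
    public $context;
    public static $files = [
        'repo/README' => 'Your Backup directory is now ready for use',
    ];
    private $content = '';
    private $pointer = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $key = substr($path, strlen('demo://'));
        if ($mode[0] !== 'r' || !isset(self::$files[$key])) {
            return false;
        }
        // "Cache" the content in a property for the calls that follow.
        $this->content = self::$files[$key];
        $this->pointer = 0;
        return true;
    }

    public function stream_read($count)
    {
        $chunk = (string) substr($this->content, $this->pointer, $count);
        $this->pointer += strlen($chunk);
        return $chunk;
    }

    public function stream_seek($offset, $whence = SEEK_SET)
    {
        if ($whence !== SEEK_SET || $offset < 0 || $offset > strlen($this->content)) {
            return false;
        }
        $this->pointer = $offset;
        return true;
    }

    public function stream_tell()
    {
        return $this->pointer;
    }

    public function stream_eof()
    {
        return $this->pointer >= strlen($this->content);
    }

    public function stream_stat()
    {
        return ['size' => strlen($this->content)];
    }

    public function stream_close()
    {
        $this->content = '';
    }
}

stream_wrapper_register('demo', 'DemoStreamWrapper');

// Ordinary filesystem functions now work against the demo:// scheme.
$text   = file_get_contents('demo://repo/README');
$handle = fopen('demo://repo/README', 'r');
$first  = fread($handle, 4);
fclose($handle);

echo $text, PHP_EOL, $first, PHP_EOL;
```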

Two other methods, stream_stat() and url_stat(), return an array of file information in response to stat()-related functions. Finally, stream_tell() and stream_write() return an integer: the current stream position and the number of bytes written, respectively. A full definition of the prototype stream wrapper class, from the PHP manual, can be found in Appendix B.

With these methods implemented, a host of functionality becomes available through versioned:// URL calls. Listing 7.2 shows how this made implementing the OC_Filestorage_Versioned class extremely simple, by simply passing relevant calls through to the stream wrapper. PHP itself handles the creation and garbage collection of resources, and the client code operates under the impression it is dealing with a local filesystem.

As a final note, file modification dates for versioned files are not currently implemented correctly. For true file modification dates, the repository history must be traversed until the last modified version of that file is found. This is a consequence of the fact that Git tracks the entire project state: if any given file is modified in a commit, all the other files are still present, leaving no way to distinguish the modified files without performing a diff operation on the two trees (the trees before and after the commit).

As a workaround, the OC_Filestorage_Versioned class uses the commit date as the file modification date; active development is taking place in the ownCloud community to fix this issue.
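One way the history traversal could work is sketched below. This is not implemented in Granite or the ownCloud app; it assumes a linear history and, for brevity, represents each commit as an array with a parent id, a date, and a flattened map of filepath to blob id. The function name lastModified() and all sample data are made up.

```php
<?php
// Sketch: walk back from HEAD while the file's blob id stays the same.
// The date of the oldest commit in that run is the file's modification date.
function lastModified(array $commits, string $head, string $path)
{
    $sha     = $head;
    $current = $commits[$sha]['files'][$path] ?? null;
    if ($current === null) {
        return false;              // the file does not exist at HEAD
    }
    $date = $commits[$sha]['date'];
    while (($parent = $commits[$sha]['parent']) !== null) {
        $previous = $commits[$parent]['files'][$path] ?? null;
        if ($previous !== $current) {
            break;                 // the file changed (or appeared) in $sha
        }
        $sha  = $parent;
        $date = $commits[$sha]['date'];
    }
    return $date;
}

// Made-up history: README never changes, notes.txt changes in the last commit.
$commits = [
    'c1' => ['parent' => null, 'date' => 100, 'files' => ['README' => 'b1']],
    'c2' => ['parent' => 'c1', 'date' => 200,
             'files' => ['README' => 'b1', 'notes.txt' => 'n1']],
    'c3' => ['parent' => 'c2', 'date' => 300,
             'files' => ['README' => 'b1', 'notes.txt' => 'n2']],
];

var_dump(lastModified($commits, 'c3', 'README'));
var_dump(lastModified($commits, 'c3', 'notes.txt'));
```

Comparing blob ids avoids a full diff of the two trees, since an unchanged file keeps the same SHA-1 id from one commit to the next.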

7.5 the owncloud admin panel

The ownCloud architecture makes extending the user preferences panels very easy, with a call to the OC_APP::registerPersonal($app, $page) hook. The template passed is then included in the admin panel when rendered.

The template contains a select box, displayed in Figure 7.4, which presents the user with a list of commits. At the time of writing, only the last fifty commits are listed to keep the loading time for the admin page reasonable.

Figure 7.4: ownCloud Admin Panel – Versioning and Backup

Listing 7.2: Example use of OC_VersionStreamwrapper by OC_Filestorage_Versioned

<?php

class OC_Filestorage_Versioned extends OC_Filestorage {
    ...
    public function filesize($path) {
        return filesize("versioned:/{$this->repo->path()}$path#{$this->head}");
    }

    public function file_exists($path) {
        return file_exists("versioned:/{$this->repo->path()}$path#{$this->head}");
    }

    public function filemtime($path) {
        return filemtime("versioned:/{$this->repo->path()}$path#{$this->head}");
    }

    public function file_get_contents($path) {
        return file_get_contents("versioned:/{$this->repo->path()}$path#{$this->head}");
    }

    public function file_put_contents($path, $data) {
        $success = file_put_contents("versioned:/{$this->repo->path()}$path#{$this->head}", $data);
        if ($success !== false) {
            // Update the HEAD in the preferences
            OC_Preferences::setValue(OC_User::getUser(), 'files_versioning', 'head', $this->repo->head()->sha());
            return $success;
        }
        return false;
    }
    ...
}

The js/settings.js file is responsible for handling the HEAD preference stored by ownCloud. The inclusion of jQuery in ownCloud allowed this to be implemented in twenty-five lines of code, with an AJAX request to ajax/gethead.php initially setting the select box value, and a request to ajax/sethead.php upon changes to the select box.

The ajax/gethead.php file simply returns a JSON-encoded object with the key name head and the SHA-1 id of the current commit. The ajax/sethead.php file receives an ordinary HTTP POST request containing the new SHA-1 id, and updates the ownCloud preference accordingly.
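The shape of these two endpoints can be sketched with plain functions. This is an illustration under assumptions, not the app's code: a PHP array stands in for the ownCloud preference store, the POST field is assumed to be named head, and the function names are hypothetical. Validating the id before storing it is a precaution added for the sketch.

```php
<?php
// Sketch of gethead: return the stored HEAD as a JSON object.
function getHeadResponse(array $preferences): string
{
    return json_encode(['head' => $preferences['head'] ?? null]);
}

// Sketch of sethead: accept a new HEAD from POST data if it looks like a SHA-1 id.
function setHead(array &$preferences, array $post): bool
{
    $sha = $post['head'] ?? '';   // field name 'head' is an assumption
    if (!is_string($sha) || !preg_match('/^[0-9a-f]{40}$/', $sha)) {
        return false;
    }
    $preferences['head'] = $sha;
    return true;
}

$preferences = [];                       // stand-in for the preference store
$newHead     = str_repeat('ab', 20);     // made-up 40-character id
$accepted    = setHead($preferences, ['head' => $newHead]);
$rejected    = setHead($preferences, ['head' => 'not-a-sha']);

echo getHeadResponse($preferences), PHP_EOL;
```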

8 TESTING

8.1 granite testing

Figure 8.1: Code coverage of Granite’s release tags

At the time of submission, the unit tests for Granite covered 81.30% of the code, which took 13 weeks to write. Initially, the unit tests used SHA-1 ids taken manually from Git. As of January, a new bootstrap system uses the git binary to read random SHA-1 ids from an existing repository, as described below.

A side-effect of comparing the SHA-1 ids from Git repositories to those generated by Granite is ensuring the content of the resultant Git objects is exactly the same. This was evident throughout testing, where a malformed string or extra newline character completely changed the SHA-1 ids, indicating something was clearly wrong.
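The sensitivity of the SHA-1 id to a single stray character is easy to demonstrate. The sketch below computes a blob id the way Git does, hashing the header "blob", the content length and a NUL byte followed by the content; the helper name gitBlobSha() is an assumption.

```php
<?php
// Sketch: compute the SHA-1 id Git would assign to a blob with this content.
function gitBlobSha(string $content): string
{
    return sha1('blob ' . strlen($content) . "\0" . $content);
}

$clean   = gitBlobSha('Initial commit');
$newline = gitBlobSha("Initial commit\n");

echo $clean, PHP_EOL, $newline, PHP_EOL;
echo $clean === $newline ? 'same id' : 'different ids', PHP_EOL;
```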

Testing was achieved through the comparison of returned SHA-1 ids with constants generated by a PHPUnit bootstrap file. The bootstrap file is a PHP script which uses git verify-pack with the verbose -v flag to generate a list of SHA-1 ids in the repository. These were then saved. The pack verification takes time to execute; saving the SHA-1 ids to a constant for later use saved a lot of time when testing.
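A sketch of how such a bootstrap step might collect ids is shown below. It parses text in the format printed by git verify-pack -v (id, type, size, size in pack, offset, and for deltas a depth and base id) and groups the ids by object type. The function name parseVerifyPack() is hypothetical, and the sample output is fabricated for the example rather than taken from a real repository; this is not the project's actual bootstrap file.

```php
<?php
// Sketch: extract SHA-1 ids from `git verify-pack -v` output, grouped by type.
function parseVerifyPack(string $output): array
{
    $objects = [];
    foreach (preg_split('/\R/', $output) as $line) {
        // Summary lines ("non delta: ...", "chain length ...", "...: ok") do not match.
        if (preg_match('/^([0-9a-f]{40}) (commit|tree|blob|tag)\s/', $line, $m)) {
            $objects[$m[2]][] = $m[1];
        }
    }
    return $objects;
}

// Fabricated sample output with made-up ids.
$a = str_repeat('a', 40);
$b = str_repeat('b', 40);
$c = str_repeat('c', 40);
$d = str_repeat('d', 40);
$sample = "$a commit 223 155 12\n"
        . "$b tree   38 49 167\n"
        . "$c blob   42 51 216\n"
        . "$d blob   9 20 267 1 $c\n"
        . "non delta: 3 objects\n"
        . "chain length = 1: 1 object\n"
        . "pack-demo.pack: ok\n";

$ids = parseVerifyPack($sample);
print_r($ids);
```

In a real bootstrap the text would come from running the git binary, and the resulting list could be stored in a constant so the slow verification runs only once.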

Over seven releases of Granite, test coverage remained above 70%, as shown in Figure 8.1. The test suites used to generate Figure 8.1 were run against the latest ownCloud repository commit. Any Git repository may be used to run the test suite, including repositories with or without packfiles.

Areas not covered by testing include:

• Version 1 packfiles - although git pack-objects can be forced to generate version 1 packfiles, this was not discovered until late in development.

• Submodules - Granite provides basic support for submodules, but these were not fully tested as they are not anticipated in ownCloud repositories.

• Tags - tags lack full support, due to time constraints and a low priority.

Figure 8.2: Final code coverage statistics for Granite

These will be tested in more detail during future development of Granite, as their implementation is not fully necessary for ownCloud. As shown in Figure 8.2, the key concerns in terms of test coverage are the Repository class and the tree sort method. The graph in the top-right of the figure shows complexity across the classes, determined by the CRAP¹ rating. The Repository class is again the most complex, which is acceptable as nearly every task is accessible through this class. Effort should be taken to reduce the complexity of the Tree and Packed classes, and improve their test coverage.

8.2 owncloud testing

Listing 8.1: Todo list for ownCloud versioning app, by Sam Tuke

/**
 * ...
 * @todo Fix bug stopping git repo initialisation in Backup for admin user
 * @todo Fix bug causing Backup folder to not appear in file list for non-admin user - force delete filecache for user on first login (after git init), or change order of init execution to put it first
 * @todo Make personal settings interface clearer: folders are in column, commit histories are in another col, both clearly labelled etc.
 * @todo Link select menu with commit messages to a view of the file at that revision?
 * @todo Add ability to restore backup (reset --hard ?) version of file from personal settings page
 * @todo Decide whether v4 will support importing existing git repos, or just allow creating new ones / tracking existing folders
 * @todo Add support for git tracked delete and rename files (and move?)
 * @todo Allow users to set their own commit messages (rather than "samtuke modified the 'x' file [date]")?
 * @todo Rename 'Backup' folder to something clearer - file safe, safehouse, version-controlled, something like that
 * @todo try out using versioned:// URLs in PHPs file-handling functions (versioned:// is not a protocol usable outside of app php code?)
 * ...

The ownCloud implementation was not heavily tested, instead relying on the stability of Granite. This is unfortunate, although delays in the ownCloud implementation put a strain on any testing effort at all. There was, however, an element of community testing, and more empirical testing will follow with the release of ownCloud 4.

Two community bug reports were filed, the first concerning an issue where new folders could not be created, the second detailing an issue with the use of $_SERVER['ROOT'] in the ownCloud app. Both of these were fixed before the submission to ownCloud.

1 CRAP combines cyclomatic complexity and code coverage, according to the Google Testing Blog (http://googletesting.blogspot.co.uk/2011/02/this-code-is-crap.html)

Further bugs were found once the code was submitted to the ownCloud repository, documented by ownCloud contributor Sam Tuke. Listing 8.1 displays the PHPDoc comments made following submission, which represent the main focus of ongoing development.

9 SUMMARY

Previous chapters have discussed the details of Granite and ownCloud; both of these projects have potential for future directions and alternative uses.

Granite, intended to be a full Git library, can be used as a content-addressable storage system, much like Git itself. Although it lacks a fully-fledged implementation, enough support is there to build a functioning versioning system. Granite can therefore be used to interact with Git repositories for any purpose, not just ownCloud, and will only improve as development continues. The git binary has presented no problems reading and writing to Granite-created repositories.

The ownCloud app relies on the OC_Filestorage implementation for communicating with versioned folders, while the stream wrapper communicates with the repository. This decoupling allows simple modifications in future to store repositories using multiple formats. The OC_Config class will be used in future work to select multiple folders for versioning; this same mechanism can be extended to support multiple repository formats such as Mercurial, as mentioned in Chapter 2.

Of the twenty-two features specified in Chapter 5, twelve were eventually implemented. The implemented functionality provides a solid prototype for future releases of ownCloud to build upon; with the inclusion of this prototype in the main Gitorious repository, I am confident the code can be refactored and improved even more rapidly. Several functions, such as rename, move, and delete, were not implemented due to an increasing lack of time and a relatively low priority. In theory they are simple modifications to existing code.

Some features worked unexpectedly. Rolling back the filesystem automatically provides file downloads of old revisions, and viewing through the built-in text editor, provided the file cache operates as expected. The saving of WebDAV files through the version control system was completely unexpected; it appears to be completely integrated with OC_Filesystem in some way, and saved a large amount of development effort.

Despite this, several bugs are present at the time of submission, primarily the lack of some file operations mentioned above, and file cache issues. Although some files do not appear in the web interface, the read and write functionality to the underlying repository is not affected. The main issue is the modification dates of a rolled-back folder: they are earlier than the last-cached dates in the database, preventing an update for the new folder layout.

Finally, it is possible to provide SSH access to the repositories through Git's SSH transport. By running the git update-server-info command on the repository and creating an account on the server with the home directory linked to the ownCloud user directory, access can be gained through ssh://user@hostname/pathrelative//to/repo. Push and pull should operate as expected, although this has not been tested.

10 FUTURE WORK

There is a lot of work to do before the release of ownCloud 4 on May 22nd, and several possible directions afterwards. The main concerns for the ownCloud development are user interface modifications and better filecache integration.

Users should be able to configure any folder for versioning; this is expected to be implemented using OC_Config to store a list of folders selected for versioning. Extra modifications to the admin panel will also be necessary. It should also be easy to access a Git repository through the Git client; it may be possible to interact with the repository via mounted WebDAV.

The currently proposed user interface modifications are a drop-down selection box available by clicking the 'file modified' date in the web viewer. Previous revisions of the file will be displayed, each with “rewind” icons available to restore the relevant version. This requires the ability to retrieve the history of an individual file throughout the life of the repository. This could be included in either the ownCloud app or Granite, but makes more sense in the latter.

The filecache presents a few issues. The isUpdated() check relies on the file modification dates being greater than the cached date; if so, the filecache fetches the directory content again. When the repository is rolled back, the modification dates are inevitably less than the cached date, leading to an incorrect file listing on the ownCloud front-end.

A possible solution exists with a versioned property stored in the filecache. The final implementation included a filecache patch, forcing it to refresh the cache for versioned folders, regardless of the result of isUpdated(). A better solution would be to simply delete the cached folders with OC_FileCache when the HEAD pointer is modified, forcing an update on the next page load.

Finally, longer-term goals include writing to packfiles as detailed in the requirements, and a basic implementation of push and pull, by looking at the Ruby Grit library and the Git source code. Write support for packfiles will require the consideration of algorithms for sensible delta choice, which base objects enable the most efficient diffs, etc. Push and pull appear to be relatively simple, but depend on being able to write to packfiles. This is a low priority in consideration of other goals.

With the implementation of push and pull support, and by avoiding rebasing wherever possible, it may be possible to link repositories between ownCloud instances, permitting synchronisation, backup and mirroring. There are plans for collaborative editing in the ownCloud text editor, with real-time editing updates across multiple clients. By simply synchronising commit events (to avoid merging, if possible) this real-time editor can add versioning support.

11 CONCLUSION

11.1 planning and development

The planning and development for this project were very ad-hoc, and deliberately kept 'loose' to cope with the changes that come with open-source development projects. This proved to be extremely useful following the release of ownCloud 3, where a significant amount of time had to be devoted to understanding the filecache implementation. Areas which did not previously impact the filesystem suddenly became relevant; this was completely unanticipated and significantly extended the development time.

The original plan presented in the Progress Report was severely flawed: it failed to take account of the changing nature of open-source software, and it underestimated the complexity of reading packfiles. A preliminary requirements list anticipated push and pull support. This was clearly a mistake, as building in support would require the implementation of a lot more Git functionality than was achieved. In future, care would be taken to shorten the development sprints even further, with more regular contact with the community.

Ultimately though, the iterative nature of development provided early feedback on the critical areas of development, namely packfiles. This necessitated a longer development sprint for releases with packfile-related features, but has resulted in a much deeper understanding of the packfile format. Given the extended investigation into their structure, I am confident I can implement write features for packfiles after submission.

11.2 progress and implementation

Twelve of the twenty-two features presented in Section 5.1 were implemented, with enough functionality to create a prototype app for ownCloud. The ownCloud implementation achieved six out of seven requirements, although improvements to filecache compatibility are necessary.

The design of the system evolved over several iterations, in an ad-hoc manner according to changes in both the ownCloud software and my understanding of Git in general. It would not have been possible to develop the project in a traditional waterfall process, although the process used was far from ideal.

The ownCloud code could use heavy refactoring, and the tree search methods are currently duplicated across three areas of code. Refactoring the tree search into the Repository class would significantly reduce code duplication. Coding standards were not enforced, and some APIs use varying conventions even across Granite. The architecture has proven suitable for an initial prototype and implementation, but would need some adjustment for improved performance and functionality.

Overall, however, I believe this project has developed a viable, functional prototype for further use in ownCloud. The code accepted into the main repository can be seen as validation of this, although the release implementation will likely be very different to its current state. The unit tests for Granite provided reasonable feedback and informed development throughout. The project repository history shows early commits had 100% test coverage; in future such high coverage should be maintained, rather than the 70% displayed in Chapter 8.

Although it is not a reasonable indicator of quality or functionality, I am proud of what I have achieved with this project. With refactoring, Granite has the potential to develop into a lean, fast Git library for a variety of uses. The ownCloud implementation, as discussed in Chapter 10, has been anticipated and well-received and will hopefully make the core release for ownCloud 4. The ownCloud project reportedly has 400,000 users of the community edition, with two commercial offerings launched in the last month. This will undoubtedly result in the filing of more bugs.

11.3 final words

The internals of Git cross various areas of computer science, from delta compression, diffs, patches and binary formats, through to graph theory, binary search trees and more. The experience of tying each of these into a usable, functional project has strengthened my knowledge base and confidence immensely. I feel that I could easily implement a similar library and application for another version control system, possibly Mercurial, given enough time.

This project would not have been possible without the excellent, freely available community documentation on Git, or the implementations in a variety of languages. Numerous useful research papers and studies on version control, software configuration management, and Git and Mercurial in particular, were instrumental in furthering my understanding of Git's operation for the duration of this project. “Standing on the shoulders of giants” is a truly apt description of both open-source development and academic research.

A OC_FILESTORAGE LISTING

Listing A.1: ownCloud OC_Filestorage abstract class

<?php
/**
 * ownCloud
 *
 * @author Frank Karlitschek
 * @copyright 2010 Frank Karlitschek [email protected]
 *
 * ...
 *
 * You should have received a copy of the GNU Affero
 * General Public License along with this library. If
 * not, see <http://www.gnu.org/licenses/>.
 */

/**
 * Provide a common interface to all different storage options
 */
abstract class OC_Filestorage {
    public function __construct($parameters) {}
    abstract public function mkdir($path);
    abstract public function rmdir($path);
    abstract public function opendir($path);
    abstract public function is_dir($path);
    abstract public function is_file($path);
    abstract public function stat($path);
    abstract public function filetype($path);
    abstract public function filesize($path);
    abstract public function is_readable($path);
    abstract public function is_writable($path);
    abstract public function file_exists($path);
    abstract public function filectime($path);
    abstract public function filemtime($path);
    abstract public function file_get_contents($path);
    abstract public function file_put_contents($path, $data);
    abstract public function unlink($path);
    abstract public function rename($path1, $path2);
    abstract public function copy($path1, $path2);
    abstract public function fopen($path, $mode);
    abstract public function getMimetype($path);
    abstract public function hash($type, $path, $raw);
    abstract public function free_space($path);
    abstract public function search($query);
    abstract public function touch($path, $mtime = null);
    abstract public function getLocalFile($path);
}


B PHP STREAMWRAPPER CLASS

Listing B.1: PHP StreamWrapper class prototype from the PHP manual

<?php
streamWrapper {
    /* Properties */
    public resource $context ;

    /* Methods */
    __construct ( void )
    __destruct ( void )
    public bool dir_closedir ( void )
    public bool dir_opendir ( string $path , int $options )
    public string dir_readdir ( void )
    public bool dir_rewinddir ( void )
    public bool mkdir ( string $path , int $mode , int $options )
    public bool rename ( string $path_from , string $path_to )
    public bool rmdir ( string $path , int $options )
    public resource stream_cast ( int $cast_as )
    public void stream_close ( void )
    public bool stream_eof ( void )
    public bool stream_flush ( void )
    public bool stream_lock ( mode $operation )
    public bool stream_metadata ( int $path , int $option , int $var )
    public bool stream_open ( string $path , string $mode , int $options , string &$opened_path )
    public string stream_read ( int $count )
    public bool stream_seek ( int $offset , int $whence = SEEK_SET )
    public bool stream_set_option ( int $option , int $arg1 , int $arg2 )
    public array stream_stat ( void )
    public int stream_tell ( void )
    public bool stream_truncate ( int $new_size )
    public int stream_write ( string $data )
    public bool unlink ( string $path )
    public array url_stat ( string $path , int $flags )
}
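As an illustration of how the prototype above is used in practice, the following is a minimal sketch of a read-only wrapper that serves strings from an in-memory table. The class name DemoMemoryStream, the "demo" scheme and the $files table are hypothetical and are not part of ownCloud or of this project; only a small subset of the prototype's methods is implemented, which is assumed to be enough for file_get_contents().

```php
<?php
/**
 * Minimal read-only stream wrapper (illustrative sketch only).
 * DemoMemoryStream, the "demo" scheme and $files are hypothetical names.
 */
class DemoMemoryStream {
    /** @var array path => contents, standing in for a real storage backend */
    public static $files = array();

    /** @var resource|null set by PHP when a stream context is supplied */
    public $context;

    private $data = '';
    private $pos = 0;

    public function stream_open($path, $mode, $options, &$opened_path) {
        $key = substr($path, strlen('demo://'));
        // Only reading of known paths is supported in this sketch.
        if ($mode[0] !== 'r' || !isset(self::$files[$key])) {
            return false;
        }
        $this->data = self::$files[$key];
        $this->pos = 0;
        return true;
    }

    public function stream_read($count) {
        // The cast guards against substr() returning false on older PHP versions.
        $chunk = (string) substr($this->data, $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_eof() {
        return $this->pos >= strlen($this->data);
    }

    public function stream_stat() {
        return array('size' => strlen($this->data));
    }

    public function stream_close() {
        // Nothing to release for an in-memory string.
    }
}

stream_wrapper_register('demo', 'DemoMemoryStream');
DemoMemoryStream::$files['hello.txt'] = "hello from a wrapper\n";

// Ordinary filesystem functions now work on demo:// URLs.
echo file_get_contents('demo://hello.txt');
```

Once a wrapper is registered, existing code that calls the standard filesystem functions can be pointed at a different backend simply by changing the URL scheme.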


ANNOTATED BIBLIOGRAPHY

[1] Bazaar Community, “Bazaar developer documentation: bazaar architectural overview,” Internet: http://doc.bazaar.canonical.com/bzr.dev/developers/overview.html, (Last accessed: 24 Apr. 2012).

Describes the object and storage model for Bazaar, without explicitly defining on-disk formatting. As mentioned earlier, this is more of a description of the Bazaar API, which was useful for Chapter 2, but unfortunately rather vague.

[2] Bazaar Community, “Bazaar developer documentation: revision properties,” Internet: http://doc.bazaar.canonical.com/developers/revision-properties.html, (Last accessed: 24 Apr. 2012).

Describes revision properties, which allow third-party apps to store additional information with each revision (or commit) in Bazaar. These were considered as potential metadata storage should Bazaar be selected for use in ownCloud.

[3] Bazaar Community, “Bazaar faq,” Internet: http://wiki.bazaar.canonical.com/FAQ, (Last accessed: 24 Apr. 2012).

Lists supported transport protocols for Bazaar, amongst other frequently asked questions.

[4] Bazaar Community, “Bazaar workflows,” Internet: http://wiki.bazaar.canonical.com/Workflows, (Last accessed: 24 Apr. 2012).

Provides a detailed discussion, with examples, of how Bazaar can be used to enforce virtually any workflow in a software development team. This required significant investment to understand properly, as most development workflows are enforced in some way by the VCS in use, which makes Bazaar unusual.

[5] Bazaar Community, “Core concepts: bazaar user guide,” Internet: http://doc.bazaar.canonical.com/latest/en/user-guide/core_concepts.html, (Last accessed: 24 Apr. 2012).

Provides an overview of the logical Bazaar objects and terminology, and forms an introduction to the rest of the Bazaar documentation.


[6] K. Beck, Test-driven development: by example. Addison-Wesley Professional, 2003.

The canonical book on TDD guided the development of unit tests, helped shape the idea of what a “unit” is, and provided general help and information essential in the planning stage of the project.

[7] R. S. Brand, “Git data formats,” 2006, (Last accessed: 24 Apr. 2012).

Describes in detail integer and delta encodings in the Git packfiles and associated indexes. Not found until later on in the project, this document was particularly useful in understanding the variable-length encoding used for object sizes in packfiles; documentation elsewhere was sparse.
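The variable-length size encoding mentioned in this annotation can be sketched as follows. This is an illustrative decoder written from the publicly documented pack format, not code taken from this project: the function name decode_pack_object_header is hypothetical. The first byte carries a continuation flag, a 3-bit object type and the lowest 4 bits of the size; each following byte contributes 7 more size bits.

```php
<?php
/**
 * Decode the type/size header that precedes an object in a Git packfile.
 * Illustrative sketch; decode_pack_object_header is a hypothetical name.
 *
 * @param string $data   raw packfile bytes
 * @param int    $offset position of the first header byte
 * @return array array(type, uncompressed size, offset just past the header)
 */
function decode_pack_object_header($data, $offset = 0) {
    $byte = ord($data[$offset++]);
    $type = ($byte >> 4) & 0x07;   // bits 6-4: object type
    $size = $byte & 0x0f;          // bits 3-0: lowest four bits of the size
    $shift = 4;
    while ($byte & 0x80) {         // bit 7 set: another size byte follows
        $byte = ord($data[$offset++]);
        $size |= ($byte & 0x7f) << $shift;
        $shift += 7;
    }
    return array($type, $size, $offset);
}

// Two header bytes 0x95 0x0a: type 1 (commit), size 5 + (10 << 4) = 165.
list($type, $size, $next) = decode_pack_object_header("\x95\x0a");
echo "type=$type size=$size next=$next\n";
```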

[8] Canonical Ltd., “Bazaar migration docs: why switch to bazaar?” Internet: http://doc.bazaar.canonical.com/migration/en/why-switch-to-bazaar.html, (Last accessed: 24 Apr. 2012).

Describes centralised workflow options and other features of Bazaar, from an admittedly biased perspective. Alternative points of view are linked to in the document, and together formed a general view of the advantages of each system over the other.

[9] S. Chacon, “The git community book,” Internet: http://book.git-scm.com, (Last accessed: 24 Apr. 2012).

The community book covers the basics of Git usage, advanced branching and merging, and the internal Git “plumbing”. The plumbing describes the internal storage formats, transfer protocols and underlying algorithms used in Git in excellent detail.
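One of the plumbing details covered there, how Git derives an object's identifier, can be sketched in a few lines. This is an illustrative example rather than this project's implementation, and the function names are hypothetical. An object id is the SHA-1 of a header ("type", a space, the content length, a NUL byte) followed by the content; loose objects are stored zlib-compressed under a path built from that id.

```php
<?php
/**
 * Compute the id Git assigns to an object (illustrative sketch).
 * git_object_id and git_loose_object_path are hypothetical helper names.
 */
function git_object_id($type, $content) {
    // Header format: "<type> <length in bytes>\0"
    $header = $type . ' ' . strlen($content) . "\0";
    return sha1($header . $content);
}

/**
 * Path of a loose object relative to the .git directory:
 * the first two hex digits form the directory, the remaining 38 the file name.
 */
function git_loose_object_path($id) {
    return 'objects/' . substr($id, 0, 2) . '/' . substr($id, 2);
}

$id = git_object_id('blob', "test content\n");
echo $id, "\n";                        // matches `git hash-object` for the same bytes
echo git_loose_object_path($id), "\n";
```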

[10] J. Estublier, D. Leblang, A. Hoek, R. Conradi, G. Clemm, W. Tichy, and D. Wiborg-Weber, “Impact of software engineering research on the practice of software configuration management,” in ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 14, no. 4. ACM, 2005, pp. 383–430.

Describes and evaluates the role of research in Software Configuration Management in influencing modern version control software.

[11] D. Grune, “Concurrent versions system, a method for independent cooperation,” IR 113, Vrije Universiteit, Tech. Rep., 1986.


As the first client-server VCS, the original CVS provided valuable background reading and an introduction to common terminology. This paper also helped determine the content of Section 2.1.4.

[12] O. Jakobsen, “Integrating libpesto with subversion,” Master’s thesis, Universitetet i Tromsø, 2007.

A useful introduction to version control and associated terminology is presented in Section 2 of this Master’s Thesis, from both a centralised and decentralised point of view.

[13] D. Knittl-Frank, “Analysis and comparison of distributed version control systems.”

A detailed consideration of the major version control systems at the time, particularly Bazaar, Git, and Mercurial. It was heavily influential in the format and structure of Chapter 2, and provided an excellent technical introduction to Git for this project.

[14] M. Mackall and Selenic Consulting, “Towards a better scm: Revlog and mercurial,” in Linux Symposium, 2006, p. 83.

Provides a detailed description and discussion of the Mercurial Revlog format, with attention to performance compared to Git, and big-O analysis of a Revlog’s performance. Goes into implementation detail and encoding formats used on-disk.

[15] Mercurial Community, “Mercurial wiki–branches,” Internet: http://mercurial.selenic.com/wiki/Branch, (Last accessed: 24 Apr. 2012).

Describes branches, which are diverging lines of development, and how they are implemented as a series of linear changesets. Also describes “named branches”, which might be more accurately called tags and apply to individual changesets.

[16] Mercurial Community, “Mercurial wiki–changeset,” Internet: http://mercurial.selenic.com/wiki/ChangeSet, (Last accessed: 24 Apr. 2012).

Describes changesets, which are used to store repository changes. Effectively the same as Git commit objects.

[17] Mercurial Community, “Mercurial wiki–nodeids,” Internet: http://mercurial.selenic.com/wiki/Nodeid, (Last accessed: 24 Apr. 2012).


Describes how nodeids are generated based on previous commit history, resulting in an immutable history.
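The dependence of a nodeid on its history can be illustrated with a short sketch. It assumes the commonly documented scheme in which the nodeid is the SHA-1 of the two parent nodeids, in sorted order, followed by the revision text, with an all-zero id standing in for a missing parent. The names hg_node_id and HG_NULL_ID are hypothetical, and this is not code from Mercurial or from this project.

```php
<?php
// Assumption: a missing parent is represented by 20 zero bytes.
define('HG_NULL_ID', str_repeat("\0", 20));

/**
 * Compute a Mercurial-style nodeid (illustrative sketch).
 *
 * @param string $text revision contents
 * @param string $p1   first parent nodeid (20 raw bytes)
 * @param string $p2   second parent nodeid (20 raw bytes)
 * @return string 20 raw bytes
 */
function hg_node_id($text, $p1 = HG_NULL_ID, $p2 = HG_NULL_ID) {
    // Parents are hashed in sorted order, so their order does not matter.
    if (strcmp($p1, $p2) > 0) {
        list($p1, $p2) = array($p2, $p1);
    }
    return sha1($p1 . $p2 . $text, true);
}

// Each id depends on its parent's id, so rewriting an early revision
// changes the id of every revision that descends from it.
$root  = hg_node_id("first revision\n");
$child = hg_node_id("second revision\n", $root);
echo bin2hex($root), "\n", bin2hex($child), "\n";
```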

[18] M. Pool, “Bazaar becomes a gnu project,” Internet: http://lists.gnu.org/archive/html/info-gnu/2008-05/msg00012.html, May 2008, (Last accessed: 24 Apr. 2012).

Mailing list archive announcing the inclusion of Bazaar into the GNU project.

[19] E. S. Raymond, “Understanding version-control systems (draft),” Internet: http://www.catb.org/~esr/writings/version-control/version-control.html, (Last accessed: 24 Apr. 2012).

Eric S. Raymond provides a detailed overview of the history of version control systems as well as a comprehensive review of the core algorithms and data structures used in each. This document covers a large number of arguments for and against each particular system, with the “Comparisons” section being the most useful and relevant.

[20] M. J. Rochkind, “The source code control system,” IEEE Transactions on Software Engineering, vol. SE-1, no. 4, pp. 364–370, December 1975.

Original SCCS paper describing the first version control system. Very useful for an in-depth description of delta compression based on line insertions and deletions. Forms the foundation of modern delta compression, diffs and patch formats.

[21] diffen.com, “Git (software) vs mercurial (software),” Internet: http://www.diffen.com/difference/Git_(software)_vs_Mercurial_(software), (Last accessed: 24 Apr. 2012).

Lists various statistics comparing Git and Mercurial, the majority of which are functionally identical.