Technische Universiteit Eindhoven

Technische Informatica

Master of Science Thesis

CHI Explorer

building a community

by G.A.R.M. Boshouwers

Supervisors:

prof. dr. ir. G.J.P.M. Houben

ir. K.A.M. van der Sluijs

drs. F.F.M. Ector

Eindhoven, 2008


Contents

1 Introduction
    1.1 RHCe
    1.2 CHI
    1.3 Video
    1.4 Related Work
2 Specification
    2.1 Metadata
        2.1.1 Tags
    2.2 Metadata Scarcity
        2.2.1 Manual Addition
        2.2.2 Automatic Addition
        2.2.3 Human-based Computation
    2.3 Video Streaming
        2.3.1 Video Encoding
        2.3.2 Video Fragments
3 Design
    3.1 Data Structure
        3.1.1 Tags
        3.1.2 Bookmarks
        3.1.3 Comments
    3.2 Video Encoding
        3.2.1 Codecs
        3.2.2 Video Containers
        3.2.3 Video Streaming
    3.3 Interface
        3.3.1 User Management
        3.3.2 Search Results
        3.3.3 Detail View
        3.3.4 Tag Forms
4 Implementation
    4.1 Joomla!
        4.1.1 Components
        4.1.2 Templates
        4.1.3 Simple Machines Forum
    4.2 Data Interface
        4.2.1 Sesame
        4.2.2 Location Coordinates
        4.2.3 W3C Time Ontology
    4.3 PHP
        4.3.1 Joomla! function calls
        4.3.2 SIMILE Timeline
        4.3.3 Google Maps
        4.3.4 Result Clustering
        4.3.5 VideoLan
        4.3.6 Query Encapsulation
        4.3.7 Tag Representation
        4.3.8 TagSuggestion component
        4.3.9 Forum Links
5 Use Case
    5.1 Search request
    5.2 Tagging
    5.3 Moderating
6 Conclusion
    6.1 Conclusion
        6.1.1 Disclose video
        6.1.2 Data integration and expansion possibilities
        6.1.3 metadata generation
    6.2 Issues
        6.2.1 Multi-user test
        6.2.2 Weight differences in tag proposals
    6.3 Future Work
        6.3.1 Moderator actions
        6.3.2 Adding tag dimensions and object types
        6.3.3 Search options
Bibliography
A Code Snippets
    A.1 Google Maps Location Collection
    A.2 W3C Time Ontology rewrite
    A.3 VideoLan Server Script
List of Symbols and Abbreviations
List of Figures
List of Tables


Chapter 1

Introduction

1.1 RHCe

RHCe (Regionaal Historisch Centrum Eindhoven) started out as an organization which stores archives of the municipalities and other governmental institutions in the region of Eindhoven. This is done to allow citizens to check up on the decisions of a county or retrieve genealogical information about someone’s relatives. In later years, this collection was extended with newspapers, pictures, maps and videos, to give a more complete overview of the time.

All this information is available to people and organizations in the office in Eindhoven, if allowed by copyright laws. RHCe wants to stimulate the use of these archives as much as possible, by making it easier to search these amounts of data. In order to do this, some parts have been made available using the Aquabrowser¹ at the public library of Eindhoven².

Only a very small part of the collection is searchable using this Aquabrowser. RHCe does open its doors to interested people. The office in Eindhoven contains a reading room where people can browse the collection under the supervision of staff, as long as those sources are not protected by copyright laws and are in good condition. Also, if someone else has already checked an item out, other people will have to wait.

Both the problem of the degradation of the physical condition of an item and the problem of multiple checkouts can be solved with digital copies. Another advantage of using digital copies is the ability to publish this information on the Internet. This could potentially reach thousands of interested persons who have not heard about RHCe before. For this reason a web site³ was launched, containing some basic information about the region, along with some articles written to draw in people to the office proper.

¹ http://www.medialab.nl/
² http://aqua.obeindhoven.nl/
³ http://www.rhc-eindhoven.nl/

But this leaves out some interested people who cannot visit the office for some reason. They might be only mildly interested and living in another part of the country. These people might think the trip too far for a quick glance at a photo or two. Another group that could be reached are families who have their roots in this region, but have emigrated since. The children of these people might want to know more about their family history, but a trip to another country could well be too much to ask.

For these target groups, as well as to facilitate other users, a proposal was made to extend the web site to include a greater part of the collection, if not all of it. This should be executed in such a way that the collection is easily accessible to users. It should not only display objects in a reasonably complete way, it should also help users to find objects they are especially interested in. For this reason the CHI project was started.

1.2 CHI

CHI (Cultureel Historische Informatie) is a prototype browser, designed to disclose the images in the RHCe collection. This program has been developed in cooperation with the TU/e (Technische Universiteit Eindhoven), by G.H.J. Dorssers and F.T.M. Kamzol.

The main focus of this project was the ability to search the collection for relevant items. This meant the inclusion of some descriptive metadata, defining what was depicted in each image. This metadata was readily available at the RHCe, which made it possible to focus the attention of this project on the data model. This in turn allowed the developers to implement an effective search algorithm and interface for this system.

CHI consists of a PHP interface around a semantic database, linking the pictures to semantic concepts such as dates, locations and terms. This is used to enable users to access these archives via the Internet. Searching this dataset is performed by selecting the concepts that are related to the user’s search query and finding those pictures that are considered to include those concepts.

These concepts have mutual relations, linking some of them together. This way objects can be found containing related concepts. An example of this would be the search term vehicle, which would also return objects that are only tagged with the term car. In this case the system has correctly deduced that a car is also a vehicle, which might interest the user.

To give users some insight into the concept structure used in the system, they are also given the option to browse through it using a search cloud. This view uses a graph to display all concepts related to the currently selected concept. Concept and relation types are color-coded to differentiate them. By clicking on one of these related concepts, the view shifts to that particular concept. By double-clicking, a concept can be added to the search query.

As the focus was put more on searching the dataset, the result page was kept simple. It consisted of a list of objects, each consisting of a thumbnail image and a summary of its descriptive metadata.

The project was considered a success by RHCe, and there was considerable interest in extending the project to other data formats. The prime focus here would be the addition of videos, since this format is increasing in importance today. It also has several distinct problems, which will be handled in the next section.

1.3 Video

The next step in disclosing the RHCe archives is the inclusion of other data formats next to images, such as video. This format is chosen because a lot of new objects are currently coming in as videos, while not a lot of metadata is available about them. To simplify making search queries for users, search results should consist of both pictures and videos relevant to the given query.

Videos usually contain more information than a picture. To be able to access this information, the videos can also be searched on the basis of parts of a video, called bookmarks. These can represent scenes or occurrences in the video, which have a more specialized meaning within the video.

A problem that surfaces here is the lack of metadata that is currently known concerning these sources. Due to the recency of this data format, many objects have not yet been analyzed by RHCe. Also, one video can contain a lot of different scenes, which can all have a different subject. Lastly, due to the importance of television and the availability of video cameras in the current age, this collection will grow exponentially in the future. All in all this will make it very difficult to keep the metadata of this data format up to date.

The CHI explorer project consists of three parts:

• Disclose RHCe videos to the users, allowing them to search through the dataset.

• Integrate other data formats; allow users to find images and videos using the same search query and allow for the inclusion of other data formats at a later date.

• Implement a system to generate new metadata for existing objects.

In chapter 2 we will explore in depth which problems need to be solved to implement a system such as this. The design of the actual data model and interfaces will be explained in chapter 3. Chapter 4 contains the implementation decisions that have been made, including all external programs that have been used. In chapter 5 a use case will be explained to give an overview of how the system operates. Finally, chapter 6 contains the conclusion as well as possible extensions that could be added to the system.

1.4 Related Work

CHI, the predecessor of this project, contains a sophisticated system to handle search queries over a set of images. This can be extended for videos with some additions to the data model. Though several different types of terms are used in the search algorithm, the user has no way to differentiate returned objects using these types. Different views on the result set could improve a user’s understanding of these objects.

The Aquabrowser used to disclose part of the RHCe collection has an extended search engine to find objects using subject tags. This ignores other search dimensions, such as the period in which an object was made, or the location the object is about. Without the ability to search in these dimensions, it is a lot more difficult for a user to find information about a specific occurrence.

The website of Beeld en Geluid⁴, an archive for Dutch television and radio broadcasts, has two systems to search their collection. One is a tag-based search system, where users can find which videos are present and can order these videos, which can then be picked up at their department in Hilversum. The other is an alphabetically sorted list of current programs, which can be viewed online. This second system cannot be searched. To make searching easier, these two systems could be merged, so users can search for relevant items. When these items are found, a direct link to these sources could enable a user to assess these sources more easily. In this system, the use of bookmarks should also enable the users to jump directly to a certain fragment within the video.

⁴ http://portal.beeldengeluid.nl/


Chapter 2

Specification

This chapter will explore the problems that occur in the execution of this project. The main obstacles that are faced in this project have been mentioned in section 1.3, but we will repeat them here:

• Disclose RHCe videos to the users, allowing them to search through the dataset.

• Integrate other data formats; allow users to find images and videos using the same search query and allow for the inclusion of other data formats at a later date.

• Implement a system to generate new metadata for existing objects.

Section 2.3 explains how video can be disclosed over the Internet or other networks. Section 2.1 specifies the necessary metadata fields that we need to be able to search the collection. Section 2.2 notes how descriptive information can be gained for these videos.

2.1 Metadata

To be able to search multiple data sources and data types, we need to define those fields that contain relevant information on all objects. If we look at the specific data sources of images and video, there are several possibilities. Most end users will only be interested in the actual scenes that are depicted in these sources, though. We will therefore focus on descriptive information about these sources.

Both images and videos contain a depiction of a certain scene. This can range from a landscape where nothing much happens, to a snapshot of a war scene. But no matter what is depicted, a human can normally describe the scene in a couple of sentences. What the system needs to do is search through this description and find the objects that match the search query given by the user.

2.1.1 Tags

A scene is usually described by a person using a short text. This works exceptionally well among humans, but computers at the moment lack the ability to interpret these texts. A computer needs to find a clear structure in the data field(s) to be able to make sense of it. To enable a computer to do this, we need to transform the written description into a structured set of terms. This way a computer can reason about this dataset.

These terms can be gathered by finding the key phrases in the descriptive text and reducing them to a single word. These words can then be attached to the object, making them tags for this object. Tags are related to certain concepts, and these concepts can have mutual relations. For example both the tags car and automobile could point to the same concept car. This concept car has a relation with the concept vehicle, in that each object which belongs to the concept car also belongs to the concept vehicle.

In this system each object will include a list of tags. These tags will be linked to a certain concept. Each concept can be related to other concepts. If a user searches for a certain tag, the system will check which concepts can include this tag and which concepts could be related. Then it checks which objects are linked with these concepts and returns these to the user.
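The lookup described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the dictionaries, the object identifiers and the function names are hypothetical and are not taken from the CHI data model.

```python
from collections import defaultdict

# Hypothetical sample data; none of these names come from the CHI data model.
TAG_TO_CONCEPT = {
    "car": "car",
    "automobile": "car",
    "vehicle": "vehicle",
    "bicycle": "bicycle",
}

# Each edge points from a narrower concept to the broader concept containing it.
BROADER = {"car": {"vehicle"}, "bicycle": {"vehicle"}}

OBJECT_TAGS = {
    "photo-1": ["automobile"],
    "photo-2": ["vehicle"],
    "photo-3": ["computer"],
    "video-7": ["bicycle"],
}


def narrower_concepts(concept):
    """Return the concept plus every concept that is (transitively) contained in it."""
    children = defaultdict(set)
    for child, parents in BROADER.items():
        for parent in parents:
            children[parent].add(child)
    found, stack = {concept}, [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(tag):
    """Return the objects whose tags map to the searched concept or a narrower one."""
    concept = TAG_TO_CONCEPT.get(tag)
    if concept is None:
        return []
    wanted = narrower_concepts(concept)
    return sorted(
        obj
        for obj, tags in OBJECT_TAGS.items()
        if any(TAG_TO_CONCEPT.get(t) in wanted for t in tags)
    )
```

With this sample data, a search for vehicle also returns the object that is only tagged automobile, while a search for car does not return the object tagged vehicle.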

Concepts belonging to tags such as car and computer do not have any rigid internal structure and can be related using a graph system, where each node represents a concept and each edge represents a relation. Two tag types can be more strictly structured. These types will be explained in more detail in the sections Date and Location below.

Date

All objects described in the RHCe dataset are snapshots of some part of history. This point in time can be specified and then later be used by the search algorithm to search for specific eras. For enabling users to search this historical dataset, we will use days as the smallest unit of measurement.

There are three ways in which this field can be filled in. Some objects can be labeled with a specific date, in which case this data field can contain a simple date in the form of a year, a month and a day notation, such as 12-31-2007.

In other cases, a scene can describe a longer period. This can be especially common in videos, which could describe the happenings of a week. In this case we want to record this as a period starting at and ending on a certain date. Both of these dates can be entered in the same way as a single date would be.


The last case is used when specific information is not (yet) known. This happens more often in older sources, where pinning down the correct date is very difficult. In this case we want to be able to enter an incomplete date, such as 1940 or September 1942. This at least gives a general indication of the period of this object.

Date tags have some simple predefined relations; they are all placed in a certain part of time. This means that if a user searches for objects in a particular period, we should be able to find all objects that fall within this period. We can calculate this, so these relations do not have to be defined in the data model.
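One way to make this calculation concrete is to turn every date tag, complete or incomplete, into an inclusive range of days and then compare ranges. The sketch below assumes this representation; the function names are hypothetical and the thesis does not prescribe them.

```python
import calendar
from datetime import date


def date_range(year, month=None, day=None):
    """Turn a complete or incomplete date into an inclusive (first_day, last_day) pair."""
    if month is None:
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        last = calendar.monthrange(year, month)[1]  # number of days in that month
        return date(year, month, 1), date(year, month, last)
    exact = date(year, month, day)
    return exact, exact


def period(start, end):
    """Combine a start date and an end date, each given as a range, into one period."""
    return start[0], end[1]


def falls_within(obj, query):
    """True when the object's range lies completely inside the queried range."""
    return query[0] <= obj[0] and obj[1] <= query[1]
```

Under these assumptions, an object dated September 1942 falls within a query for 1942, while an object only dated 1942 does not fall within a query for September 1942. A looser variant could test for overlap instead of containment; which of the two is preferable is a design choice left open here.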

Location

All objects in the RHCe collection also depict a certain place. This place can be specified, using a geographic label. This could be the name of a city or a specific address. This collection of labels is highly ordered, since addresses belong to streets, which in turn belong to cities, and so on. This hierarchical structure can be used in search queries by adding all child locations of a search term to that query.

The possible location labels and their hierarchy are fairly stable. For example, if we consider the TU/e, we can define its location as Den Dolech 2, which is an address in Eindhoven, a city in the province of Noord-Brabant in the Netherlands. This hierarchy has little chance of changing at the street and city level, and will probably never change at a higher level.
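The query expansion over child locations can be illustrated with the TU/e example. The sketch below stores the hierarchy as a child-to-parent mapping; this representation, the extra entry for Helmond and the function names are assumptions made for illustration only.

```python
# Hypothetical hierarchy: every location maps to the location it belongs to.
PARENT = {
    "Den Dolech 2": "Den Dolech",
    "Den Dolech": "Eindhoven",
    "Eindhoven": "Noord-Brabant",
    "Helmond": "Noord-Brabant",
    "Noord-Brabant": "Netherlands",
}


def ancestors(location):
    """Return the chain of enclosing locations, from nearest to most general."""
    chain = []
    while location in PARENT:
        location = PARENT[location]
        chain.append(location)
    return chain


def expand(location):
    """Return the location together with all of its (transitive) child locations."""
    result = {location}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result
```

A search for Eindhoven would then be run against every label in `expand("Eindhoven")`, so that an object labeled only with Den Dolech 2 is still found.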

This stability means that we could potentially fit a location into the hierarchy automatically, given a fairly recent model of the situation. New data can be entered faster in this manner.

The RHCe collection of images contains enough metadata to extract these values. Due to the more recent boom of videos in both private and public use, such metadata is not yet available for these sources. These need to be added before any search queries can return relevant objects.

2.2 Metadata Scarcity

The CHI system needs a lot of metadata about its objects to be able to obtain good search results. Although a lot of information is available to the system when it comes to images, we cannot say the same about the videos. In this case we frequently only know the title of an item, without any information regarding its content.

A part of this new project should revolve around collecting this kind of information, without which we cannot generate correct search results. There are several ways to perform this metadata acquisition. We will examine three of them in depth in the next sections.


2.2.1 Manual Addition

The standard way to collect this information is for the employees of RHCe to check the entire dataset manually and enter relevant metadata for these items in the Atlantis database.

This is a very cumbersome process, which takes a lot of time. A specialist needs to analyze the object completely and add a description or some tags to it. This metadata should be as succinct as possible, without losing any important information.

Analyzing an image to identify the scene depicted can take some time. Analyzing an entire video, which could have a playing time of up to two hours, takes that much longer. Add to that the fact that the number of videos offered to RHCe has been increasing since the rise of both local television networks and home video cameras.

Another problem is posed by niche sources. While RHCe has employees who are knowledgeable about many places in Eindhoven, there are also videos in the collection about villages and hamlets surrounding Eindhoven. It is much harder to find a specialist on such a small location. Another example can be videos depicting a specific hobby or occupation, using terms and jargon that are not known to the employees.

These factors imply that this job would never be finished by the current staff, or that this staff would need a massive expansion. Neither of these is an option, so other methods need to be found to solve this problem.

2.2.2 Automatic Addition

Some computer systems seem to understand what is depicted in an image. A well-known example of this is Google Images¹. This search engine returns images from the Internet that it thinks are related to the search query. However, this system does not actually analyze the returned images. Instead, it analyzes the caption, file name and text around an image. According to this information, it decides if an image is relevant. Since we are primarily interested in tags at this stage, this system could not work.

There are several systems that do analyze video or images. An example of this is MuNCH² (Multimedia analysis for Cultural Heritage), another project in the CATCH program. Here some good progress is made with relation to the automatic recognition of objects in an image or video.

The weakness of the system lies in the fact that it can only recognize objects that it is trained to find. Due to the growing number of videos in the RHCe collection, the system probably will not be able to learn all necessary tags to keep up in specialized cases. For example, it is able to recognize a dog in a picture, but it cannot differentiate a Great Dane from a Chihuahua without being specifically trained for it. This also excludes the niche sources mentioned in the previous section.

¹ www.images.google.com
² http://ilps.science.uva.nl/munch/

With current technology, only a human user has the flexibility to correctly label several types of objects to varying degrees of detail or, if they do not know the answer, find someone who is able to do this. It would be useful to allow the system to be extended with automatic capabilities in the future.

Since it is impractical to employ enough specialists to add this information, another way needs to be found. For example, information could be obtained from the users of the search engine themselves.

2.2.3 Human-based Computation

The idea behind Human-based Computation is the delegation of tasks that are difficult or time-consuming for a computer to the user. This can be done in the form of a game. An example of this is the ESP game³, designed by Luis von Ahn. In this game, two players, without any means of communication, are shown the same picture. They can then type in words describing the picture. If both players have typed the same word within a period of time, they are awarded points and move on to the next picture. The goal of the game is to get as many points as possible.

A side effect of this game is that the words that two people have agreed upon could be considered a label for that picture. If multiple player pairs use the same label for the same picture, it could be a good tag for that picture. This way a lot of relevant tags can be generated for a lot of pictures. People tend to return to play this game due to the competitive aspect and the honor of being listed in the top rankings on the site.

Another way to implement Human-based Computation is allowing every user to change your web site. This system is called a Wiki, with Wikipedia⁴ as its most famous example. Wikipedia is an encyclopedia that anyone can add to or expand upon. It currently has over 2 million articles in the English version. Due to the possibility of user addition, some articles have already been translated into different languages. It has articles in more than 250 languages.

The danger of a system like this is misuse. Some people might vandalize the system by adding wrong or irrelevant information. Another possibility is that the site is not used for its original purpose by focusing on the "wrong" subjects, according to the creators of the site. This can be solved by only allowing registered users to change the site, but this will reduce the amount of information that is gathered.

³ http://www.espgame.org/
⁴ http://en.wikipedia.org/


Our solution needs to be somewhere in between: a system that can get information from any user using simple, structured forms. The value of this information can be increased by actively involving the users in the entire process from archive object to search result. This will not only merge these users into a community, but it will also give RHCe access to a great source of information, as well as an unofficial workforce.

Users can be motivated to add information to the system in several ways. The first and simplest one is a desire to share information. For example, people who have grown up in Eindhoven might want to tell others about the images and videos that were made during that time. This can lead to a lot of first-hand, albeit subjective, information. Nobody knows as much about the person in a picture as the person himself.

Another reason to add information can be a genuine interest in history. Although it is hard for employees of RHCe to find solid information about the small villages surrounding Eindhoven, there are several historical societies that are more knowledgeable about one or more locations. Opening a subset of the collection to organizations such as these could fill large gaps in the descriptions of some objects.

Information that is retrieved using this method does need to be filtered in some way. We cannot assume this information is correct, especially when it originates from anonymous sources. This can be done by using a variation on the ESP game: if the same tag has been used by several users for the same object, it is probably correct. Allowing users to rate someone else’s tags can also minimize superfluous information.
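The agreement rule can be sketched as a count of distinct users per object and tag. The code below is a minimal illustration, not the filtering used in the system: the threshold of two users, the normalization of tags to lower case and all names are assumptions.

```python
from collections import defaultdict


def accepted_tags(proposals, min_users=2):
    """Keep the tags that enough distinct users proposed for the same object.

    proposals: iterable of (user, object_id, tag) tuples.
    Returns a dict mapping each object to its sorted list of accepted tags.
    """
    supporters = defaultdict(set)
    for user, object_id, tag in proposals:
        # Assumption: tags are compared case-insensitively and without padding.
        supporters[(object_id, tag.strip().lower())].add(user)

    accepted = defaultdict(list)
    for (object_id, tag), users in supporters.items():
        if len(users) >= min_users:
            accepted[object_id].append(tag)
    return {obj: sorted(tags) for obj, tags in accepted.items()}
```

Because users are collected in a set, one user repeating the same tag does not count as agreement. A rating mechanism could be layered on top of this by letting ratings add to or subtract from the support of a tag.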

Now that we have found a way to store and accrue metadata, we need to think about the way to digitally display our data sources to the user. This is not a big problem with images, which can be scanned and displayed without many problems. Videos are somewhat more complex to get to a user.

2.3 Video Streaming

Most videos in the RHCe dataset are not stored in a format that allows digital distribution. To be able to do this, we need to digitize this information in such a way that it can be shown over the Internet. This means we need to compress this data to minimize data traffic. We do this by using a video codec and video container that can handle video streaming in an adequate quality.

2.3.1 Video Encoding

Storing a video in a digital form is called encoding. Several terms are being used in relation to the encoding of video formats:


• Frame size: This value determines the absolute size in which all frames within a video are stored. To keep a video at the best quality, it should be kept at the same size as its original. A lower frame size will usually lower both its file size and its quality significantly.

• Resolution: While frame size relates more to the actual size of a video, resolution controls the number of pixels per square inch. Lowering this value will lower the size of a video, but it will also make the video less sharp. A very low figure will lead to a 'pixelated' video, where you can see the individual pixels that make up this video as moderately large squares. This will cause lines to become jagged, which can make the video unappealing at best.

• Frame rate: This value determines the number of frames that are shown in a video per second. Lowering this value is a standard way to end up with a smaller file size, but will often result in jerky movements displayed in the video. A common value is about 24 frames per second.

• Video container: This is not so much a value as a format in which the video is stored. The best known containers are Microsoft Audio Video Interleave (AVI), Apple QuickTime and RealMedia, which will be discussed later. A video container is a collection of data streams, which can include video, audio and text.

• Compression: This involves coding the video stream in such a way that it eliminates redundant or unimportant data and thereby lowers the file size. Programs that do this are called codecs (COder/DECoder) and come in two types: lossless (without losing any information) and lossy (losing some information in favor of smaller file sizes). Compression rates depend on the codec used and the video to code, but they can climb to 200:1 or more. Codecs, especially lossy ones, are mostly used to minimize data traffic when streaming videos.
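To get a feeling for how these values interact, the following sketch computes the data rate of an uncompressed video stream and the effect of a compression rate of 200:1. The chosen frame size of 720 by 576 pixels, the 24 bits per pixel and the 25 frames per second are assumptions picked for the example; they are not figures from the RHCe collection.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Bits per second needed to store every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * frames_per_second


def compressed_bit_rate(raw_rate, ratio):
    """Bits per second that remain after compressing with the given ratio (e.g. 200 for 200:1)."""
    return raw_rate / ratio


def size_in_bytes(bit_rate, seconds):
    """Storage needed for a stream with a constant bit rate and the given duration."""
    return bit_rate * seconds / 8


# Assumed example values: 720x576 pixels, 24 bits per pixel, 25 frames per second.
raw = raw_bit_rate(720, 576, 24, 25)        # 248,832,000 bits per second
compressed = compressed_bit_rate(raw, 200)  # 1,244,160 bits per second
```

Under these assumptions one hour of uncompressed video would take roughly 112 GB, against roughly 560 MB at 200:1, which shows why lossy codecs are needed before streaming over the Internet becomes practical.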

Before a video can be encoded, it is split into several input streams. Usually, these only consist of one video and one audio stream, but other streams could include subtitles, extra audio streams (e.g. for different languages or commentary tracks), other video streams (e.g. different languages), a transport stream (to allow multiplexing of digital video and audio and to synchronize the output), etc. These streams are then each encoded with their own codec and then combined in a video container.

2.3.2 Video Fragments

Video can have a higher information density than an image. While an image is a snapshot of a certain scene, a video is usually a combination of several scenes. These scenes can be shot at several places and in several periods.

For example, if we analyze a video of a newscast, we will find shots made in different parts of the world. We could also see some archive shots of historical events. Because of this amalgamation of subjects, it is very possible that tags will only be relevant for small parts of the video.

To ensure that a user does not have to sit through half an hour of video for a minute of relevant data, we will have to make it possible to find fragments of video. This means that we need to allow users to skip forward and backward through a video. The streaming technique that allows this is called Video on Demand (VoD).

We also need to be able to start these videos automatically. This is necessary when a user has found a fragment of a video in a search query and wants to play that fragment. In this case the system needs to load the correct video and start it at the right time.

Now that we have specified the requirements that the system will need, we will review the design of the data model and the interface in chapter 3.


Chapter 3

Design

In this chapter we first discuss the design of the data model that is used to include videos in CHI Explorer in section 3.1. The next step involves the technical needs to stream these videos to the users, which we discuss in section 3.2. Lastly, section 3.3 explains the decisions that have been made in regard to the user interface.

3.1 Data Structure

To be able to search multiple data sources and data types, we need a data structure to represent the information of all data types. The main focus of this project is on videos, but the resulting data structure should be adaptable to other data sources. As shown in section 2.3.2, we can use fragments to deal with the high information density of a video. The following data structure should be able to tag videos on a video level as well as on a fragment level.

In the next sections I will explain the fields that this system will need. Some of these options are already implemented in CHI, but need to be generalized for video usage or updated to current standards. An example of this is the definition of date concepts, as explained in section 4.2.3. Other options are geared more towards video metadata and have been added to the new system. This includes the definition of the Bookmark and Comment concepts in sections 3.1.2 and 3.1.3 respectively.

3.1.1 Tags

The implementation of CHI focused on the ability to search and find specific sources in the RHCe collection. To enable this, a specific ontology has been created [3] [4]. CHI Explorer uses this same data model to tag objects and search the collection using the relations between concepts belonging to these tags.


CHI uses the GABOS data structure [3, p.16], with the inclusion of a system of tags [3, p.35]. The GABOS system links these tags to certain concepts, which in turn can be related to each other. These are the types of relations we described in section 2.1.1. This system was initially built to work with images alone, but as we will show here, it can be adapted to work with videos.

CHI Explorer should enable users to search both image and video collections to find objects relevant to a certain subject. Both images and videos can be described in the same way, using the relations shown in section 2.1.1. The CHI system defined all tag relations as pertaining to the ArchiefObject object, which is a parent object of both the Foto and Video objects in the GABOS ontology. This data structure therefore allows us to tag videos in the same way as images.

As we have mentioned in section 2.3.2, videos can have a higher information density. Although the GABOS system allows the tagging of complete data sources, it does not allow the tagging of fragments thereof. The next section will introduce a new object to be able to tag these fragments.

3.1.2 Bookmarks

All objects in the GABOS ontology are seen as isolated subjects when tagged using the CHI ontology. This works well in the case of images, but videos generally have a higher information density, as mentioned in section 2.3.2. A new object needs to be created to keep track of these fragments. This object needs to be able to include the CHI tag relations. It will extend the original GABOS data structure, enabling it to tag fragments.

This new object is called Bookmark and has the following properties, as shown in figure 3.1:

• bookmarkID: This is a unique identifier to allow the system to index the objects. These will be generated automatically in CHI Explorer.

• onderdeelVan: This is a PartOf relation linking the Bookmark to a certain Archiefobject. In the case of CHI Explorer, this will always be a video.

• begintOp: This property marks the start of a fragment using the Tijd object. The structure of this object will be explained below.

• eindigtOp: This property marks the end of a fragment using the Tijd object.

• bookmarkBevatTerm: This property is a sibling to the archiefobjectBevatTerm relation of CHI. It is used to link tags to certain fragments. A Bookmark object can have multiple bookmarkBevatTerm relations to different Term objects.


Figure 3.1: Data structure of Bookmark and Tijd objects

Bookmark objects use Tijd objects to define the start and end of a fragment within an Archiefobject. In videos, these values are times, but in the future they could be generalized, for example by using coordinates to define fragments in images or page numbers to define fragments in documents. Since CHI Explorer focuses mostly on videos, the only defined relation is this Tijd object. It has three properties.

• tijdUur: This property defines the number of hours relative to the start of the source.

• tijdMinuut: This property defines the number of minutes relative to the start of the source. This is a value between 0 and 59. Any higher amounts of time will be stored in tijdUur.

• tijdSeconde: This property defines the number of seconds relative to the start of the source. This is a value between 0 and 59. Any higher amounts of time will be stored in tijdMinuut.
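The carry rule between the three Tijd properties can be illustrated with a short Python sketch. This is not the CHI Explorer implementation: the class layout and the helper names from_seconds and to_seconds are assumptions made for illustration, and only the property names tijdUur, tijdMinuut and tijdSeconde are taken from the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tijd:
    """Offset relative to the start of a source (illustrative sketch)."""
    tijdUur: int
    tijdMinuut: int   # always between 0 and 59
    tijdSeconde: int  # always between 0 and 59

    def __post_init__(self):
        # Enforce the value ranges described for the three properties.
        if self.tijdUur < 0:
            raise ValueError("tijdUur must not be negative")
        if not 0 <= self.tijdMinuut <= 59:
            raise ValueError("tijdMinuut must be between 0 and 59")
        if not 0 <= self.tijdSeconde <= 59:
            raise ValueError("tijdSeconde must be between 0 and 59")

    @classmethod
    def from_seconds(cls, total_seconds: int) -> "Tijd":
        """Hypothetical helper: split a playback offset into h/m/s."""
        if total_seconds < 0:
            raise ValueError("offset must not be negative")
        minutes, seconds = divmod(total_seconds, 60)
        hours, minutes = divmod(minutes, 60)
        return cls(hours, minutes, seconds)

    def to_seconds(self) -> int:
        """Hypothetical helper: offset in seconds, e.g. to seek a player."""
        return (self.tijdUur * 60 + self.tijdMinuut) * 60 + self.tijdSeconde
```

A video player would typically work with a single offset in seconds, so a conversion in both directions is likely to be needed when a fragment has to be started at the right time.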

To generate search results, CHI queried the Archiefobject objects. In CHI Explorer, we want to merge images, videos and fragments in the search result lists. For this reason we created the ListItem object, which we defined as a parent to Archiefobject and Bookmark. This way we can query all ListItem objects, which we know have links to Term objects.

These additions allow CHI Explorer to offer the same options for videos as CHI did for images. The next step is to allow users to tag objects themselves. The necessary additions to the data structure are defined in the next section.
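The role of ListItem as a common parent can be sketched as follows. This is a simplified Python illustration under several assumptions: terms are plain strings instead of Term objects, fragment boundaries are plain seconds instead of Tijd objects, and the search function and its matching rule (all requested terms must be present) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ListItem:
    """Common parent: anything that carries terms and can appear in results."""
    terms: Set[str] = field(default_factory=set)


@dataclass
class Archiefobject(ListItem):
    identifier: str = ""


@dataclass
class Bookmark(ListItem):
    onderdeelVan: Optional[Archiefobject] = None  # the video this fragment is part of
    begin_seconds: int = 0  # simplification of begintOp
    end_seconds: int = 0    # simplification of eindigtOp


def search(items: List[ListItem], wanted_terms) -> List[ListItem]:
    """Return every ListItem that carries all requested terms."""
    wanted = set(wanted_terms)
    return [item for item in items if wanted <= item.terms]
```

Because the query only relies on the terms of a ListItem, complete objects and fragments end up in the same result list without special cases.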

3.1.3 Comments

In section 2.2 we decided to involve the users of CHI Explorer in collecting metadata for the video collection. It is important to remember that data that comes in this way does not necessarily conform to the standards of RHCe. This means we will have to measure the relevance of a tag before we incorporate it into the system. Since we want to use human computation to relieve RHCe employees of an enormous workload, we cannot let these employees check every entry by hand.

To solve this, CHI Explorer uses a derivation of the system used by Luis von Ahn in the ESP Game¹. Here, a tag is only added to the system if two users agree on it at the same time. This worked well in the context of this game, but our system cannot guarantee that two users will be using the system at the same time. CHI Explorer implements a rating system to emulate this behaviour. It stores the opinions of all users about a certain tag in combination with a certain object. This allows the system to check the relevance of this combination according to the users of CHI Explorer.

Next to tagging objects, users can also rate tags other users have added. This rating process is kept simple, allowing users to approve or disapprove a certain tag in relation to a certain object. In the future an algorithm could be used that evaluates these votes, so the system can select promising tags and display these to the RHCe employees for approval.

To draw users into rating tags, we also designate a place to discuss tags. By allowing users to convince each other, we hope to build a community of interested users, which increases the number of added or rated tags. This community can later be used to pinpoint especially knowledgeable users, who can be promoted to a more responsible position. These users can then be given some administrative rights, which can alleviate maintenance duties for the system. The reliability of this system increases with the size of the community, when multiple users are voting on each tag-object combination. As long as such a combination has no votes, it can be ignored by moderators and RHCe professionals.

¹ http://www.espgame.org/

Figure 3.2: Data structure of Comment object

To differentiate between approved tags and user-proposed tags, CHI Explorer needs to extend the CHI tagging system. The original CHI tag system defines the approved tag relations [3, p.30]. To record user additions and ratings, a new object, called Comment, is added. The structure of this object is shown in figure 3.2. It has the following properties.

• hasComment: This property links a Comment object to the tagged object. To enable the tagging of both existing Archiefobjects and Bookmarks, it is defined as a relation between ListItem and Comment. The existence of this property means that a certain object has been tagged by users, but the tag has not yet been approved by RHCe.

• concerning: This property specifies which tag is being proposed for the object in question. This will either be a Location, defining a geographical tag, an Interval, defining a temporal tag, or a general Tag.

• addedBy: This property logs which user originally added this tag to the object. This value can be used to find knowledgeable users. If a user has added a lot of tags that have been approved later, he could be a reliable source and might be considered by RHCe for promotion to moderator status.

• commentedBy: This property logs all users who have approved of this tag in relation to this object. If a user adds a tag, this property will be added automatically along with the addedBy relation. The original user is free to change his vote at a later date, as all users can. The commentedBy and rejectedBy properties are mutually exclusive for a certain user and a certain Comment object.

• rejectedBy: This property works similarly to the commentedBy relation, except that it denotes a user’s disapproval of a tag in relation to a certain object.
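The interplay between addedBy, commentedBy and rejectedBy can be sketched in Python. This is only an illustration of the rules listed above, not the actual CHI Explorer code: users and tags are represented as plain strings, and the method names approve, reject and withdraw are assumptions.

```python
class Comment:
    """User-proposed tag on an object, with approval and rejection votes."""

    def __init__(self, concerning, added_by):
        self.concerning = concerning      # the proposed Tag, Location or Interval
        self.addedBy = added_by
        # Adding a tag automatically counts as an approval by the same user.
        self.commentedBy = {added_by}
        self.rejectedBy = set()

    def approve(self, user):
        # A user is in at most one of the two sets (mutually exclusive).
        self.rejectedBy.discard(user)
        self.commentedBy.add(user)

    def reject(self, user):
        self.commentedBy.discard(user)
        self.rejectedBy.add(user)

    def withdraw(self, user):
        # Remove the vote of this user altogether.
        self.commentedBy.discard(user)
        self.rejectedBy.discard(user)
```

Storing the votes as two sets keeps the counts needed for a later evaluation directly available, while the three methods guarantee that a user never appears in both sets.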


CHI Explorer uses several child objects to more easily differentiate Comment objects internally. The following relations are not specifically defined in the ontology, but are used this way in CHI Explorer.

• TagComment: An object of this type has a concerning relation to a Tag object.

• PeriodComment: An object of this type has a concerning relation to an Interval object.

• LocationComment: An object of this type has a concerning relation to a Location object.

This data structure allows tags to be reviewed before they are added permanently to the system. By counting the commentedBy and rejectedBy properties, an algorithm can be created to evaluate tags automatically. This algorithm is not created in this project, but an initial rule could be to accept a tag once it has at least 10 votes, of which at least 80 percent are approvals.
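The initial rule mentioned above could be sketched as follows in Python. Since the evaluation algorithm is not part of this project, the function name is_accepted and its parameters are assumptions; only the thresholds of 10 votes and 80 percent come from the text.

```python
def is_accepted(approvals: int, rejections: int,
                min_votes: int = 10, min_percent: int = 80) -> bool:
    """Return True if a proposed tag has enough votes and enough approval.

    approvals  -- number of commentedBy relations on the Comment
    rejections -- number of rejectedBy relations on the Comment
    """
    total = approvals + rejections
    if total < min_votes:
        return False  # too few votes to decide anything
    # Integer arithmetic avoids rounding issues at exactly 80 percent.
    return approvals * 100 >= min_percent * total
```

With these defaults, 8 approvals and 2 rejections are just enough for acceptance, while 9 approvals without any rejection are not, because the minimum number of votes has not been reached.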

These changes to the CHI tags and GABOS ontology allow us to include videos in CHI Explorer. They also allow users to add and evaluate tags in the new system, increasing the amount of metadata that is available to search the collection. The next section will explain how users are able to actually view the video collection.

3.2 Video Encoding

Before a video can be encoded, it is split into several input streams. Usually, these consist of only one video and one audio stream, but other streams could include subtitles, extra audio streams (e.g. for different languages or commentary tracks), other video streams (e.g. different languages), a transport stream (to allow multiplexing of digital video and audio and to synchronize the output), etc. These streams are then each encoded with their own codec and combined in a video container.

In this project these extra video or audio streams are not necessary, since most videos in the RHCe collection only have one video and one audio stream. We will first discuss the available audio and video codecs in section 3.2.1. Next, in section 3.2.2, we will show which video containers are available. Finally, we will look into streaming software in section 3.2.3.

Selecting the correct codec and container allows CHI Explorer to use less bandwidth per user, which in turn allows more users to use the system at the same time, which will lead to more tags and more metadata being added to the system.


3.2.1 Codecs

In this section we examine the codecs that can be used in audio and video encoding. We will first briefly look into the audio codecs. Next, we will examine the video codecs in more detail.

Audio Codecs

Currently, the most used audio codec is MP3. This codec delivers fairly good quality in combination with a small file size compared to the WAV standard. However, it is also an older format.

Advanced Audio Coding (AAC) has been developed as the successor to MP3. It is used as a codec in iTunes, as well as on several devices, such as the iPod, Sony PlayStation and Nintendo Wii. It is also the standard audio codec in the MP4 container and in some digital radio standards like DAB+.

Video Codecs

For the video codec there are more choices available. The most widely used of these codecs are based upon the Moving Picture Experts Group (MPEG) standards. Of these standards, MPEG-1 is used in VCDs, MPEG-2 in DVDs and MPEG-4 mostly in video streaming on the Internet. These codecs provide DVD quality video with an average compression ratio of 10:1 on a raw video stream. A disadvantage of the standard MPEG codec is that it uses a relatively large amount of resources to both encode and decode these streams.

This has led to the development of DivX. This commercial codec is derived from the MPEG-4 standard, but greatly improved the ease with which streams can be encoded and decoded. This contributed to the success and following this codec has had on the Internet.

The success of this commercial codec has led to the development of its open source counterpart, Xvid. This codec, also based upon MPEG, has delivered similar if not better quality than DivX for years. This is due to the fact that Xvid is highly customizable, but this is also its major weakness. Finding the correct options can be difficult, while using the wrong options can introduce unwanted artifacts in a video.

The newest version of the MPEG standard is MPEG-4 Part 10, also known as Advanced Video Coding (AVC) or H.264. It is a codec that features both lossless and lossy compression and can generate the same quality of video at lower bit rates than other MPEG implementations. There is also an open source implementation available for this codec, called x264.

Other available codecs include M-JPEG and Digital Video (DV). These codecs focus more upon storage and/or quality, which makes them less useful in streaming applications.


To be able to compare the codecs, I have performed some testing. I have encoded a simple 10-second video in five codecs, and lowered the bit rate of each until the quality degraded too much. The results are shown in table 3.1.

codec     kb/s     file size
H.264     128      373 KB
Xvid      528      923 KB
DivX      528      1003 KB
DV        29040    37 MB
M-JPEG    9600     14 MB

Table 3.1: Comparison of codecs

This table shows the superiority of H.264, as well as the small difference between Xvid and DivX. Also note that DV did not have an adjustable bit rate, and has therefore been used as the baseline for quality. Our choice will therefore be the AAC codec for encoding audio streams and the H.264 codec for encoding video streams.
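To make the differences in table 3.1 easier to compare, the file sizes can be expressed relative to the DV file. The following Python sketch only reworks the numbers from the table. It assumes that 1 MB equals 1024 KB and that the listed sizes are complete files, and the function name size_ratios is hypothetical.

```python
# File sizes from table 3.1 in KB (assumption: 1 MB = 1024 KB).
SIZES_KB = {
    "H.264": 373,
    "Xvid": 923,
    "DivX": 1003,
    "DV": 37 * 1024,
    "M-JPEG": 14 * 1024,
}


def size_ratios(sizes_kb, baseline="DV"):
    """Return how many times smaller each file is than the baseline file."""
    base = sizes_kb[baseline]
    return {codec: base / size for codec, size in sizes_kb.items()}


ratios = size_ratios(SIZES_KB)
# Print the codecs from the largest to the smallest reduction.
for codec, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{codec:7s} {ratio:6.1f}:1")
```

Under these assumptions the H.264 file is roughly a hundred times smaller than the DV file, while the Xvid and DivX files are both around forty times smaller.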

3.2.2 Video Containers

Several video containers are in use at the moment, each one trying to optimize quality and minimize size. Even though video containers are set up to accommodate numerous codecs, some of them can only be shown on a small subset of video players due to copyright issues. Many of these containers also default to a certain codec.

The Apple QuickTime (.mov) format, for example, is mostly used in conjunction with the Sorenson codec, which delivers high quality video, but does this at the cost of relatively large video files. Also, there are not many video players that can read this format. Another problem arises when users try to search within a video file. This is caused by the fact that several frames are bundled together in blocks of 0.5 or 1 second by the QuickTime codecs. When a user searches in this video file, he might start playback in the middle of one of these blocks. Since information needed to decode a frame could be stored before this point, frames might be incomplete when playback starts again. This container is maintained and protected by Apple, which means a license is needed to encode videos in this format.

The Real Media (.rm) format, on the other hand, delivers a high compression ratio, but achieves this by removing relatively large amounts of detail from a video. This can be seen especially well in fast movements within a video, which tend to blur. As far as I have found, Real Player is the only video player that can show .rm videos [2].

The Microsoft Audio Video Interleave (.avi) format is more open in the codecs it allows, due to its simple structure. It can also be read by most, if not all, video players. It is an older container format, however, and is being replaced by the MPEG-4 Part 14 (.mp4) format. This format is based on the Apple QuickTime container, extended with several MPEG features [1]. An example of its use is iTunes, which uses .mp4 for audio sales.

Currently, more open source video containers are being developed. Some examples of these are Matroska video (.mkv) and Material Exchange Format (.mxf). Here, Matroska is specifically developed for video streaming, while MXF focuses more on the storage of video files. These containers offer reasonable to good quality, but at the time of writing they are either still in the test phase of development or not sufficiently supported in video players. This makes them interesting alternatives for the future, but less useful for now.

CHI Explorer will use the .mp4 container, which is maintained by the International Organization for Standardization (ISO). This ensures the stability of the format. Since it is a standard, it will be widely used. This makes it more likely that future links to other systems, or digital additions to the video collection from other sources, will use this format, which reduces the overhead of importing these new files into the CHI Explorer system.

3.2.3 Video Streaming

The VideoLAN Client (VLC) is a free, open source video player that supports many kinds of codecs and containers, such as Xvid, H.264, .avi and .mkv. It can also stream video using the TCP/IP protocol and has the ability to transcode video during streaming. This means that it is possible to read a video file in a certain codec and then stream it using a different codec. This enables the application to dynamically change a codec if necessary, without changing any data it has stored. VLC also includes the VideoLAN Manager (VLM), a built-in server API that has the ability to provide Video on Demand. This technology allows a user to watch a certain video with complete control to play, pause, stop or search through the video. It also gives multiple users access to the same file, without losing these control options.

3.3 Interface

In this section the user interface of CHI Explorer will be explained. The system will need three main interfaces. The first is the search interface and its result pages, described in section 3.3.2. The next is the Detail view of a certain object, shown when a user selects it from the result list. This will be seen in section 3.3.3. Lastly, a user needs to be able to add tags to the objects in the RHCe collection. This interface is explained in section 3.3.4.


Some management tasks are needed to maintain the site. This basic feature of the site will be explained in the next section.

3.3.1 User Management

Several management tasks need to be implemented to enable CHI Explorer to work. These include user management tasks. Since we want to allow users to tag objects, it is necessary to keep track of these users. This allows the system to mark both users that supply the system with high quality metadata, nominating them for a promotion, and users that flood the system with inane tags, nominating them for a system-wide ban.

Implementing such a system correctly and securely is a project on its own, so we will use an existing system to do this, called Joomla!². This is a Content Management System (CMS) and Web Application Framework, implementing all aspects of user management we need, such as password protection and module access for registered users only.

Joomla! is an open source PHP-based system with a modular structure. This allows us to create CHI Explorer as a Joomla! module, enjoying all CMS abilities and protections. The exact manner in which these systems are integrated will be explained in section 4.1.

3.3.2 Search Results

CHI Explorer needs to be able to search the same values as the previous CHI system. This means we need to be able to search for tags, locations and periods of time. CHI had a different interface for each search type. CHI Explorer will simplify this by using a single query element, allowing the user to add both tag and location queries in the same input box. We will add an extra input option for period queries, which allows us to impose a structure on the manner in which users search for this tag type. This is done by asking for specific begin and end dates, which limits the user to one period per search query and simplifies the search algorithm.

The search query interface can be seen in figure 3.3. The top input box allows users to search for multiple tags in both the image and the video collection. These can be freely combined in any way the user wants. For example, "bombardment, Eindhoven, factory" is a valid query. An empty query will show the complete collection.

² http://www.joomla.org/

Figure 3.3: Standard CHI Explorer search and result interface

The line beneath this box contains the period query form. It is split into a begin and an end period, each split up into separate fields for day, month and year values. Since a period value has a distinct format and each query can have only one begin and end point to search, this field has been separated from the tags input box.

The next line contains several ways in which the results of a query can be formatted. These will be discussed below. A result type can be chosen by selecting one of these radio buttons before submitting a query.
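The contents of the search form described above could be represented as in the following Python sketch. It is an illustration under assumptions: the names SearchQuery and build_query are hypothetical, terms are assumed to be separated by commas as in the example query, and the day, month and year fields are assumed to have been combined into dates already.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SearchQuery:
    terms: List[str]              # tags and locations from the single input box
    begin: Optional[date] = None  # start of the optional period query
    end: Optional[date] = None    # end of the optional period query


def build_query(text: str, begin: Optional[date] = None,
                end: Optional[date] = None) -> SearchQuery:
    """Split the input box on commas and attach at most one period."""
    terms = [part.strip() for part in text.split(",") if part.strip()]
    if begin is not None and end is not None and begin > end:
        raise ValueError("the begin date must not be later than the end date")
    return SearchQuery(terms, begin, end)
```

An empty input box yields an empty list of terms, which matches the behaviour that an empty query shows the complete collection.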

Result List

This result view can be seen in figure 3.3. It is a list of objects relevant to the search query. Each item in this list is a tagged object in the RHCe collection. Its entry consists of a thumbnail view of the object and some information, including tags, locations, periods and an identification. The tags are linked to the search cloud view as described below, allowing a user to browse through the collection using the tag system.

The Result List is split up into pages, each consisting of 25 items. Links at the top and bottom of a page allow a user to browse through this list. By clicking the thumbnail, a user will be moved to the Detail view as described in section 3.3.3.
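The paging of the Result List amounts to slicing the list of results. A minimal Python sketch is given below. The function names page_of and page_count are assumptions, and pages are assumed to be numbered from 1.

```python
PAGE_SIZE = 25  # number of items per result page, as described above


def page_of(results, page, page_size=PAGE_SIZE):
    """Return the items shown on the given page (pages start at 1)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return results[start:start + page_size]


def page_count(number_of_results, page_size=PAGE_SIZE):
    """Return the number of pages needed; an empty result still has one page."""
    # -(-a // b) is integer division rounded up.
    return max(1, -(-number_of_results // page_size))
```

For a query with 60 results this gives three pages, of which the last one contains the remaining 10 items.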

As can be seen in figure 3.3, the search form is shown above the result view. This allows the user to fine tune his search query during the search. The search form is shown above the results in all other views as well, but it has been cut from the screenshots to avoid redundancy.

Figure 3.4: CHI Explorer search cloud

Search Cloud

The search cloud as used here is a revision of the system used in the previous CHI version [3, p.77]. This search cloud visually maps relations between tag concepts, allowing users to see how these concepts are related in the CHI system. The concept in the middle of the graph is the current focus of the view. All concepts related to this focus are depicted around it. The relation types are color-coded, as shown in the legend at the bottom left. This legend contains check boxes that enable the user to look only at specific relation types.

Concepts can be of different types, which are also color-coded, as shown in the legend at the bottom right. Clicking one of the related concepts will refocus the graph on this concept, showing all relations of the new concept. Double-clicking the focused concept makes CHI Explorer perform a search for this concept in the RHCe collection.


Figure 3.5: CHI Explorer time line view

Time Line

Since CHI Explorer has access to information about the period in which an object is relevant, we can use this to sort the search results. This is shown in figure 3.5. Here, the objects are sorted on a time line, which is divided into two parts with differing scales. The smaller lower bar gives users an overview of the objects in time and allows them to move quickly through the time line by dragging this bar. Dragging either of these two bars moves both bars by a relative amount of time. At the top of the view, some links have been added to move quickly to distant time periods.

Clicking an object in the top bar allows the user to view all objects of this time frame in a small list view. Objects can be clustered by tag or location, which will be explained in detail in section 4.3.4. By clicking the thumbnail of an item, the user will open a detail view of that specific object.


Figure 3.6: CHI Explorer Map view

Map

We also have access to the location where the object was made. We can use these values to sort the objects geographically and plot these points on a map. The exact process is described in sections 4.2.2 and 4.3.3.

The Map view is shown in figure 3.6. Using Google Maps, the objects are shown on a map of Eindhoven. Objects that are relevant to the same location are clustered together. A marker is generated for each location. If the user clicks this marker, a form will open containing a list of objects. This list has the same structure as the time line list detailed above. Clicking a thumbnail will again redirect the user to a detail page concerning this object.
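Clustering objects that share a location, so that one marker can be generated per location, can be sketched as a simple grouping step. The Python fragment below is an illustration only: objects are represented as hypothetical (identifier, location) pairs, the location names are made up, and the actual process is the one described in sections 4.2.2 and 4.3.3.

```python
from collections import defaultdict


def cluster_by_location(objects):
    """Group object identifiers by location; each key becomes one marker."""
    clusters = defaultdict(list)
    for object_id, location in objects:
        clusters[location].append(object_id)
    return dict(clusters)


# Hypothetical example data.
markers = cluster_by_location([
    ("foto-1", "Stratumseind"),
    ("video-1", "Markt"),
    ("foto-2", "Stratumseind"),
])
```

Each key of the resulting dictionary corresponds to one marker on the map, and its value is the list of objects shown when the marker is clicked.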

3.3.3 Detail View

This section will discuss the detail views that CHI Explorer uses. The first one is an update of the CHI image view. This view displays the image in question, along with its tags and some other information, such as a description and an identification.


Figure 3.7: CHI Explorer Video view

The second view details the video stream interface. In this view, as shown in figure 3.7, the video is shown in the top left part of the interface. Below this frame, two slide bars are shown. The topmost one shows the position within the current video, while the second slide bar is used to show the position and duration of a fragment, if applicable. Below these slide bars, several buttons can be used to control the video playback.

The same metadata as is shown in the image view is located below the video controls. By clicking the headers, signifying tag, location or period, a user can add new tags to this object. At the top right, a list of tags relevant to this video can be seen. Behind these tags one of three icons can be seen:

• Letter T: This tag is relevant to the entire video.

• Bullseye: This tag is relevant to the current fragment that is playing.

• Magnifying glass: This tag is relevant to another fragment within the current video. Since a tag could be relevant to multiple fragments within the same video, it is possible to have multiple copies of this glyph behind the same tag. Clicking this glyph will move the video to the start of the fragment in question.


Figure 3.8: CHI Explorer Tag forms (Location, Period, Tag)

3.3.4 Tag Forms

Figure 3.8 depicts the three types of Tag forms in CHI Explorer, accessible through the Detail pages as described in section 3.3.3. The top part is the same in all three forms. This part shows the tags that are already known for this term in relation to the specific object. In the Location form, four possible depictions of the tag entries are visible:

• RHCe logo: this specifies a tag that has been approved by RHCe. Users can no longer approve or disapprove this tag, but they can still discuss it in a forum thread by clicking the tag itself.

• Red thumb down: this specifies the current user’s disapproval of this tag in relation to the current object. By clicking the red thumb down, the current user can remove his disapproval vote. By clicking the greyed out thumb up, the current user can approve this tag. By clicking the tag itself, the user will be redirected to the discussion page of this tag relative to the current object.

• Green thumb up: this specifies the current user’s approval of this tag in relation to the current object. By clicking the green thumb up, the current user can remove his approval vote. By clicking the greyed out thumb down, the current user can disapprove this tag. By clicking the tag itself, the user will be redirected to the discussion page of this tag relative to the current object.

• Two greyed out hands: this specifies that the current user has not yet given an opinion on this tag in relation to the current object. By clicking the greyed out thumb up, the current user can approve this tag. By clicking the greyed out thumb down, the current user can disapprove this tag. By clicking the tag itself, the user will be redirected to the discussion page of this tag relative to the current object.

After this voting block, users can propose tags to the system. In the case of Locations and Tags, this is a standard input box, where multiple tags can be added simultaneously, separated by commas. In the case of Periods, a more structured input is required. Here we ask for input in the same manner as the period query in the search form as described in section 3.3.2.

The final part of the interface only appears after the user has proposed a certain tag. The interface that appears for Periods is a list of all known periods that have been used in the system. This is done to increase the likelihood that objects receive specific dates. For example, if a user proposes 1944, CHI Explorer checks which periods it already knows that overlap this date and returns them in a list. The user then has the ability to fine tune his proposal by adding any of these extra dates to it. This can be done by clicking the tag label in this interface. Clicking any selected tag label will remove it from the current proposal. The radio buttons located next to the tags are an optional interface where users can comment upon the relevance of the tags generated by the system in relation to their proposal. This information can later be used to generate better related tag lists.
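The overlap check on periods can be sketched in Python as follows. This is an illustration under assumptions: periods are represented as (begin, end) pairs of dates with inclusive end points, a proposal of a single year is assumed to cover that whole year, and the function names and example dates are hypothetical.

```python
from datetime import date


def year_period(year):
    """Interpret a proposal such as 1944 as the whole calendar year."""
    return date(year, 1, 1), date(year, 12, 31)


def overlaps(a_begin, a_end, b_begin, b_end):
    """Two periods overlap if each one starts before the other one ends."""
    return a_begin <= b_end and b_begin <= a_end


def overlapping_periods(proposal, known_periods):
    """Return the known periods that overlap the proposed period."""
    begin, end = proposal
    return [p for p in known_periods if overlaps(begin, end, p[0], p[1])]


# Hypothetical periods already known to the system.
known = [
    (date(1944, 9, 17), date(1944, 9, 25)),
    (date(1940, 5, 10), date(1945, 5, 5)),
    (date(1950, 1, 1), date(1950, 12, 31)),
]
suggestions = overlapping_periods(year_period(1944), known)
```

In this example the first two periods are suggested for a proposal of 1944, because they fall within or span that year, while the period in 1950 is not.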

The Tag interface works mostly in the same manner as the Period interface, with a slight adaptation to the way related tags are returned. Here relations between concepts in the previous CHI system are used, along with the use of different tags in other objects. This uses the TagSuggestion object created by G. Ketelaars[5]. For example, for a proposal of the tag bombardment, CHI Explorer found several objects in the collection that combine this tag with the tags Second World War and War Damages. It also found a connection between the tags bombardment and bombardments, since they belong to the same concept. These tags are then combined in a list of alternatives and displayed to the user in the same way as the periods.

The Location interface accounts for the hierarchical structure in which most locations are contained. This is why locations are displayed as a tree structure. CHI Explorer checks its system for location tags that are similar to the proposal offered by the user. Next, it checks all locations that are part of these locations and generates a tree structure from this list. We do this to offer the user a way to refine his input. A user might not correctly remember the name of the street where a certain event happened and therefore enter the city name. By generating a list of city districts and street names, we allow the user to navigate this interface and enter the correct street name after all. Locations can be added from this list by double clicking the location tag. The same action also removes selected locations from the list.

When a user is satisfied with his list of proposed tags, which are displayed beneath his original search query, he can commit this list by clicking the commit button. This will add all selected tags to CHI Explorer. All tags entered this way will be given the current user’s approval.

This concludes the design of the interface. The next chapter will describe how these design decisions have been implemented and how several subsystems have been connected.

Chapter 4

Implementation

This chapter discusses the implementation decisions that have been made in the creation of CHI Explorer. It is divided into four sections. Section 4.1 explains how the CMS Joomla! can be extended. Here we take advantage of the modular approach of Joomla! and use it in CHI Explorer, allowing us to extend CHI Explorer in the future in the same way.

Section 4.1.3 describes the inclusion of SMF in Joomla!, used to manage the tag discussion pages. This allows us to keep track of the tag discussions in a system which has already implemented all necessary elements, such as thread maintenance, user security and a structured environment to store this data.

Section 4.2 shows how the CHI Explorer backend communicates with various databases. Here we have chosen a direct Java connection between our PHP based system and the Java based Sesame repository. By allowing Java to do some of the more data specific calculations, we can cut back on the number of data requests and create a faster system. This section also explains how CHI Explorer initially generates the location coordinates it uses in the map view. Furthermore, it explains how it reformats the original CHI period data into the standardized W3C time ontology.

Finally, section 4.3 explains the implementation decisions made in the PHP interface. Here we show how the modular construction of CHI Explorer is achieved and how CHI Explorer creates the extra search result pages, such as the map and timeline views. Here we also show how CHI Explorer clusters information in the timeline view to create a less cluttered and more readable result page. Lastly, it explains how the TagSuggestion system, created by G. Ketelaars, is used to automatically find similar but not directly connected tags in the RHCe dataset.

4.1 Joomla!

As mentioned in section 3.3.1, CHI Explorer will use the CMS Joomla! as a base. This system contains a user management system to authenticate users with different access privileges, allowing different users to access Administrator, Moderator and Registered user pages. This enables CHI Explorer to grant RHCe professionals complete access to the web site, while allowing registered users to rate and discuss tags and other, anonymous users to browse the collection of objects.

All components should be located in the components directory in the Joomla! installation directory. The name of the component in the Joomla! option variable is interpreted as a directory within this components directory. In this directory, a script with the same name as the directory minus the com_ part is opened. In our component, the path to the main script would be:

\components\com_chi\chi.php

This means the internal name of our component is com_chi and the name of the main script is chi.php. This script should handle all requests to CHI Explorer and either process them itself or delegate them to other scripts.

Joomla! components can be loaded by adding the option PHP variable to the web address of the Joomla! index page. To start the CHI Explorer component, we could use the following line:

www.rhc-eindhoven.nl/Joomla/index.php?option=com_chi

The following subsections explain how CHI Explorer uses the Joomla! system. This includes the setup of CHI Explorer as a Joomla! component, the creation of the Joomla! site using templates and the integration of the Simple Machines Forum.

4.1.1 Components

The Joomla! architecture is constructed to allow easy inclusion of PHP-based components. This allows us to construct CHI Explorer modularly, which in turn enables it to be easily extended. It also includes several encapsulation functions and variables to allow access to the offered CMS functionality. The most important of these will be discussed here.

_VALID_MOS

This constant is defined in the main Joomla! code. Any Joomla! component is treated as an inclusion in another page, so by checking if this constant exists, we can control the program flow, breaking off calls to the component that do not originate from the Joomla! system. This can be done using the following PHP code at the beginning of a component page:

defined( '_VALID_MOS' ) or
    die( 'Direct Access to this location is not allowed.' );

$my

This PHP object can be called in Joomla! components and contains all information available in its user management system. This value is updated whenever a user logs in or out and is valid throughout a session. For example, CHI Explorer uses this value to determine which user is currently logged in.

$mosConfig_live_site

This variable contains the web address of the main page of the web site. This value can be used to generate relative links in CHI Explorer. Joomla! allows components to be called from the main page in the following manner:

$mosConfig_live_site . "/index.php?option=com_chi&task=form"

CHI Explorer uses the task variable to load subsystems in the same way as Joomla! uses the option variable, in this case loading a tagging form.

$mosConfig_absolute_path

This variable contains the absolute path of Joomla! on the hard drive. This can be used to include different scripts, or add links to local images to a page. An example of its use is the following line, used to include a sub script:

require_once($mosConfig_absolute_path .
    '/components/com_Chi/sesame-int.php');

4.1.2 Templates

Joomla! uses the patTemplate¹ structure to create web pages. Templates are formed using XHTML code and predefined XML tags. These XML tags will be discussed here, using an example of a CHI Explorer template file.

<patTemplate:tmpl name="tag_options">
  <TABLE>
    <TR>
      <!-- Dutch for "Which term do you mean?" -->
      <TD>Welke term bedoelt u?</TD>
    </TR>
  </TABLE>
  <TABLE>
    <patTemplate:tmpl name="tag_row">
      <TR>
        <TD>
          <SPAN style="padding-left:80px" />
        </TD>
        <TD>
          <A href="{LINK}">{TERM}</A>
        </TD>
      </TR>
    </patTemplate:tmpl>
  </TABLE>
</patTemplate:tmpl>

¹ http://trac.php-tools.net/patTemplate

This template is used to generate a list of tag alternatives when the user initially asks for a tag cloud representation, as described in section 3.3.2. In CHI Explorer, templates are always inserted in the body of a page. In the template above, we can see the main tag, patTemplate:tmpl, which surrounds a template within a file. Multiple templates can be located in one file, as long as all top nodes are these tmpl nodes. This XML node has a name property, identifying a template within a file. Template nodes can contain other template nodes, as shown in this example.

<patTemplate:tmpl name="form_bookmark" type="condition"
    conditionvar="add_bookmarks" varscope="form_items">
  <patTemplate:sub condition="1">
    <script type="text/javascript">
      ...
    </script>
    <table>
      <tr>
        <td width="50px"></td>
        <td>
          ...
        </td>
      </tr>
    </table>
  </patTemplate:sub>
  <patTemplate:sub condition="__default">
    <script type="text/javascript">
      ...
    </script>
  </patTemplate:sub>
</patTemplate:tmpl>

Another option is the use of condition templates. Condition templates use a condition variable and are generated according to the value of this variable. In this example, the HTML code of the first sub is used if the value of the condition variable equals 1. If the variable has any other value, the second sub will be used.

Variables are normally valid for one template only, which does not include nested templates. To make variables valid in other templates, the varscope property can be used.

Several PHP commands can be used to generate web pages from this template.

require_once($mosConfig_absolute_path .
    '/includes/patTemplate/patTemplate.php');

This command is needed to load the patTemplate functions. The file is located in the includes folder after installation of Joomla!.

$tmpl = new patTemplate();

This command creates a new template engine object in PHP. CHI Explorer will always use the variable $tmpl for this purpose, to make the code more readable.

define("TEMPLATES", $mosConfig_absolute_path .
    "/components/com_Chi/templates");
...
$tmpl->setBasedir( TEMPLATES );

This command sets the base directory where all template files should be located. This location is used internally in the next command, which loads the actual templates.

$tmpl->readTemplatesFromFile( "search.xml" );

This command loads a file containing templates into the $tmpl object. Multiple files can be loaded into the same object sequentially, as long as each template has a unique identifier.

$tmpl->addVar("search", "QUERY", $search);

This command loads the contents of the PHP variable $search into the template variable QUERY of the template identified by the name search.

$tmpl->addRows("tag_row", $rows);

This command applies a template to each row in a PHP array. In this case, it uses the template tag_row, described above, to display multiple LINK and TERM options. $rows is a PHP array, consisting of rows indexed by integers, which is standard in PHP array construction. Each row has two cells, labeled LINK and TERM. This command applies the named template, tag_row in this case, to each row and will generate multiple copies of this template in the resulting web page, in the order of the array rows.

CHI Explorer Forum
    General conversation
    Object discussions
        ...
        Object x
            tags
            locations
            periods
            descriptions
        ...

Figure 4.1: Board hierarchy in CHI Explorer forums

$tmpl->displayParsedTemplate("search");

This command will output the specified template to the web page. If the template consists of multiple rows, as mentioned above, all rows will be generated.

4.1.3 Simple Machines Forum

Allowing users to discuss tags gives us a lot of overhead. We need to make sure these discussions do not degenerate into large amounts of off topic ranting or become a playground for adbots. Implementing the actions users can take, and the administrative actions RHCe professionals or promoted users can take to minimize misuse, is another project in itself.

For this reason we have chosen to rely on forum software. The actions that need to be taken to maintain these discussion pages coincide with the actions that need to be taken in any forum application, discussion being the main drive behind these sites. We have chosen the Simple Machines Forum (SMF) software, because of the access to the code thanks to the open source license, and the existing ways to incorporate SMF within Joomla!.

A discussion page on a forum is called a thread. A collection of threads is called a board. SMF has the ability to nest boards in other boards, enabling us to set up a structure in which we locate each thread. CHI Explorer will use the board hierarchy as displayed in figure 4.1.

The CHI Explorer forum consists of two boards. The first is the general conversation board. Here RHCe professionals can post announcements and users can discuss subjects that are not directly relevant to the objects in the RHCe collection. This board has been created to minimize off topic conversations in the actual discussion threads.

The second board contains all discussion threads concerning the collection objects. Each object has its own separate board and within each object board there are separate boards for tag, location, period and description discussions. Users are not allowed to start a thread directly in the tag, location or period boards. This can only be done via the respective object detail interface, as mentioned in section 3.3.3 and explained in more detail in section 4.3.9. Because there is no clearly defined structure in the description field in an object’s metadata, users are allowed to start their own thread in an object’s description board.

Users do not need to manually search for a discussion thread in this forum to post discussions. CHI Explorer will link to the specific discussion thread from an object if it already exists, or it will create a new thread if this is not yet the case. This is explained in more detail in section 4.3.9.

4.2 Data Interface

In this section we will discuss the database connections CHI Explorer uses. Section 4.2.1 explains how we communicate with the Sesame repository, containing the CHI Explorer ontology used to perform searches and tagging. Section 4.2.2 explains how we obtained geographical coordinates, allowing us to plot search results on a map, as mentioned in section 3.3.2. Section 4.2.3 will explain how CHI Explorer changed the way period information was stored to conform to the W3C standard.

4.2.1 Sesame

The previous CHI system used a Sesame repository to store object metadata. It used an HTTP connection to this repository to send queries and receive results. This is not very efficient, since this connection is slow, which is all the more wasteful because the Sesame repository is located on the local machine. The inefficiency occurs because queries are sent via the web server over the HTTP protocol and are then translated to the Java interface that Sesame uses. These queries are then executed and their results are recoded from the Java interface to the HTTP protocol, after which they are used in PHP to display the information. During the creation of a web page, multiple queries are needed, each of which needs to repeat the entire process.

To increase its efficiency, CHI Explorer uses a Java² interface to directly access the Sesame interface. This bypasses a translation step and allows us to write specific functions to sequentially create, execute and manipulate Sesame queries. In this case we only have to pass variables between PHP and Java once per object creation. This cuts down on Java object initialization times.

² http://java.sun.com/

The Java interface is constructed in two parts. The first part contains general functions to directly execute queries on the Sesame repository. The second part contains specialized functions, designed to implement and work on the CHI Explorer ontology as described in section 3.1. This second part uses functions from the first part to communicate with the Sesame repository. The structure is set up this way so that the Sesame interface can be changed in case of a new Sesame API, without having to recode the entire CHI Explorer-Sesame interface.

To allow CHI Explorer to directly use Java classes in PHP code we use JavaBridge³. This system adds a persistent Java interface in a Java Servlet server such as Tomcat, which enables Java classes to be directly called within PHP. This allows CHI Explorer to create a direct connection to the Sesame repository using the following Java interfaces.

Sesame-Java interface

The Sesame-Java interface is identified as tue.Chi.Sesame.SesameInt and contains the following functions.

Headers = new String[0];
public String[] GetHeaders();

Data = new String[0][0];
public String[][] GetValues();

These variables are filled when a SeRQL Select query is executed. Headers contains the labels of the columns used in the output of the query. Data contains the rows of results that the query generates.

public boolean Open(String UserName, String Password,
    String Rep, OpenTypes OpenType);

This function opens the repository Rep using the specified UserName and Password. Open returns true if the repository is successfully opened and otherwise returns false. A repository can be opened using three connection types:

• OpenTypes.URL: Opens the repository via an HTTP connection.

• OpenTypes.URI: Opens the repository via an RMI connection.

• OpenTypes.Local: Opens the repository as a service in the same Java virtual machine.

³ http://php-java-bridge.sourceforge.net/

public boolean DoQuery(String Query);

This function executes a SeRQL Select query on the opened repository. Query contains a SeRQL Select query. DoQuery returns true if the query is successfully executed, otherwise it returns false.

public boolean ConstructQuery(String Query);

This function executes a SeRQL Construct query on the opened repository, which is used to add data. Query is a string containing the SeRQL Construct query. ConstructQuery returns true if the query was successfully executed and otherwise returns false.

public boolean RemoveQuery(String Query);

This function is used to remove all Sesame concepts that are generated by a query in an opened repository. Query is a SeRQL Construct query. RemoveQuery returns true if the removal executed without errors, otherwise it returns false.

protected String NormalizeString(String Str);

This function is used internally to remove unwanted characters from Sesame concept names. In the current implementation it only removes spaces, turning the Netherlands into theNetherlands, but it provides a central place to remove other characters as well, if needed.
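The behaviour described for NormalizeString can be sketched in a few lines of Java. The class name `NormalizeSketch` and the lower-case method name are hypothetical, and the body is an assumption based only on the description above, not the actual CHI Explorer code.

```java
// Hypothetical sketch of the space-stripping step described in the text.
final class NormalizeSketch {

    // Removes every space character; all other characters are left untouched.
    // Further unwanted characters could be stripped here in one central place.
    static String normalizeString(String str) {
        return str.replace(" ", "");
    }
}
```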

public int columns();
public int rows();
public String values(int row, int column);

These functions are used to retrieve query results from this interface. columns returns the number of columns in the result table. rows returns the number of rows of the result table. A result cell can be retrieved using values, with row and column as indexes.
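A caller could walk the result table with these three accessors roughly as sketched below. `ResultTableSketch` is a hypothetical in-memory stand-in, not the SesameInt class itself, and zero-based indexing is an assumption that the text does not confirm.

```java
// Hypothetical stand-in for a query result; only columns(), rows() and values()
// mirror the accessors described in the text.
final class ResultTableSketch {
    private final String[][] data;

    ResultTableSketch(String[][] data) {
        this.data = data;
    }

    int columns() {
        return data.length == 0 ? 0 : data[0].length;
    }

    int rows() {
        return data.length;
    }

    // Assumption: row and column are zero-based.
    String values(int row, int column) {
        return data[row][column];
    }

    // Walks the table row by row, joining cells with tabs and rows with newlines.
    static String dump(ResultTableSketch table) {
        StringBuilder out = new StringBuilder();
        for (int r = 0; r < table.rows(); r++) {
            for (int c = 0; c < table.columns(); c++) {
                if (c > 0) {
                    out.append('\t');
                }
                out.append(table.values(r, c));
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```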

CHI-Sesame interface

The CHI-Sesame interface is identified as tue.Chi.Sesame.SesameCHIInt and extends the Sesame-Java interface, granting access to all its functions. It contains the following additional functions.

private String AddUser(String User);

This function returns part of a SeRQL Construct query defining a user in the CHI Explorer ontology if the user does not exist. If the user already exists in the system, it will return the empty string. This function enables the system to register users so they can later be used in other queries to rate tags.

private String AddComment(String Object, String Tag,
    String User, CommentTypes Type);

This function returns part of a SeRQL Construct query defining a comment concept in the ontology as explained in section 3.1.3. Object contains the Object concept identifier, Tag contains the Tag concept identifier and User contains the name of the user that added this tag. CHI Explorer distinguishes between three types of Comments, as described in section 3.1.3:

• CommentType.Location: the tag is a geographical description.

• CommentType.Period: the tag is a temporal description.

• CommentType.Tag: the tag describes a contextual value.

private String AddDate(String Year, String Month,
    String Day, String User);

This function returns part of a SeRQL Construct query defining a DateTimeDescription according to the W3C time ontology as described in section 4.2.3. Year, Month and Day are the appropriate date values, represented as integers. User represents the user who added the date.

private String AddInterval(String start, String end,
    String duration, String User);

This function returns part of a SeRQL Construct query defining an Interval according to the W3C time ontology as described in section 4.2.3. Start and End are identifiers of DateTimeDescription concepts as defined by the W3C time ontology. Duration is an identifier of a DurationDescription. User represents the user who added the interval.

private String AddLocation(String location, String User);

This function returns part of a SeRQL Construct query defining a Location concept according to the CHI ontology as described in section 3.1.1, if it does not yet exist. If it does exist, it returns an empty string. Location is the label of the concept, for example "Eindhoven" or "Noord Brabant". User represents the user who added the location.

private String AddTag(String Tag, String User);

This function returns part of a SeRQL Construct query defining a general Tag concept according to the CHI ontology as described in section 3.3.4, if it does not yet exist. If it does exist, it returns an empty string. Tag is the label of the concept. User represents the user who added the tag.

private String AddTime(Integer[] times, String Time,
    String User);

This function returns part of a SeRQL Construct query defining a video fragment time description as defined in section 3.1.2, if it does not yet exist. If it does exist, it returns the empty string. Times is an integer array containing the time definition. The array uses integer indexes, where index 0 describes the hour, index 1 describes the minute and index 2 describes the second. Time is a label for this concept, using the format 0u0m0s. User represents the user who added the time description.

private String GenerateDate(String Year, String Month, String Day);

This function returns a date formatted in the following manner: (Year(-Month(-Day))). Year, Month and Day are the respective date values and are only added to the string when they are not empty. This function is used to generate labels for date concepts.
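The nested format (Year(-Month(-Day))) can be illustrated with the following sketch. The class `DateLabelSketch` and its method are hypothetical. Treating the day as meaningful only when a month is present is an assumption derived from the nesting of the format, not a documented rule.

```java
// Hypothetical sketch of the date label format (Year(-Month(-Day))).
final class DateLabelSketch {

    private static boolean present(String value) {
        return value != null && !value.isEmpty();
    }

    // Builds "Year", "Year-Month" or "Year-Month-Day".
    // Assumption: a day without a month is ignored, following the nested format.
    static String generateDate(String year, String month, String day) {
        StringBuilder label = new StringBuilder(year);
        if (present(month)) {
            label.append('-').append(month);
            if (present(day)) {
                label.append('-').append(day);
            }
        }
        return label.toString();
    }
}
```

The values are appended exactly as supplied, so any zero padding would be the responsibility of the caller.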

private String GenerateTime(Integer[] times);

This function returns a time formatted in the following manner: (((times[0]u)-times[1]m)times[2]). Times is an integer array containing the time definition. The array uses integer indexes, where index 0 describes the hour, index 1 describes the minute and index 2 describes the second.
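As an illustration, a time label could be produced as sketched below. The format notation above does not fully match the 0u0m0s label format mentioned for AddTime and TimeFound, so this sketch assumes the latter and always emits all three components. The class `TimeLabelSketch` and its method are hypothetical.

```java
// Hypothetical sketch that formats a time array as a label such as "0u3m25s".
final class TimeLabelSketch {

    // times[0] = hours, times[1] = minutes, times[2] = seconds.
    // Assumption: all three components are always written, as in "0u0m0s".
    static String generateTime(Integer[] times) {
        return times[0] + "u" + times[1] + "m" + times[2] + "s";
    }
}
```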

private String GetBookmarkID(String Object,
    String Start, String Stop);

This function searches CHI Explorer for the existence of a video fragment and returns its identifier if found. If no such fragment exists for the current video, it returns a new fragment. Object is the video object identifier. Start is the label of the start time of the fragment. Stop is the label of the end time of the fragment. Both of these labels can be generated using the function GenerateTime.

private String GetUnitType(String Year, String Month, String Day);

This function returns the DurationDescription that belongs to a date, according to the W3C time ontology as described in section 4.2.3. Year, Month and Day are the respective date values. If Day is valid, the type is considered time:unitDay, else if Month is valid, the type is considered time:unitMonth, else the type is considered time:unitYear.
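The cascade from day to month to year can be sketched as follows. The text does not define what makes a value valid, so this sketch assumes that a value is valid when it is neither null nor blank. The class `UnitTypeSketch` and its helper are hypothetical.

```java
// Hypothetical sketch of the unit selection: the finest supplied component wins.
final class UnitTypeSketch {

    // Assumption: "valid" means neither null nor blank.
    private static boolean isValid(String value) {
        return value != null && !value.trim().isEmpty();
    }

    // The year parameter is kept to mirror the described signature; it is the fallback unit.
    static String getUnitType(String year, String month, String day) {
        if (isValid(day)) {
            return "time:unitDay";
        }
        if (isValid(month)) {
            return "time:unitMonth";
        }
        return "time:unitYear";
    }
}
```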

private Integer[] SplitTime(String Time);

This function splits a time string in the format uu:mm:ss into an integer array, where index 0 contains the hours, index 1 contains the minutes and index 2 contains the seconds. This array can later be used to generate video fragments.
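A minimal sketch of such a split is shown below. The class `SplitTimeSketch` is hypothetical, and rejecting malformed input with an exception is an assumption, since the text does not say how invalid strings are handled.

```java
// Hypothetical sketch that parses "uu:mm:ss" into {hours, minutes, seconds}.
final class SplitTimeSketch {

    static Integer[] splitTime(String time) {
        String[] parts = time.split(":");
        if (parts.length != 3) {
            // Assumption: malformed input is rejected rather than silently repaired.
            throw new IllegalArgumentException("expected uu:mm:ss but got: " + time);
        }
        Integer[] result = new Integer[3];
        for (int i = 0; i < 3; i++) {
            // Integer.valueOf accepts leading zeros, so "03" becomes 3.
            result[i] = Integer.valueOf(parts[i].trim());
        }
        return result;
    }
}
```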

private boolean RelationExists(String Subject,
    String predicate, String Object);

This function checks if a certain relation exists in the Sesame repository. Subject and Object are assumed to belong to the tha namespace. Predicate needs to be supplied with a namespace, if necessary. All three should be identifiers of Sesame concepts.

private boolean SubjectExists(String Subject);

This function checks if a certain concept exists in the Sesame repository. Subject is a concept in the tha namespace. It is used internally to check if requested concepts should be created.

private boolean TagFound(String tag, String prefix);

This function is similar to the SubjectExists function. Tag is the concept that needs to be checked. Prefix contains the namespace, including colon, to which the concept belongs.

private boolean TimeFound(String time);

This function checks if a fragment time object has already been added to CHI Explorer. Time is the label of the time description, in the 0u0m0s format.

public boolean ApproveTag(String Object, String Tag,
    String User, UserComment Type);

This function can be called when a user wants to rate a certain tag. Object is the identifier of the object the tag belongs to. Tag is the label of the tag in question. User determines which user wants to rate this tag. Type defines what kind of rating the user gives this tag in relation to the object. This can be either UserComment.APPROVE or UserComment.REJECT.

public boolean AddLocation(String Object,
    String Location, String User);

This function can be called when a user wants to add a Location tag to a certain object. Object is the identifier of the object the tag is added to. Location is the label of the Location tag. User determines which user adds this tag.

public boolean AddPeriod(String Object,
    Map<String, String> Dates, String User);

This function can be called when a user wants to add a new Interval tag to a certain object. Object is the identifier of the object the tag is added to. User determines which user adds this tag. Dates is an associative array containing date information using the following indexes:

• sj: This value defines the year of the start of the interval.

• sm: This value defines the month of the start of the interval.

• sd: This value defines the day of the start of the interval.

• ej: This value defines the year of the end of the interval.

• em: This value defines the month of the end of the interval.

• ed: This value defines the day of the end of the interval.

• duration: If no end of the interval is given, this value determines the length of the interval, using either oneYear, oneMonth or oneDay. If an end of the interval is given, no value needs to be entered here.

The date values should be integer representations. sj is required at all times. If em is given, then ej is also required. If sd or ed are given, then sm or em, respectively, are also required.
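The dependency rules between the date keys can be expressed as a small validation routine. The sketch below is an illustration only: `PeriodDatesSketch` and its methods are hypothetical, and treating an empty string the same as a missing key is an assumption.

```java
// Hypothetical sketch that checks the key dependencies of the Dates map.
final class PeriodDatesSketch {

    // Assumption: an empty string counts as "not given".
    private static boolean given(java.util.Map<String, String> dates, String key) {
        String value = dates.get(key);
        return value != null && !value.isEmpty();
    }

    // Returns true when the map satisfies the rules stated in the text.
    static boolean isValid(java.util.Map<String, String> dates) {
        if (!given(dates, "sj")) {
            return false; // the start year is always required
        }
        if (given(dates, "em") && !given(dates, "ej")) {
            return false; // an end month needs an end year
        }
        if (given(dates, "sd") && !given(dates, "sm")) {
            return false; // a start day needs a start month
        }
        if (given(dates, "ed") && !given(dates, "em")) {
            return false; // an end day needs an end month
        }
        return true;
    }
}
```

Such a check could run before any query is constructed, so that incomplete input is rejected early.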

public boolean AddSpecificPeriod(String Object,
    String Periods, String User);

This function can be called when a user wants to add an existing Interval tag to a certain object. Object is the identifier of the object the tag is added to. Periods is a comma-separated list of Interval tag identifiers, which need to be added to the object. User determines which user adds these tags.

public String AddBookmark(String Object, String Start,
    String Stop, String User);

This function can be called when a user wants to create a new fragment. Object is the identifier of the object that contains this fragment. Start contains the start time of the fragment. Stop contains the end time of the fragment. Both of these times are formatted as uu:mm:ss. User determines which user adds this fragment.

public boolean AddTags(String Object, String Start,
    String Stop, String TagList, String User);

This function can be called when a user wants to add tags to an object. Object is the identifier of the object to which the tags are added. Start and Stop can be used to specify a fragment, using the uu:mm:ss format. If these values are left as empty strings, the entire object will be tagged. TagList is a comma-separated list of tags to add to this object. User determines which user adds these tags.
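How the comma-separated TagList and the empty Start and Stop values might be interpreted is sketched below. The class `TagListSketch` and both methods are hypothetical. Trimming whitespace and skipping empty entries are assumptions, since the text only states that the list is comma-separated.

```java
// Hypothetical sketch of the input handling that AddTags implies.
final class TagListSketch {

    // Splits on commas. Assumption: surrounding whitespace is trimmed and
    // empty entries are dropped.
    static java.util.List<String> parseTagList(String tagList) {
        java.util.List<String> tags = new java.util.ArrayList<>();
        for (String raw : tagList.split(",")) {
            String tag = raw.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Empty Start and Stop strings mean the whole object is tagged, not a fragment.
    static boolean targetsWholeObject(String start, String stop) {
        return start.isEmpty() && stop.isEmpty();
    }
}
```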

public boolean SetLocation(String Tag, String OldType,
    String NewType);

This function is used in the example of the administrator environment. It can be used to set the Location type, according to the CHI ontology. For example, it can be used to set Eindhoven as a City type location. Tag is the location concept identifier which needs to be changed. OldType is the current type of the location. This can be the standard Location type, if no other type is known. NewType is the new type of the location concept. It is not necessary to include the tha namespace in any of these variables.

public boolean SetLocations(String[][] Locations);

This is the batch version of the function above. Each row of Locations is considered to consist of a Tag at index 0, an OldType at index 1 and a NewType at index 2.

public boolean SetParent(String Location, String Parent);

This function can be used to redefine the parent of a certain tag in the CHI ontology. In CHI Explorer it is used in the example administrator interface to edit the location hierarchy. Location is the identifier of the object to change, and Parent is the identifier of the object that is to be the new parent. This function will remove all other bredereTerm relations before creating the requested bredereTerm relation.

4.2.2 Location Coordinates

Section 3.3.2 mentions an alternative display of search results by plotting these values on a map. To implement this view, we will use the Google Maps API. To be able to use this system, we need to acquire the geographical latitude and longitude of the existing locations in CHI Explorer. We have collected this information using the same Google Maps API.

The code used can be reviewed in appendix A.1. Here, we first extract a list of all location concepts and store their identifier, their place label and their parent’s place label. We need both these labels, since several locations are named centre, but are identified by belonging to different cities (centre, Eindhoven versus centre, Helmond). These values are inserted in the $locArray associative two-dimensional array, using the combination of concept and parent label as the key for the first dimension, and locatie as the key for the second dimension. This cell’s value will be the concept identifier.

The AddCoords function will be executed for every row of this array. This function will request the coordinates of all keys in this array, and extend each row with a lat cell containing the latitude and a long cell containing the longitude. To do this an XML document will be loaded from the Google web site using the address created at the beginning of the AddCoords function. The important variables in this address are:

• q: this variable contains the location that will be queried.

• output: this variable sets what kind of feedback will be created. In this case we want an XML file, so this should always be xml.

• key: this variable identifies the server that requested the information. This key is bound to a web site or IP address and can be requested free of charge at the Google web site4.
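The address built from these three variables can be sketched as follows. This is a minimal JavaScript illustration, not the PHP code of appendix A.1: the function name buildGeocodeUrl, the base address and the key value are placeholders that we assume for the example; only the parameter names q, output and key come from the text.

```javascript
// Sketch only: builds the request address from the three variables named
// in the text (q, output, key). The base address and the key are
// hypothetical placeholders supplied by the caller.
function buildGeocodeUrl(baseUrl, place, parentPlace, apiKey) {
  // Combine the location with its parent's label, so that
  // "centre, Eindhoven" and "centre, Helmond" stay distinct.
  const query = parentPlace ? `${place}, ${parentPlace}` : place;
  const params = new URLSearchParams({
    q: query,       // the location that will be queried
    output: "xml",  // always request an XML document
    key: apiKey,    // identifies the requesting web site
  });
  return `${baseUrl}?${params.toString()}`;
}

console.log(buildGeocodeUrl("http://geocoder.example/lookup", "centre", "Eindhoven", "MY_KEY"));
```

URLSearchParams takes care of escaping the comma and the space in the location label, which would otherwise produce an invalid address.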

When all rows of the $locArray have been processed, an RDF file will be created, which can later be loaded directly into Sesame, creating two new relations in the location concepts:

• bevatLat: the latitude of the location.

• bevatLong: the longitude of the location.

Using this information to display search results in a map is discussed in section 4.3.3.

4http://code.google.com/apis/maps/signup.html


Figure 4.2: Data structure of W3C Time ontology

4.2.3 W3C Time Ontology

The previous CHI system implemented a simple structure to add period data stored in the Sesame Repository. Here, every period that was added to the repository was added as a unique concept. This resulted in multiple instances of the same period concept for different objects that occurred in the same time period. Since a concept should be the unique representation of this information, we will generate a new system to store this information.

To further facilitate any future connectivity with other systems, we will use the time ontology created by the World Wide Web Consortium (W3C). This organization develops and manages several standards for Internet protocols, such as HTML and XML. By adhering to these open standards, other information systems can more easily communicate with CHI Explorer.

Figure 4.2 displays the time ontology as it is used in CHI Explorer. The main concept, detailing a period of time, is time:Interval. This concept is connected to an Archiefobject using the newly defined tha:inPeriodeSTD relation, mirroring the original tha:inPeriode relation. Each time:Interval contains a single time:hasBeginning relation to a time:Instant concept, denoting the start of the period.

Furthermore, time:Intervals have either a time:hasEnd or a time:hasDurationDescription relation, depending on the type of time period. If a period is defined by a year, a month or a day, it uses the time:hasDurationDescription relation to a time:DurationDescription instance, tha:OneYear, tha:OneMonth or tha:OneDay respectively. The specific definition of these three concepts can be seen in Appendix A.2 in the function addStandardNodes. If a period is defined as a more fluid interval, it uses the time:hasEnd relation to a time:Instant concept defining the end of the period.

A time:Instant concept defines the specific point in time. It uses the following relations to store this information.

• time:year: This relation defines the year of the instant as an integer.

• time:month: This relation defines the month of the instant as an integer, with January being 1 and December being 12.

• time:day: This relation defines the day of the month of an instant as an integer.

• time:unitType: This relation defines the precision of the instant. In CHI Explorer, it will have one of three values.

– time:unitYear: the instant has a year as its smallest time definition.

– time:unitMonth: the instant has a month as its smallest time definition.

– time:unitDay: the instant has a day as its smallest time definition.

• time:timeZone: This relation defines the time zone the instant is relative to. Since the RHCe collection is defined in the Netherlands, CHI Explorer uses tz-world:ATZ as the time zone indicator, which defines GMT+1.
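The choice between time:hasEnd and time:hasDurationDescription can be illustrated with a small JavaScript sketch. It represents an interval as a plain object keyed by the relation names above; this object layout and the function names makeInstant and makeInterval are assumptions for illustration and do not reproduce the RDF generated by the code in appendix A.2. We also assume that the one-day duration carries the tha prefix like the other two.

```javascript
// Sketch only: mirrors the relations described in the text as plain objects.
function makeInstant(year, month, day) {
  const instant = { "time:year": year, "time:timeZone": "tz-world:ATZ" };
  let unit = "time:unitYear";
  if (month != null) {
    instant["time:month"] = month; // January = 1, December = 12
    unit = "time:unitMonth";
    if (day != null) {
      instant["time:day"] = day;
      unit = "time:unitDay";
    }
  }
  instant["time:unitType"] = unit; // precision of the instant
  return instant;
}

function makeInterval(start, end) {
  const begin = makeInstant(start.year, start.month, start.day);
  const interval = { "time:hasBeginning": begin };
  if (end) {
    // A more fluid interval: store an explicit end instant.
    interval["time:hasEnd"] = makeInstant(end.year, end.month, end.day);
  } else {
    // A single year, month or day: use a standard duration description.
    const durations = {
      "time:unitYear": "tha:OneYear",
      "time:unitMonth": "tha:OneMonth",
      "time:unitDay": "tha:OneDay",
    };
    interval["time:hasDurationDescription"] = durations[begin["time:unitType"]];
  }
  return interval;
}

console.log(JSON.stringify(makeInterval({ year: 1945 }), null, 2));
```

Because identical periods produce identical descriptions, a structure like this can be looked up and reused instead of creating a new concept for every object.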

4.3 PHP

In this section we will discuss how the CHI Explorer interface is connected to the repository back end. Section 4.3.1 discusses how CHI Explorer functions are called and how variables are passed. Section 4.3.2 shows how search results are added to a time line. Section 4.3.3 shows how these results can be plotted on a map. Section 4.3.4 discusses the way CHI Explorer clusters related objects in the time line interface, reducing the clutter in the interface. Section 4.3.5 shows how CHI Explorer controls video streams. Section 4.3.6 shows how Sesame Queries are encapsulated in PHP functions, increasing code legibility. Section 4.3.7 discusses how related tags are dynamically shown while a video is playing. Section 4.3.8 shows how search requests handle tag relations using the TagSuggestion component. Finally, section 4.3.9 shows how links to tag discussion pages are generated using the SMF Forum software.


4.3.1 Joomla! function calls

CHI Explorer functions are called using values in the PHP $_REQUEST variable. This allows us to simultaneously process variables retrieved by POST and GET requests. As shown in section 4.1, CHI Explorer can be loaded using the option=com_chi command. Subsystems within CHI Explorer use the task variable to select the correct function to be loaded. These functions are discussed in the next section. The second part of this section will explain the data types expected for auxiliary variables.

CHI Explorer functions

The following list contains the subsystems of CHI Explorer. These can be called by filling the task variable in the page request. The list gives a short description of what each subsystem does and notes which auxiliary variables can be used. Variables enclosed by [ and ] are optional. The next section will explain what data types these variables expect.

• results: This is the default option if no task is given. It displays the search interface and a list of search results if a search query is supplied.
variables: [pp], [searchquery], [sj], [sm], [sd], [ej], [em], [ed]

• timeline: This function again displays the search interface, but it will plot search results in a time line if a search query is given.
variables: [searchquery], [sj], [sm], [sd], [ej], [em], [ed]

• map: This function displays the search interface, but it will plot search results in a map if a search query is given.
variables: [searchquery], [sj], [sm], [sd], [ej], [em], [ed]

• cloud: This function displays the search interface, but it will plot a unique found tag in a search cloud, if a search request is given. If the search request results in multiple possible tags, this function will display a list of these tags. If the user clicks on one of these items, the appropriate search cloud is shown.
variables: [searchquery], [tagFilter]

• view: This function displays a detailed view of an object in the collection. It has been implemented to automatically choose the correct layout for images, videos and fragments. This choice is built using a case statement, allowing for extensions without drastic recoding.
variables: id

• form: This function displays a separate window which contains a form to edit a certain object’s tags. There are three types of forms, belonging to the three types of tags in CHI Explorer. It is also used to display a list of additional options to the user after one or more tags are proposed. Other future forms can be added by extending the case statement in this function.
main variables: formtype, id, ItemID, [BMParent, start, stop]

– period: This tag form is used to edit the temporal tags associated with the object in question.
variables: [fsj], [fsm], [fsd], [fej], [fem], [fed]

– location: This tag form is used to edit the geographical tags associated with the object in question.
variables: [term]

– tags: This tag form is used to edit the general tags of an object.
variables: [term]

– description: This function does not load a tag form. It is used to allow users to respond to any textual description of an object in the collection. It will redirect the user to a board in the forum where any threads pertaining to this object’s description are kept.
overriding variables: forumtype, id

• form2: This function will add any tags to the repository. This function is called from the tag forms generated using form. It uses all the variables of form, with the following additions.

– period: extra variable: found

– tags: extra variable: found

– description: this subsystem cannot be used. Users may add opinions in the forum using form→description, but may not add description changes to the repository.

• forum: This function redirects users to a certain discussion thread associated with the object and tag combination given.
variables: idname, forumtype, value

• approve: This function allows the user to approve a certain tag in relation to an object. It is used in the rating system built into the tag forms of CHI Explorer. This is used for content tags, locations and periods.
variables: id, term, ItemId

• reject: This function allows the user to disapprove a certain tag in relation to an object. It is used in the rating system built into the tag forms of CHI Explorer. This is used for content tags, locations and periods.
variables: id, term, ItemID


• markers: This function generates the input needed to initialize the map result page.
variables: ItemID, [searchquery], [sj], [sm], [sd], [ej], [em], [ed]

• events: This function generates the input needed to initialize the time line result page.
variables: ItemID, [searchquery], [sj], [sm], [sd], [ej], [em], [ed], [terms], [locations]

• termtable: This function generates a list of relevant tags of a video at a certain point in time.
variables: ItemID, id, start
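The selection of a subsystem through the task variable can be sketched as a simple dispatch table. The JavaScript below is an illustration only: the handler bodies are stand-ins, the object named handlers and the function dispatch are assumed names, and raising an error for an unknown task is our assumption, since the text does not state how CHI Explorer reacts in that case. Only the task names and the results default are taken from the list above.

```javascript
// Sketch only: maps the task request variable to a subsystem.
const handlers = {
  results: (req) => `result list for "${req.searchquery || ""}"`,
  timeline: (req) => `time line for "${req.searchquery || ""}"`,
  map: (req) => `map for "${req.searchquery || ""}"`,
  view: (req) => `detail view of object ${req.id}`,
};

function dispatch(request) {
  // "results" is the default option if no task is given.
  const task = request.task === undefined ? "results" : request.task;
  if (!Object.prototype.hasOwnProperty.call(handlers, task)) {
    throw new Error(`unknown task: ${task}`); // assumed behaviour
  }
  return handlers[task](request);
}

console.log(dispatch({ searchquery: "kerk" }));
console.log(dispatch({ task: "view", id: "61281" }));
```

Adding a subsystem then only requires a new entry in the table, which matches the extensibility goal mentioned for the case statements in view and form.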

CHI Explorer variables

Any auxiliary variable needs to be supplied in the page request. These variables are read in PHP using the following Joomla! function.

$search = strval(mosGetParam($_REQUEST, 'searchquery', NULL));

Here, the PHP variable $search is initialized with the value from the searchquery index in the $_REQUEST variable, if this index exists. If not, the default value NULL is used, which strval converts to an empty string.

The following list contains the variables used in this way in CHI Explorer and what data types and formats are expected.

• pp: This variable denotes what page in the search result list the user wants to see. This value is an integer between 0 and the highest page. By default, 25 items are shown per page.

• searchquery: This variable is a string, containing a comma-separated list of tags. It is used in search requests. If this string is empty, it is assumed the user wants to see all items in the RHCe collection.

• period dates: This variable is built using six other variables.

– sj: defines the starting year of a period.

– sm: defines the starting month of a period.

– sd: defines the starting day of a period.

– ej: defines the ending year of a period.

– em: defines the ending month of a period.

– ed: defines the ending day of a period.


All these values are assumed to be integer values. sj is always needed to define a period. If sd is given, then sm is also needed. The period end can be supplied in a similar way. If the end is not given, it is assumed the period is an interval (for example, 1945 is assumed to be an interval beginning at January 1st 1945 and ending at December 31st 1945).

• new period additions: this value is also constructed within CHI Explorer code in a similar way as the period dates, except it uses the variables fsj, fsm, fsd, fej, fem, fed. It is used to add periods to an object by a user in the period tag form.

• id: This variable is a string containing the identifier of an object in the RHCe collection. This usually is the value contained in the tha:listID relation of an object concept.

• term: This variable is a string, containing a comma-separated list of tags. It is used when a user wants to add tags to an object using the tag forms.

• tagFilter: This variable is used to select which relations are shown in a search cloud. It contains the string ”BGOSWZ”, using an uppercase letter to activate the relation and a lowercase letter to deactivate it. The letters symbolize the following relations:

– B: This letter activates the bredereTerm relation, which has a similar meaning to the broaderTransitive relation in the SKOS relationship management system5.

– G: This letter activates the gerelateerdAan relation, which has a similar meaning to the related relation in the SKOS system.

– O: This letter activates the omVat relation. It signifies instantiations of concepts, such as the relation between the concept of the restaurant type and a specific restaurant.

– S: This letter activates the smallereTerm relation, which has a similar meaning to the narrowerTransitive relation in the SKOS system.

– W: This letter activates the wasVroeger relation. It defines concept or name changes in time. This relation type can be used when different buildings housed different businesses at different points in time. More information about this relation can be found in [3, p. 40].

– Z: This letter activates the zieOok relation. This relation is used for loose relationships, such as the relationship between soldiers and barracks.

5http://www.w3.org/TR/skos-reference/


• terms: This variable is used as a flag to notify the clustering algorithm in the time line view. If this value is 1, it will cluster search results on the terms that are used in each item. If this value is empty, it will not do so. More information about the clustering algorithm can be found in section 4.3.4.

• locations: This variable is used as a flag to notify the clustering algorithm in the time line view. If this value is 1, it will cluster search results on the approved location that is used in each item. If this value is empty, it will not do so. More information about the clustering algorithm can be found in section 4.3.4.

• formtype: This variable is used to notify CHI Explorer which tag form is requested by the user. It contains one of the following options:

– tags

– period

– location

– description

• idname: This variable is used to identify an object for use in the forum links. It is constructed by concatenating the object type and the object identifier, using a space to separate them. For example, Foto 1234.

• forumtype: This variable is used to identify what tag type the user wants to comment upon in the forum. There are four options.

– tags

– period

– location

– description

• value: This variable contains a string of the label of the tag a user wants to comment upon. It is used to select the correct thread in the forum.

• start: This variable contains the start time of a fragment, formatted as an integer representing the number of milliseconds from the start of the video.

• stop: This variable contains the end time of a fragment, formatted as an integer representing the number of milliseconds from the start of the video.

• BMParent: This variable contains the identifier of the video which a fragment belongs to.


• otherTerms: This variable is a string containing a comma-separated list of terms already known in the CHI Explorer system. This list is used when proposing new tags to the user.

• weight: This variable is a string containing a comma-separated list of real weights of terms already known in the CHI Explorer system. A weight is set to 1.0 if the tag is approved by RHCe. It is set to 0.8 if the tag is proposed by users, with a 0.005 positive modifier for each user that approved this tag and a similar negative modifier for each user that disapproved this tag. The weight of a user-proposed tag is limited to the range between 0.75 and 0.85. This list is used when proposing new tags to the user. More can be read about the TagSuggestion system in section 4.3.8.

• found: This variable contains a serialized PHP array, where each row is a tag proposed by CHI Explorer. Each row consists of the following indexes:

0. Concept identifier, used by the system to rate this tag.

1. Concept label, used to display the tag to the user.

2. Concept weight (as described in the section above), used by the system to determine the relevance of a tag.

3. User rating, with one of the following options (this value only takes the current user into account):

– -1: user has disapproved of this tag.

– 0: user has not commented on this tag yet.

– 1: user has approved of this tag.

4. Base search term. This value contains the concept identifier which has been used if the concept in question is a proposed tag. Otherwise this value is an empty string.

5. Proposal flag. This value is 1 if the row is a CHI Explorer proposal, and 0 otherwise.

• modTask: This variable functions in the same way as the task variable, but it distinguishes between moderator tasks. At the moment, it is a proof of concept, only implementing a location edit interface, using the string locations.
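The weighting rule described for the weight variable above can be written down as a short function. The JavaScript below is a sketch of the rule as stated in the text, not the code of the TagSuggestion component; the function name tagWeight and its parameters are assumed names.

```javascript
// Sketch only: weight of a tag as described for the weight variable.
function tagWeight(approvedByRHCe, approvals, rejections) {
  if (approvedByRHCe) {
    return 1.0; // tags approved by RHCe always weigh 1.0
  }
  // User-proposed tags start at 0.8 and move 0.005 per user vote.
  const raw = 0.8 + 0.005 * (approvals - rejections);
  // The result is limited to the range 0.75 .. 0.85.
  return Math.min(0.85, Math.max(0.75, raw));
}

console.log(tagWeight(true, 0, 0));   // RHCe-approved tag
console.log(tagWeight(false, 3, 1));  // two net approvals
console.log(tagWeight(false, 0, 40)); // many rejections, clamped
```

The clamp means that ten net approvals or rejections already reach the limit, so a user-proposed tag can never outweigh a tag approved by RHCe.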

4.3.2 SIMILE Timeline

CHI Explorer uses the SIMILE Timeline interface to display time lines. To supply information to this interface, the system generates the necessary data using a PHP script which can be read by the loadJSON function. This data is formatted using the JSON standard 6 and created in the events.php file in the source code.

6http://www.json.org/


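Figure 4.3: Structure of Google Maps XML data file (a markers root containing marker nodes; each marker contains a name node and obj nodes; each obj contains a desc node and a type node)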

It contains the variable dateTimeFormat, which is set to iso8601 and defines how the dates are formatted. Next, it contains the variable events, a two-dimensional array which contains all objects that need to be added to the time line. Each row contains the following variables.

• start: The start time of the period, formatted as yyyy[-mm[-dd]].

• end: The end time of the period, formatted as yyyy[-mm[-dd]].

• isDuration: Defines if the period in question is the exact period, or if the period is an approximation. If it is the former, the value should be false, otherwise it is true. This value changes how the period bar is plotted on the time line.

• title: The title that will be shown on the bar. This usually is one or more tags belonging to the object.

• description: This is a small text that will be shown if the user double-clicks a period bar to open a detail view. It contains a description of the object in question. When clustered, it contains a list of all objects belonging to this cluster.

• link: A detail view title can contain a link. In CHI Explorer, this link will redirect the user to a detail view of the object in question.

• image: A detail view can contain a thumbnail of the object in question.
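The shape of this data file can be illustrated with the following JavaScript sketch, which assembles the structure and serializes it as JSON. It is not the code of events.php: the function names formatDate and buildTimelineData are assumptions, the input field names reuse the period variables sj to ed from section 4.3.1, and the example title, link and image values are made up. The isDuration value is passed through unchanged rather than derived.

```javascript
// Formats date parts as yyyy[-mm[-dd]]; month and day are optional.
function formatDate(year, month, day) {
  const pad = (n) => String(n).padStart(2, "0");
  let out = String(year).padStart(4, "0");
  if (month != null) {
    out += `-${pad(month)}`;
    if (day != null) {
      out += `-${pad(day)}`;
    }
  }
  return out;
}

// Sketch only: builds the object that is serialized for the time line.
// Every item is assumed to carry both a start year and an end year.
function buildTimelineData(items) {
  return {
    dateTimeFormat: "iso8601",
    events: items.map((item) => ({
      start: formatDate(item.sj, item.sm, item.sd),
      end: formatDate(item.ej, item.em, item.ed),
      isDuration: item.isDuration,
      title: item.title,
      description: item.description,
      link: item.link,
      image: item.image,
    })),
  };
}

const example = buildTimelineData([{
  sj: 1944, sm: 9, ej: 1945, em: 5, ed: 5, isDuration: true,
  title: "bevrijding", description: "Foto 1234",
  link: "index.php?option=com_chi&task=view&id=1234", // hypothetical link
  image: "thumbs/1234.jpg",                           // hypothetical path
}]);
console.log(JSON.stringify(example, null, 2));
```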

4.3.3 Google Maps

CHI Explorer uses the Google Maps API to plot search results on a map. To supply Google Maps with this information, we have generated latitude and longitude coordinates of all known locations as described in section 4.2.2. CHI Explorer will generate an XML output containing all objects relating to the current search request. The structure of this file is shown in figure 4.3.

Figure 4.4: Time line without clustering

The root node is called markers, and contains a marker node for each specific location in CHI Explorer. The label of this location is stored in name. Each marker contains a number of obj tags, representing the objects that belong to that location. This obj node has an attribute id, which contains the object’s identifier. The obj node contains a desc node, containing the description of an object, and a type node, containing the type of the object, such as Foto or Video.
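As an illustration of this structure, the JavaScript sketch below serializes a list of locations into the described XML. It is not the CHI Explorer implementation: the function names are assumed, we assume that name is a child node of marker as drawn in figure 4.3, and the sketch leaves out the coordinates because the text does not state where they are stored in the file.

```javascript
// Escapes the characters that are not allowed in XML text or attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Sketch only: markers > marker > (name, obj*), obj > (desc, type).
function buildMarkersXml(locations) {
  const markers = locations.map((loc) => {
    const objects = loc.objects.map((o) =>
      `<obj id="${escapeXml(o.id)}">` +
      `<desc>${escapeXml(o.desc)}</desc>` +
      `<type>${escapeXml(o.type)}</type></obj>`
    );
    return `<marker><name>${escapeXml(loc.name)}</name>${objects.join("")}</marker>`;
  });
  return `<markers>${markers.join("")}</markers>`;
}

console.log(buildMarkersXml([
  { name: "centre, Eindhoven", objects: [{ id: "1234", desc: "Markt & kerk", type: "Foto" }] },
]));
```

Escaping matters here because object descriptions are free text and may contain characters such as & or <.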

4.3.4 Result Clustering

If every object gets its own bar in the time line interface, it becomes hard to differentiate between periods, as can be seen in figure 4.4. To make this kind of interface more readable, CHI Explorer gives users the ability to cluster objects with the same period in one bar on the time line. This results in a time line as shown in figure 4.5.

Figure 4.5: Time line with clustering

This clustering process is achieved by sorting all items in a tree hierarchy. CHI Explorer uses the Tree implementation of R. Heyes7. In this hierarchy, we have different variables on different levels of the tree. These levels are described below from the highest level to the leaf nodes.

7http://www.phpguru.org/static/tree.html

• period: At this level the period is defined. Here, a period is formatted as yyyy[-mm[-dd]]*yyyy[-mm[-dd]], where the first date is the start of the period and the second date the end of the period.

• location: OPTIONAL; If a user wants to cluster the objects on their location tags, the next level contains the object’s location tag. If the parent of a location is a city, this city’s location tag will be added first. Then the object’s location tag will be added. If an object does not contain a location tag, the tag unknown will be added in its place.

• tags: OPTIONAL; If the user wants to cluster the objects on their content tags, the next levels contain these tags in alphabetical order.

• identifier: Finally, the leaf nodes of the tree consist of the object’s identifier.

This tree is then used to generate the data file, as described in section 4.3.2. Event objects are created according to the top nodes in this tree. The lower nodes are used to group objects with the same tags together in the description of that event, as can be seen in figure 4.5. CHI Explorer can cluster on other variables by adding them to the tree in order of importance, and adding an implementation to convert these new branches to the data file.
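The level order of this tree can be sketched in JavaScript as follows. The sketch uses nested Map objects instead of the Tree class of R. Heyes that CHI Explorer uses, and the function names and the object fields start, end, city, location, tags and id are assumptions for illustration. The options locations and terms mirror the request flags of the same name.

```javascript
// Returns the path of tree levels for one object, from root to leaf.
function clusterPath(obj, options) {
  const path = [`${obj.start}*${obj.end}`];   // period level
  if (options.locations) {
    if (obj.city) {
      path.push(obj.city);                    // parent city first
    }
    path.push(obj.location || "unknown");     // then the location tag
  }
  if (options.terms) {
    path.push(...[...obj.tags].sort());       // content tags, sorted
  }
  path.push(obj.id);                          // leaf: object identifier
  return path;
}

// Sketch only: inserts every object into a tree of nested Maps.
function buildClusterTree(objects, options) {
  const root = new Map();
  for (const obj of objects) {
    let node = root;
    for (const key of clusterPath(obj, options)) {
      if (!node.has(key)) {
        node.set(key, new Map());
      }
      node = node.get(key);
    }
  }
  return root;
}

const tree = buildClusterTree(
  [
    { id: "1", start: "1945", end: "1945", city: "Eindhoven", location: "centre", tags: ["kerk", "brug"] },
    { id: "2", start: "1945", end: "1945", tags: ["brug"] },
  ],
  { locations: true, terms: true }
);
console.log([...tree.keys()]);
```

Both example objects end up under the same top node, so they would be drawn as a single bar on the time line.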

4.3.5 VideoLan

CHI Explorer uses the VideoLan Manager software8 to host streaming video. Videos can be added to this system via a configuration file or using a telnet connection to the server. A video stream server can be set up using the command line as shown in appendix A.3. This will also load the configuration file named RHCe.conf, which is shown in the section above this appendix. We will discuss one item of this configuration file here.

new 61281 vod enabled

setup 61281 input "F:\Movies\Stadsjournaal_Eindhoven\Stadsjournaal_Eindhoven_1957.mp4"

These lines will create a video stream identified by 61281, which is the video identification string in the RHCe collection. The term vod ensures this stream will use the Video on Demand technology, as explained in section 2.3.2. The second line sets up a local video file to be streamed when this item is requested. This request can be made using the following URL.

rtsp://localhost:5554/61281

Here, localhost is the address of the video stream server, and 5554 is the port number as assigned in the command line. 61281 is the item name of the video to be streamed. Within CHI Explorer, this should be the video identifier.

To display the videos embedded within a web page, we use the VideoLan Plugin. This way, we can control which video is streamed in code, as well as at what time the video should start. CHI Explorer uses two JavaScript functions to control this.

function doGo(targetURL, starttime);

This function is used to start a specific video stream. targetURL contains the URL in the format as shown above. starttime is the time at which the video should start, in seconds. If this value is 0, the video will start at the beginning.

function gotoTime(hours, mins, sec, msec, rel);

This function is used to jump to a specific point in time in a video that is already running. hours, mins, sec and msec are all integers, signifying hours, minutes, seconds and milliseconds, respectively. rel is a flag, designating the time to be relative to the current point in the video if set to true. If the time is relative, the time integers may be negative, to jump back in time in the video.
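Since fragment times are passed as milliseconds in the start variable (section 4.3.1), they have to be converted before either function can be called. The following JavaScript sketch shows one way to do this; the helper names streamUrl and splitMilliseconds are assumptions and are not part of the CHI Explorer source. The sketch assumes a non-negative offset.

```javascript
// Builds the stream address in the format shown above.
function streamUrl(host, port, videoId) {
  return `rtsp://${host}:${port}/${videoId}`;
}

// Splits a non-negative millisecond offset into the four time
// arguments that gotoTime expects.
function splitMilliseconds(totalMs) {
  const totalSec = Math.floor(totalMs / 1000);
  return {
    hours: Math.floor(totalSec / 3600),
    mins: Math.floor((totalSec % 3600) / 60),
    sec: totalSec % 60,
    msec: totalMs % 1000,
  };
}

const startMs = 3723500; // example fragment start
const t = splitMilliseconds(startMs);
console.log(streamUrl("localhost", 5554, "61281"), t);

// With the two functions of this section available on the page,
// the calls would look like this:
//   doGo(streamUrl("localhost", 5554, "61281"), Math.floor(startMs / 1000));
//   gotoTime(t.hours, t.mins, t.sec, t.msec, false);
```

Note that doGo only accepts whole seconds, so the millisecond part of a fragment start is lost unless gotoTime is used afterwards.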

8http://www.videolan.org/


4.3.6 Query Encapsulation

To make data gathering simpler in the CHI Explorer code, we have encapsulated most of the queries in PHP functions. These functions are put into PHP classes to distinguish between Sesame and MySQL queries, increasing code readability. In the following section, we will discuss the result of these functions.

Sesame queries

Sesame queries are gathered in the serql queries object in the functions.php file.

function objectData($id="",$search="", $dates=Array());

This function returns a search query. id may contain a specific object identifier, making this function generate information for one object only. If this value is given, the other variables are discarded. search may contain the user’s search query. dates may contain a period definition, using the following indexes.

• sj: starting year

• sm: starting month

• sd: starting day

• ej: end year

• em: end month

• ed: end day

This function will return a two-dimensional array, where each row is an object that matches the given criteria. Each row has the following indexes.

• id: object identification

• beschrijving: description of the object, if available

• soort: type of the object, e.g. Foto, Video, Bookmark

• locatie: location relevant to the object, if any

• plaats: city relevant to location, if known

• lat, long: latitude and longitude coordinates of location, if known

• sj, sm, sd, ej, em, ed, duration: period of object, if any, using the definition as used in the function AddPeriod in section 4.2.1.

function objectType($id);

This function returns a string defining the type of a certain object, such as Foto or Video. id is a string containing the object’s identifier.

function objectTags($id);

This function retrieves all semantic tags that are used with a certain object, excluding periods and locations. id contains the object identifier.

This function returns a two-dimensional array, where each row signifies a tag used in this object. Each row has the following indexes.

• waarde: tag label

• concept: tag name in Sesame repository

• approve: array of user names who have approved this tag. If no users approved this tag, this array will be empty.

• reject: array of user names who have disapproved this tag. If no users disapproved this tag, this array will be empty.

function tagName($value, $type="tha:Trefwoord");

This function returns all concepts in the Sesame repository which contain the same substring. value contains the substring to search for. type may contain the concept type to search for. By default this function searches for semantic tags.

This function returns a two-dimensional array, where each row signifies a tag that was found. Each row has the following indexes.

• tag: tag name in Sesame repository

• waarde: tag label

• search tag: substring used to find these tags.

This function is used to communicate with the matchingcomponent software as described in section 4.3.8.

function tagRelations($tag, $filter=array());

This function returns all relations of a certain tag to the other tags in the CHI Explorer system. tag is the tag name in the Sesame repository. filter is an array of strings, containing all relation types that need to be returned. This can be any combination of smallereTerm, bredereTerm, wasVroeger, omVat, gerelateerdAan and zieOok. More information about these relations can be found in the definition of the tagFilter variable in section 4.3.1.
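The correspondence between the letters of the tagFilter variable and these relation names can be sketched in JavaScript as follows. Whether CHI Explorer performs exactly this conversion before calling tagRelations is an assumption on our part; the names RELATION_BY_LETTER and relationsFromTagFilter are illustrative only.

```javascript
// Letter-to-relation mapping taken from the tagFilter definition.
const RELATION_BY_LETTER = {
  B: "bredereTerm",
  G: "gerelateerdAan",
  O: "omVat",
  S: "smallereTerm",
  W: "wasVroeger",
  Z: "zieOok",
};

// Sketch only: returns the relations that are switched on. An uppercase
// letter activates a relation; lowercase letters are skipped.
function relationsFromTagFilter(tagFilter) {
  const active = [];
  for (const letter of tagFilter) {
    if (Object.prototype.hasOwnProperty.call(RELATION_BY_LETTER, letter)) {
      active.push(RELATION_BY_LETTER[letter]);
    }
  }
  return active;
}

console.log(relationsFromTagFilter("BgOswZ"));
```

The resulting array has the same form as the filter argument described above.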

This function returns a two-dimensional array, where each row is a relation to another tag. Each row has the following indexes.

• dir: determines the direction of the relation between the specific tag and the found tag:

– obj: this row is the requested tag itself

– to: The found tag is the object of the relation.

– from: The found tag is the subject of the relation.

• rel: name of the relation in the Sesame repository.

• obj: name of the found tag in the Sesame repository.

• name: label of the found tag.

• type: type of the found tag in the Sesame repository.

function getBookmarks($id);

This function returns all known fragments of a certain object. id contains the object identifier. The function returns a two-dimensional array, where each row contains a fragment. Each row consists of the following indexes.

• id: this fragment’s identification.

• su, sm, ss: integers containing the starting hour, minute and second of the fragment, respectively.

• eu, em, es: integers containing the end hour, minute and second of the fragment, respectively.

function bookmarkTags($id);

This function returns all tags belonging to a specific video. This includes all tags that belong to any fragment of this video, if they exist. id contains the object identification of the video in question. The function returns a two-dimensional array, where each row is a separate tag. The array contains the following indexes.

• soort: This variable contains the type of object the tag belongs to. In CHI Explorer, this is either Video or Bookmark.

• trefwoord: The concept name of the tag found in the Sesame repository.

• waarde: The label of the tag found.

• starth, startm, starts: integer values containing the starting hour, minute and second of the fragment, respectively, if the tag belonged to a fragment.

• stoph, stopm, stops: integer values containing the end hour, minute and second of the fragment, respectively, if the tag belonged to a fragment.


function bookmarkInfo($id);

This function returns information about a specific fragment. id is a string containing the identification of the fragment. This function returns a two-dimensional array, containing one row. This row has the following indexes.

• videoID: contains the identifier of the video the fragment belongs to.

• su, sm, ss: integer values containing the starting hour, minute and second of the fragment, respectively.

• eu, em, es: integer values containing the end hour, minute and second of the fragment, respectively.

function getClosePeriods($dates);

This function returns all Intervals known in the Sesame repository that fall within a certain period. dates is an array containing the period in question, formatted in the same way as the dates variable in the objectData function. The function returns a two-dimensional array, where each row contains a period. Each row contains the following indexes.

• su, sm, ss: integer values containing the starting hour, minute and second of the period, respectively.

• eu, em, es: integer values containing the end hour, minute and second of the period, respectively.

• duration: duration value of the Interval in question, if no end date is given. As shown in section 4.2.3, this DurationDescription is either OneYear, OneMonth or OneDay.

function filterType($items, $type);

This function returns a list of concepts of a certain type from the Sesame repository. items is an array of strings, containing a list of labels the searched objects should conform to. type is a string containing the concept type the searched objects should be a part of. The function returns a two-dimensional array, where each row is a specific concept. Each row contains the following indexes.

• tag: the concept name in the Sesame repository.

• waarde: the label of the concept.

function getPeriods($id);


This function returns all periods and the ratings of a specific object. id is a string containing the identifier of the object in question. The function returns a two-dimensional array, where each row is a period, either approved by the RHCe, or proposed by users. Each row contains the following indexes.

• concept: the concept name of the period in question in the Sesame repository.

• soort: the type of the object that is being searched.

• sj, sm, sd: integer value containing the starting year, month and day of the period, respectively.

• ej, em, ed: integer value containing the end year, month and day of the period, respectively.

• duration: duration value of the Interval in question, if no end date is given. As shown in section 4.2.3, this DurationDescription is either OneYear, OneMonth or OneDay.

• approve: array containing all user names who have approved this period. If no users approved this period, this array is empty.

• reject: array containing all user names who have disapproved this period. If no users disapproved this period, this array is empty.
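As a minimal sketch of how the approve and reject arrays might be used, the following PHP fragment orders period rows by net approval. The helper name netApproval and the sample rows are hypothetical; only the keys concept, approve and reject come from the description above, and the remaining keys are left out for brevity. The score used here is a plain difference and is not the weight calculation used elsewhere in CHI Explorer.

```php
<?php
// Hypothetical rows in the shape returned by getPeriods; all values are invented
// and only the keys needed for this illustration are included.
$periods = [
    ['concept' => 'period_a', 'approve' => ['anna'],         'reject' => ['bert', 'carl']],
    ['concept' => 'period_b', 'approve' => ['anna', 'bert'], 'reject' => []],
    ['concept' => 'period_c', 'approve' => [],               'reject' => []],
];

// Net approval: number of approving users minus number of rejecting users.
function netApproval(array $row): int
{
    return count($row['approve']) - count($row['reject']);
}

// Order the rows so that the best-supported period comes first.
usort($periods, function (array $a, array $b): int {
    return netApproval($b) <=> netApproval($a);
});

foreach ($periods as $row) {
    echo $row['concept'], ': ', netApproval($row), PHP_EOL;
}
```

A period without any rating ends up between the approved and the rejected periods, which may or may not be the desired presentation.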

function getCloseLocations($location);

This function returns a tree hierarchy of a specific location as described in the CHI data structure[3, p.36]. location contains a comma-separated list of location labels, which are interpreted as root nodes in the resulting tree. The function returns an array representing the tree, as built by the BuildTree function.

function BuildTree($term, &$children);

This recursive function generates an array representing a tree structure. term contains the identifier of the location that is currently worked on. children contains an array, where each index is the concept name of a location known in the Sesame repository. This index points to an array, containing the following indexes.

• beschrijving: a longer name of the location, if any.

• location: the label of the location.

• children: an array, containing all locations with a smallereTerm relation to the location in the index.


The function generates an array with the following indexes.

• caption: the label of the current location.

• concept: the concept name of the current location.

• children: an array, where each item is generated by applying this procedure to one of the smallereTerm locations of the current location.
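The recursion described above can be sketched as follows. This is an illustration under stated assumptions, not the CHI Explorer source: the function name buildTreeSketch and the sample concept names are hypothetical, the children entry of each input row is assumed to hold concept names, and the location hierarchy is assumed to contain no cycles.

```php
<?php
// Build the output node for $term and recurse into its narrower locations.
// $children maps concept names to rows with the indexes beschrijving,
// location and children (assumed here to be a list of concept names).
function buildTreeSketch(string $term, array &$children): array
{
    $node = [
        'caption'  => $children[$term]['location'],
        'concept'  => $term,
        'children' => [],
    ];
    foreach ($children[$term]['children'] as $narrower) {
        $node['children'][] = buildTreeSketch($narrower, $children);
    }
    return $node;
}

// Hypothetical input; the concept names are invented for this example.
$locations = [
    'loc_nb'    => ['beschrijving' => 'Provincie Noord-Brabant', 'location' => 'Noord-Brabant',
                    'children' => ['loc_ehv', 'loc_breda']],
    'loc_ehv'   => ['beschrijving' => '', 'location' => 'Eindhoven', 'children' => []],
    'loc_breda' => ['beschrijving' => '', 'location' => 'Breda', 'children' => []],
];

$tree = buildTreeSketch('loc_nb', $locations);
print_r($tree);
```

Passing the lookup array by reference mirrors the signature shown above and avoids copying the array at every level of the recursion.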

function getLocations($id);

This function returns all locations and their ratings for a specific object. id contains the searched object identifier. The function returns a two-dimensional array, where each row represents a location related to the searched object. Each row contains the following indexes.

• concept: concept name of the location.

• soort: type of the searched object.

• locatie: label of the location.

• plaats: city the location belongs to, if any is known.

• approve: array containing all user names who have approved this location. If no users approved this location, this array is empty.

• reject: array containing all user names who have disapproved this location. If no users disapproved this location, this array is empty.

function getUnknownLocations($type);

This function returns all locations of a specific type. type is an array of all types that are requested. If this variable is empty, it will return all locations that have not been given a specific type. In CHI Explorer, these will be the values that have been proposed by the users. The function returns a two-dimensional array, where each row represents a location. Each row has the following indexes.

• key: the index of the current item in the array.

• LocStr: the label of the location.

• waarde: the longer name of the location, if any.

• parent: the label of this location’s bredereTerm.

This function is used during template generation.


MySQL Queries

MySQL queries are used to extract and update data in the Joomla! and SMF database. They are mostly used to generate the forum links, as described in section 4.3.9. These functions are bundled in the sql queries PHP class, which contains the following functions.

function getObjectCategory();

This function returns the index of the forum board containing the Object discussion threads. For more information about the forum structure, see section 4.1.3.

function getObjectBoard($cat, $name);

This function returns the index of the forum board that contains all discussion threads about a certain object. cat is the integer index of the board containing the searched board. name is the name of the object board that is being searched. If the board does not exist, it will be created. The function returns the integer index of the requested board. For more information about the forum structure, see section 4.1.3.

function getTypeBoard($obj, $name);

This function returns the index of the forum board that contains all discussion threads of a certain type about a certain object. obj is the integer index of the board about the object. name is the name of the searched board. The function returns the integer index of the searched board. If the board does not exist, it will be created. For more information about the forum structure, see section 4.1.3.

function getValueTopic($board, $name, $objectID);

This function returns the index of the forum thread of a certain discussion. board is the integer index of the board containing the discussions of a certain type about a certain object. name is the concept name of the tag that is being discussed. objectID is the object identifier. The function returns the integer index of the thread in question. If the thread does not exist, it will be created. For more information about the forum structure, see section 4.1.3.

4.3.7 Tag Representation

Tags in images can be shown in a static list. Such an interface is not as useful in videos, especially when we consider fragments. Fragments allow videos to have different tags at different points in time. Our interface needs to reflect this. CHI Explorer does this by updating this list of tags on the fly, as described in section 3.3.3.

To do this, CHI Explorer uses AJAX to ask for a new list of tags each second while a video is playing. This allows the system to find the correct tags at the moment of play, as well as collect new tags that have been added by other users in the meantime. The function that does this is implemented in the tags.php file and is described below.

function tagTable($id, $time);

This function generates the tag list as described in section 3.3.3. id contains the object identifier of the video in question. time is the current time of the video, formatted as [uu:]mm:ss. The function formats the list, so that currently relevant tags are located at the top of the list. Clicking a tag label will redirect the user to the tag cloud of that tag. Clicking a magnifying glass icon will move the video to the point in time where this tag is relevant.
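The two steps mentioned above, reading a [uu:]mm:ss time and moving the currently relevant tags to the top, could look roughly like the sketch below. The function names timeToSeconds and orderTags, the start and end indexes (in seconds) and the sample tags are assumptions made for this illustration; the actual tagTable function also produces the HTML of the list, which is left out here.

```php
<?php
// Convert a time formatted as [uu:]mm:ss into a number of seconds.
function timeToSeconds(string $time): int
{
    $parts = array_map('intval', explode(':', $time));
    if (count($parts) === 2) {
        array_unshift($parts, 0); // the hour part is optional
    }
    if (count($parts) !== 3) {
        throw new InvalidArgumentException("Unexpected time format: $time");
    }
    [$hours, $minutes, $seconds] = $parts;
    return $hours * 3600 + $minutes * 60 + $seconds;
}

// Put tags whose fragment contains $now first; keep the original order otherwise.
function orderTags(array $tags, int $now): array
{
    $relevant = [];
    $others = [];
    foreach ($tags as $tag) {
        if ($tag['start'] <= $now && $now <= $tag['end']) {
            $relevant[] = $tag;
        } else {
            $others[] = $tag;
        }
    }
    return array_merge($relevant, $others);
}

// Hypothetical tags of one video, with fragment boundaries in seconds.
$tags = [
    ['label' => 'market',         'start' => 0,   'end' => 120],
    ['label' => 'vervoersmiddel', 'start' => 670, 'end' => 700],
    ['label' => 'Eindhoven',      'start' => 0,   'end' => 900],
];

$ordered = orderTags($tags, timeToSeconds('11:10'));
echo implode(', ', array_column($ordered, 'label')), PHP_EOL;
```

Splitting the list into two groups, rather than sorting it, keeps the relative order of the tags within each group unchanged between two consecutive refreshes.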

4.3.8 TagSuggestion component

CHI Explorer relies on a separate piece of software to find loose relations between tags when proposing tags to an object. This system is called TagSuggestion and is created by G. Ketelaars[5]. It works by finding tags in the CHI Explorer system, ignoring any spelling errors the user might have made, and checking which other tags are frequently used in conjunction with them on other objects.

For example, when a user wants to add the tag World War II, it will search the system for objects that carry this tag, and check what other tags are used on those objects. It might return tags like soldier, bombing and liberation. There are no direct links in the CHI Explorer repository between these tags, but several objects are tagged with a combination of these tags.
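The co-occurrence idea in this example can be illustrated with a few lines of PHP. This is a simplified sketch and not the TagSuggestion algorithm itself, which also tolerates spelling errors and takes tag weights into account; the function name coOccurringTags and the sample objects are hypothetical.

```php
<?php
// Count how often other tags appear on objects that carry $tag.
// $objects maps an object identifier to the list of tag labels on that object.
function coOccurringTags(array $objects, string $tag): array
{
    $counts = [];
    foreach ($objects as $tags) {
        $tags = array_unique($tags);
        if (!in_array($tag, $tags, true)) {
            continue; // this object does not carry the tag we start from
        }
        foreach ($tags as $other) {
            if ($other !== $tag) {
                $counts[$other] = ($counts[$other] ?? 0) + 1;
            }
        }
    }
    arsort($counts); // most frequent companion tags first
    return $counts;
}

// Hypothetical tagged objects.
$objects = [
    'obj1' => ['World War II', 'soldier', 'liberation'],
    'obj2' => ['World War II', 'soldier', 'bombing'],
    'obj3' => ['market', 'soldier'],
];

$suggestions = coOccurringTags($objects, 'World War II');
print_r($suggestions);
```

The tag market is never suggested here, because it does not share an object with World War II, even though it shares one with soldier.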

CHI Explorer uses several PHP functions to communicate with this Java system, which are described below.

function OpenMatch();

This function creates and initializes the PHP interface to the Java system. It needs to be called before any communication can be made between CHI Explorer and TagSuggestion.

function AskMatch($array);

This function makes a search request to TagSuggestion. array is a two-dimensional array, where each row is a suggested tag. Each row has the following indexes.

0. This is the label of the tag. TagSuggestion will first find tags that have (almost) the same labels as a starting point to the search.

1. This is a real value containing the weight that is given to this tag. The weight of a tag is related to the number of users that have approved or disapproved of it. Calculation of this weight value is shown in the weight variable in section 4.3.

The function returns a two-dimensional array, where each row is a tag proposed by TagSuggestion. Each row has the following indexes.

0. This is a string containing the concept name of the found tag.

1. This real value denotes the reliability of this tag to the original tag as estimated by TagSuggestion. This value lies between 0 and 1.

2. This is a string containing the tag used by TagSuggestion to find this tag.

function FeedbackMatch($array);

This function is used to send feedback back from the user to TagSuggestion, in order to improve the search algorithm. Feedback will influence later reliability values. array is a two-dimensional array, where each row is a proposed tag. Each row has the following indexes.

0. This is a string containing the tag used by TagSuggestion to originally find this tag.

1. This is a string containing the concept name of the found tag.

2. This value contains a character depicting a user’s opinion of this tag in combination with his original tag. It can be one of three values.

• p: The user agrees to the link between this tag and his original tag.

• n: The user does not agree to the link between this tag and his original tag.

• i: The user did not comment on the link between both tags.
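To show how the row layouts of AskMatch and FeedbackMatch relate to each other, the sketch below turns result rows in the AskMatch format into feedback rows in the FeedbackMatch format. The helper buildFeedbackRows, the opinions array and the sample concept names are hypothetical; the sketch only rearranges arrays and does not call TagSuggestion.

```php
<?php
// Turn AskMatch result rows [concept name, reliability, source tag] into
// FeedbackMatch rows [source tag, concept name, opinion]. $opinions maps a
// concept name to 'p' or 'n'; tags the user did not comment on get 'i'.
function buildFeedbackRows(array $suggestions, array $opinions): array
{
    $rows = [];
    foreach ($suggestions as [$concept, $reliability, $sourceTag]) {
        $opinion = $opinions[$concept] ?? 'i';
        if (!in_array($opinion, ['p', 'n', 'i'], true)) {
            throw new InvalidArgumentException("Unknown opinion: $opinion");
        }
        $rows[] = [$sourceTag, $concept, $opinion];
    }
    return $rows;
}

// Hypothetical AskMatch result for the tag World War II.
$suggestions = [
    ['tag_soldier',    0.8, 'World War II'],
    ['tag_bombing',    0.6, 'World War II'],
    ['tag_liberation', 0.5, 'World War II'],
];

// The user accepted one suggestion, rejected one and ignored the third.
$opinions = ['tag_soldier' => 'p', 'tag_bombing' => 'n'];

$feedback = buildFeedbackRows($suggestions, $opinions);
print_r($feedback);
```

The reliability value is not part of the feedback row; per the description above, the feedback is what influences later reliability values.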

4.3.9 Forum Links

Forum links are generated using the functions available in the SMF system. Two interface functions have been written in CHI Explorer, as described below.

function gotoForumDescription($id);

The user is allowed to comment on the descriptive text of an object. This function redirects a user to the board where he is allowed to start and respond to threads with these subjects. id contains the identifier of the object to which the description belongs. If this board does not yet exist, it will be created.

function gotoForum($id, $type, $value);

This function redirects a user to the forum thread where the connection between a tag and an object is discussed. id contains the object identifier. type contains the type of tag that is being discussed. In CHI Explorer, this will be one of trefwoorden, datering or locatie. value contains the tag in question. If a discussion thread does not yet exist, it will be created by the system.


Chapter 5

Use Case

This chapter shows the actions a user can take using CHI Explorer. It will focus on the steps a user takes when searching for a specific object. Next, it will show how this object can be tagged. Finally, a proof of concept shows how a moderator interface could be used; section 6.3.1 discusses this interface among the proposed extensions to CHI Explorer.

The actions described here are the standard operations all users can use in CHI Explorer. The system has been optimized to handle these operations efficiently. The third action, the moderator interface, contains a small proof of concept of the kind of actions a moderator could be allowed to perform in future extensions of CHI Explorer.


Figure 5.1: RHCe home page in Joomla!

5.1 Search request

The first thing a user sees when he logs onto the RHCe page is the Joomla! home page, displayed in figure 5.1. Here RHCe professionals can post information about objects that have been newly added to the collection, or other newsworthy events which could interest the users. In the future, Joomla! makes it possible to allow certain users to post their own research topics, which could generate more interest in the objects in question.


Figure 5.2: CHI Explorer main interface

When the user navigates to CHI Explorer by clicking the link at the top of the page, he is presented with the search interface, including a list of the first 25 items in the repository, as shown in figure 5.2. Here he can type a search query into the input box. Using the radio buttons below the input box the user can choose what type of view the search results should be organized in.


Figure 5.3: result list for the query vehicle

In this example, the user searches for all objects concerning the tag vervoersmiddel (vehicle). A list of all objects concerning this tag is returned, as shown in figure 5.3. The first item in this list is a video fragment, while the second one is an image. Other views of these search results are available, such as the map view (figure 5.4) or the timeline view (figure 5.5).


Figure 5.4: result map for the query vehicle


Figure 5.5: result timeline for the query vehicle


Figure 5.6: Detail view of a video fragment

When the user opens the found video fragment, CHI Explorer opens a detail view, as shown in figure 5.6. This video will begin at the start of the fragment in question, which is 11:10 in this example. The user has full control over the video, so he is able to check out other parts of this video, if he so desires.


Figure 5.7: The user proposes the Eindhoven tag (left); the Eindhoven tag after it has been added (right)

5.2 Tagging

The user can add location tags by clicking the locatie link just below the control buttons in the detail view as shown in figure 5.6. This will open the location tag form for this specific fragment. Here the user can add the tag Eindhoven to this item, as shown in figure 5.7 on the left side. When the user lets CHI Explorer process his proposition, the user sees the updated tag form in figure 5.7 on the right side.


Figure 5.8: Forum discussion thread

If the user is logged in and clicks on the tag label in this form, he will be directed to the forum thread that discusses the relation of this tag with this video fragment, as shown in figure 5.8. If no such thread exists, a new one will be generated. This thread contains a link to the object in question.


Figure 5.9: New locations in moderator interface

5.3 Moderating

In this use case we will discuss a proof of concept of how a moderator interface could be implemented. In this example, the location tag Breda, a city in the province of North Brabant in the Netherlands, has been added to CHI Explorer. When a moderator logs in, he can see this new tag as shown in figure 5.9. The combobox located next to the location name can be used to select the tag type, in this case plaats (city).


Figure 5.10: Placement choice interface of a city within the location hierarchy

Figure 5.11: Cities in moderator interface

When this field is set, the link located after this combobox can be used to insert the current tag into the location hierarchy. A form will pop up, as shown in figure 5.10, allowing the moderator to select one of the overarching concepts in the location hierarchy. In this case, the moderator selects Noord-Brabant, since Breda is a city in the province of North Brabant. The resulting list of known cities can be seen in figure 5.11. Existing location types and hierarchy links of known locations can be modified in the same way.


Chapter 6

Conclusion

In this chapter we will discuss the execution of the CHI Explorer project. Section 6.1 will check if CHI Explorer has achieved the objectives set in the beginning of this project. Section 6.2 describes some issues that came up during the implementation of this project. Finally, section 6.3 will propose several possible expansions on CHI Explorer as it is now.

6.1 Conclusion

As mentioned in section 1.3, the CHI Explorer project consists of three parts:

• Disclose RHCe videos to the users, allowing them to search through the dataset.

• Integrate other data formats; allow users to find images and videos using the same search query and allow for the inclusion of other data formats at a later date.

• Implement a system to generate new metadata for existing objects.

In the next sections we will discuss how CHI Explorer solves these problems.

6.1.1 Disclose video

Users are now able to search the RHCe repository for videos and view them in CHI Explorer, including fragments of videos as explained in sections 2.3, 3.3.3 and 4.3.5. These sections describe the use of codecs, the video interface design and the integration of the video component in CHI Explorer. In CHI Explorer, users can search for tagged videos or video fragments and stream a found video. Users have complete control of a video and can pause, play, rewind or fast forward a video at will.

The introduction of fragments also allows users to find and tag specific scenes within videos. If a user selects a video fragment, the correct video will be opened and will start playing at the correct time interval of the specific fragment. This guides users directly to the parts of the video they are interested in.

6.1.2 Data integration and expansion possibilities

The data structure of CHI Explorer, as explained in section 3.1, uses extensions of the original CHI system to enable videos, video fragments and images to be treated similarly. This allows CHI Explorer to search these different data types in one go and aggregate the search results in the same list. This data structure can be extended with other data types in the same manner, based on the Archiefobject concept. Specific detail views will need to be implemented to display this new information correctly. Alternatively, new tag types can be added by extending the Term concept. New search result interfaces could be added based on these new tags as has been done in CHI Explorer in sections 3.3.2 and 3.3.2. This can be done easily due to the modular design of all functions within CHI Explorer.

6.1.3 Metadata generation

CHI Explorer implements a tagging system to enlist the users in adding metadata to the RHCe repository. This allows us to improve search results, by immediately using these tags in following search requests. Users are allowed to rate the tags added by other users, enabling RHCe professionals to extract tags that are approved of by many users and add them as RHCe approved tags. Users are also able to add video fragment definitions, granting more efficient search results. These fragments are returned in search requests, enabling users to jump to the specific point in a video where their search request is relevant.

Recording which users add and rate which tags allows RHCe professionals to single out contributing members and allow them more rights in the future. It also allows them to single out malicious members, ban them from the site and remove any metadata added by them. The systematic approach to the storage of metadata allows automated processes to extend this dataset in the future.

6.2 Issues

This section analyses the CHI Explorer system and its subsystems. Section 6.2.1 details the issues that could arise when it is used in a real multi-user system. Section 6.2.2 explains a minor deviation in the tag weights in the tagging interface.


6.2.1 Multi-user test

Due to time limitations, a multi-user evaluation test could not be performed. This test can be set up by deploying five computers on a small LAN, connected to a dedicated server running CHI Explorer. This test will have several goals:

1. Test the bandwidth usage of video streaming when multiple users use the system. This includes the streaming of multiple videos to multiple users, as well as the streaming of the same video to multiple users. The first option will test the general bandwidth usage of VideoLAN, while the second test checks the relation between the file system of the server and the streaming capabilities of VideoLAN.

2. Test the stability of the tagging system when multiple users tag the same object or rate the same tag on the same object. This test checks the Java-Sesame interface when used by multiple users at once. This interface should handle all requests as atomic actions to ensure the correctness of the information contained in the Sesame repository. The connection between PHP and Java is implemented at the moment using an HTTP connection.

If the system slows down during this test, this can be solved by implementing a local connection to a static Java component on the server. This would require a rewrite of this part of the Java interface as described in section 4.2.1. No changes are needed in the interface in section 4.2.1 or any code in other parts of CHI Explorer.

6.2.2 Weight differences in tag proposals

As explained on page 53, the weights of the tags belonging to an object are calculated when the tagging form is displayed to the user. Weights of tags proposed by users are calculated according to the number of users that have approved this tag in relation to the number of users that disapproved this tag. It is possible for this list of weights to become out of date when a user has this form open for a long time, while other users rate the tags relative to this object. This could lead to small changes in the list of tags proposed by CHI Explorer to the user, when he eventually chooses to add a tag.

To mitigate this problem, the tag addition forms are linked from the object detail page and therefore only opened when a user really wants to add or rate information. All tags are shown on the detail page, but changes by a user require him or her to open the tag form. This limits the amount of time a user will keep the tag form open.

The effect of the weight is also not extensive. These values are used in the TagSuggestion component to order the found tags. This means that even in the case of the highest difference possible between weights, from 0.75 to 0.85, the only difference in the feedback given to the user would be a tag ordered as number 12 when it should have been number 5. In both cases the user will be shown this tag and in both cases the user will be able to add this tag to the object in question.

6.3 Future Work

CHI Explorer as it stands now is a system that lets users search the RHCe repository of images and videos and tag these items in the categories of period, location and contents. In this section we will discuss several additions to this system.

6.3.1 Moderator actions

Users can input many tags into CHI Explorer, which need to be maintained. Some actions are already included in the system, for example, the originating user of all tags and fragments is being recorded, and tags have user ratings, which define the consensus of the community about a certain tag.

Other actions need to be implemented to increase the precision of search requests. Though tags that have been added by users are already used in subsequent search requests, these tags have no relations to the other tags in the repository. For example, if a user adds the tag Mercedes into the system, it will not have a relational link to the tag car. So, a user searching for the tag car will not find any objects that have been tagged with the tag Mercedes.

An extension to CHI Explorer would be a moderator interface to make these changes to the new tags that are added to the system. This would include semantic relations such as in the car example above, but also hierarchical relations between locations. Another manipulation of the tag structure would be the ability to merge different tags with a similar meaning. In this case, tags such as car and automobile can be merged into one concept. CHI Explorer needs to put these tags under the same concept, while keeping all links to objects intact. It will also need to be able to assign one of these tags as the preferred label of this concept, while keeping the other label so that the concept can still be found in search requests. This could be solved by using the SKOS system, using prefLabel to define the main tag and altLabel to record any other tags. SKOS also has the ability to record frequent misspellings of these tags in hiddenLabel.
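A merge of this kind might be sketched as follows, using plain PHP arrays in place of the Sesame repository. The functions mergeConcepts and matchesConcept, the array layout and the sample data are hypothetical; only the roles of prefLabel, altLabel and hiddenLabel follow SKOS.

```php
<?php
// Merge concept $b into concept $a: $a keeps its preferred label, every label
// of $b stays searchable, and the object links of both concepts are combined.
function mergeConcepts(array $a, array $b): array
{
    $alt = array_merge($a['altLabel'], [$b['prefLabel']], $b['altLabel']);
    $hidden = array_merge($a['hiddenLabel'], $b['hiddenLabel']);
    $objects = array_merge($a['objects'], $b['objects']);
    return [
        'prefLabel'   => $a['prefLabel'],
        'altLabel'    => array_values(array_unique($alt)),
        'hiddenLabel' => array_values(array_unique($hidden)),
        'objects'     => array_values(array_unique($objects)),
    ];
}

// True when $label matches any label of the concept, ignoring case.
function matchesConcept(array $concept, string $label): bool
{
    $labels = array_merge([$concept['prefLabel']], $concept['altLabel'], $concept['hiddenLabel']);
    return in_array(strtolower($label), array_map('strtolower', $labels), true);
}

// Hypothetical concepts; 'carr' stands for a frequent misspelling.
$car = ['prefLabel' => 'car', 'altLabel' => [], 'hiddenLabel' => ['carr'],
        'objects' => ['obj1', 'obj2']];
$automobile = ['prefLabel' => 'automobile', 'altLabel' => [], 'hiddenLabel' => [],
               'objects' => ['obj2', 'obj3']];

$merged = mergeConcepts($car, $automobile);
print_r($merged);
```

After the merge, a search for automobile or for the misspelling carr still reaches the concept, and the concept points to the objects of both original tags.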

Initially, RHCe professionals can make these changes to the tag repository, as long as the user base is still small. When the user base grows, some users will be found to have a large number of correct tag additions. These can then be promoted to a Moderator role, allowing them to make these changes. It is advisable that changes made by these Moderators do not become permanent immediately, but first need to be approved by an RHCe professional. When the user base grows even bigger, it is advisable to implement a system that automatically searches the system for tags that are not connected to other tags, to ensure no loose tags float around in the system. These types of tags would complicate the search system, since they do not take advantage of the semantic relations that should be made between the concepts.
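Such an automatic check could, in its simplest form, look like the sketch below. The function findLooseTags and the representation of relations as pairs of tag labels are assumptions made for this illustration; in CHI Explorer the tags and their relations would come from the Sesame repository.

```php
<?php
// Return every tag that does not take part in any relation.
// $relations is a list of [from, to] pairs of tag labels.
function findLooseTags(array $tags, array $relations): array
{
    $connected = [];
    foreach ($relations as [$from, $to]) {
        $connected[$from] = true;
        $connected[$to] = true;
    }
    return array_values(array_filter($tags, function (string $tag) use ($connected): bool {
        return !isset($connected[$tag]);
    }));
}

// Hypothetical repository content: only car and vehicle are related so far.
$tags = ['car', 'Mercedes', 'vehicle', 'Breda'];
$relations = [['car', 'vehicle']];

$loose = findLooseTags($tags, $relations);
echo implode(', ', $loose), PHP_EOL;
```

The resulting list could be presented to a Moderator or an RHCe professional, who then decides which relations to add.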

6.3.2 Adding tag dimensions and object types

CHI Explorer allows users to tag objects according to the subject of the presented scene, the location it represents and the period in which it is taken. These dimensions define the most common subjects a user might want to search for. Other dimensions might be added by expanding the tag data structure.

An example of this would be the addition of personal names. In the current CHI Explorer system, these names are used as subject tags, but these could be more specifically defined when other object types are added to the system. For example, the RHCe archives include the birth and death registers of several municipalities. When an object is tagged with a personal name in this system, it could be used to date an object, especially when an approximate age of the person in question can be given. This can be used in the creation of family trees, although it should be taken into account that even if two persons in the same time period have the same name, this does not necessarily mean they are the same person. This system can be used as a helpful search tool, but it cannot automatically generate the required family trees.

As noted above, other object types might be added to the system as well. These could be books, magazines, newspapers, slides or the aforementioned municipality archives. All of these objects can use the subject, location and period tagging interface. Some of them can even use an adjusted form of the fragment object. For example, a book fragment can be defined by a page and line number.

6.3.3 Search options

CHI Explorer allows users to search the RHCe repository for all objects, using all tags known. It might be useful to allow users to refine this search system. For example, some users might only want to search objects using tags that have been approved by RHCe, to increase the reliability of the search results. Other options could allow users to search for specific object types, for example, if a user is searching for images in particular.

Changes such as these can be added simply by adding the necessary interface controls to the page and changing the search queries that are used. Another way to use the users to generate metadata could be by keeping track of which tags are used in conjunction in search requests. If certain tags are used in conjunction repeatedly by multiple users, it can be assumed that these tags are related and this suggestion could be passed on to an RHCe professional.
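Keeping track of tags that are searched for together could be sketched as follows. The function suggestRelatedTags, the search log layout and the threshold are hypothetical, and the sketch assumes that tag labels do not contain the | character, which is used to build a key for each pair.

```php
<?php
// Count, for every pair of tags, how many distinct users searched for both
// tags in one request. Pairs reaching $minUsers are returned as suggestions.
function suggestRelatedTags(array $searchLog, int $minUsers): array
{
    $usersPerPair = [];
    foreach ($searchLog as $entry) {
        $tags = array_values(array_unique($entry['tags']));
        sort($tags); // canonical order, so (a, b) and (b, a) share one key
        for ($i = 0; $i < count($tags); $i++) {
            for ($j = $i + 1; $j < count($tags); $j++) {
                $key = $tags[$i] . '|' . $tags[$j];
                $usersPerPair[$key][$entry['user']] = true;
            }
        }
    }
    $suggestions = [];
    foreach ($usersPerPair as $key => $users) {
        if (count($users) >= $minUsers) {
            $suggestions[$key] = count($users);
        }
    }
    return $suggestions;
}

// Hypothetical search log: one entry per search request.
$searchLog = [
    ['user' => 'anna', 'tags' => ['car', 'Mercedes']],
    ['user' => 'bert', 'tags' => ['Mercedes', 'car']],
    ['user' => 'anna', 'tags' => ['car', 'Mercedes']],
    ['user' => 'carl', 'tags' => ['car', 'Breda']],
];

$suggestions = suggestRelatedTags($searchLog, 2);
print_r($suggestions);
```

Counting distinct users rather than requests prevents a single user who repeats the same search from producing a suggestion on his own.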


Bibliography

[1] M.Z. Visharam and D. Singer. MPEG-4 file formats white paper. Technical report, International Organization for Standardization, 2005. [cited at p. 21]

[2] Convergent Information Systems Division. Digital media file types: Survey of common formats. Technical report, National Institute of Standards and Technology, 2002. [cited at p. 20]

[3] G.H.J. Dorssers. Ontsluiting van cultuurhistorische informatie: de chi-navigator: datamodel en implementatie. Technical report, Technische Universiteit Eindhoven, Regionaal Historisch Centrum Eindhoven, 2006. [cited at p. 13, 14, 16, 24, 51, 62]

[4] F.T.M. Kamzol. Ontsluiting van cultuurhistorische informatie: de chi-navigator: datamodel en gebruikerstest. Technical report, Technische Universiteit Eindhoven, Regionaal Historisch Centrum Eindhoven, 2006. [cited at p. 13]

[5] G.J.A.M. Ketelaars. From tag to concept. Technical report, Technische Universiteit Eindhoven, 2008. [cited at p. 29, 65]


Appendices


Appendix A

Code Snippets

A.1 Google Maps Location Collection

<?php

require("config.php");

session_start();

require_once(’functions.php’);

$locArray = array();

$file = ’’;

function addCoords($address)

{

global $locArray;

global $file;

$xml = "http://maps.google.com/maps/geo?q=$address&output=xml&key=[GoogleMapKey]";

$xmlDoc = new DOMDocument();

$xmlDoc->load($xml);

if ($xmlDoc)

{

$response = $xmlDoc->getElementsByTagName(’Response’);

if ($response->length != 0 )

{

$placemark = $response->item(0)->getElementsByTagName(’Placemark’)->item(0);

if ($placemark)

{

$point = $placemark->getElementsByTagName(’Point’)->item(0);

$str_coord = $point->getElementsByTagname(’coordinates’)->item(0)->

childNodes->item(0)->nodeValue;

$values = explode(’,’, $str_coord);

91

Page 98: Technische Universiteit Eindhoven Technische Informatica

92 APPENDIX A. CODE SNIPPETS

                $locArray[$address]['lat'] = $values[1];
                $locArray[$address]['long'] = $values[0];
            }
        }
    }
}

$serql = 'select distinct locatieTm, locatie, plaats ';
$serql.= 'from ';
$serql.= '{obj} tha:bevatTerm {locatieTm}, ';
$serql.= '{locatieTm} rdf:type {tha:Locatie}; ';
$serql.= 'tha:termWaarde {locatie}, ';
$serql.= '[ {locatieTm} tha:bredereTerm {locatie_plaats} rdf:type {tha:Plaats}, ';
$serql.= '[ {locatie_plaats} tha:termWaarde {plaats} ] ';
$serql.= '] ';
$serql.= 'using namespace ';
$serql.= 'tha = <http://www.rhc-eindhoven.nl/tha#> ';

serql_query($serql);
while ( $result = serql_fetch_array() )
{
    $str = $result["locatie"];
    if ($result["plaats"] != '') { $str.= ', '.$result["plaats"]; }
    $locArray[$str]['locatie'] = $result["locatieTm"];
}

foreach ($locArray as $key=>$value)
{
    addCoords($key);
}

$doc = new DOMDocument('1.0');
$root = $doc->createElement('rdf:RDF');
$root = $doc->appendChild($root);
foreach ($locArray as $key=>$value)
{
    if ( array_key_exists('lat', $value) )
    {
        $node = $doc->createElement('Locatie');
        $node->setAttribute('rdf:ID',$value['locatie']);
        $root->appendChild($node);
        $coord = $doc->createElement('bevatLat');
        $node->appendChild($coord);
        $val = $doc->createTextNode($value['lat']);
        $coord->appendChild($val);
        $coord = $doc->createElement('bevatLong');


        $node->appendChild($coord);
        $val = $doc->createTextNode($value['long']);
        $coord->appendChild($val);
    }
}

$file = fopen('locaties.xml', 'w+');
fwrite($file, $doc->saveXML());
fclose($file);
?>
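
The coordinate extraction in addCoords can be tried without network access. The following is a minimal sketch, not the thesis code: the function name extractLatLong and the sample response are hypothetical, and the sample is reduced to the elements the script above reads (Response, Placemark, Point, coordinates). It assumes, as the script does, that the coordinate string lists longitude before latitude.

```php
<?php
// Hypothetical helper: pull latitude/longitude out of a geocoder XML response.
// Returns array('lat' => float, 'long' => float) or null when nothing usable is found.
function extractLatLong($xml)
{
    $doc = new DOMDocument();
    // loadXML() rejects empty input and returns false for malformed XML
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return null;
    }
    $nodes = $doc->getElementsByTagName('coordinates');
    if ($nodes->length === 0) {
        return null;
    }
    // The text is "longitude,latitude[,altitude]"
    $parts = explode(',', trim($nodes->item(0)->textContent));
    if (count($parts) < 2 || !is_numeric($parts[0]) || !is_numeric($parts[1])) {
        return null;
    }
    return array('lat' => (float) $parts[1], 'long' => (float) $parts[0]);
}

// Made-up sample response with illustrative coordinates
$sample = '<kml><Response><Placemark><Point>'
        . '<coordinates>5.478,51.441,0</coordinates>'
        . '</Point></Placemark></Response></kml>';

$coords = extractLatLong($sample);
if ($coords !== null) {
    echo 'lat='.$coords['lat'].' long='.$coords['long']."\n";
}
```

Returning null for a missing or malformed response lets the caller skip the location, which matches how the script above only writes locations that received a 'lat' entry.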

A.2 W3C Time Ontology Rewrite

<?php
require_once '../sesame-int.php';

class TimeConverter
{
    /** var containing all current periods */
    protected $PeriodData;

    /** shortcut to generated file */
    protected $file;

    protected $intervals;
    protected $instants;

    /**
     * Variable defining the type of interval:
     * 0: undefined
     * 1: standard year/month/day interval
     * 2: variable interval
     */
    protected $intervalType;
    protected $dateType;

    protected $obj;
    protected $bj;
    protected $bm;
    protected $bd;
    protected $ej;
    protected $em;
    protected $ed;

    function __construct()
    {
        $this->PeriodData = array();


        $this->intervals = array();
        $this->instants = array();
        $this->file = false;
    }

    protected function w($str)
    {
        fwrite($this->file, $str.' ');
    }

    protected function AddStandardNodes()
    {
        // DurationDescriptions of standard Intervals
        $this->w('<time:DurationDescription rdf:ID="OneYear">');
        $this->w('<time:years>1</time:years>');
        $this->w('</time:DurationDescription>');
        $this->w('<time:DurationDescription rdf:ID="OneMonth">');
        $this->w('<time:months>1</time:months>');
        $this->w('</time:DurationDescription>');
        $this->w('<time:DurationDescription rdf:ID="OneDay">');
        $this->w('<time:days>1</time:days>');
        $this->w('</time:DurationDescription>');

        // Definition of undefined intervals
        // (the OWL term is owl:Class; XML names are case-sensitive)
        $this->w('<owl:Class rdf:ID="UnknownInterval">');
        $this->w('<rdfs:subClassOf rdf:resource="http://www.w3.org/2006/time#Interval" />');
        $this->w('</owl:Class>');
        $this->w('<UnknownInterval rdf:ID="IntervalUndefined" />');
        $this->intervals[] = "IntervalUndefined";
    }

    protected function CacheValues($i)
    {
        $this->obj = $this->PeriodData["obj"][$i];
        $this->bj = $this->PeriodData["bjaar"][$i];
        $this->bm = $this->PeriodData["bmaand"][$i];
        $this->bd = $this->PeriodData["bdag"][$i];
        $this->ej = $this->PeriodData["ejaar"][$i];
        $this->em = $this->PeriodData["emaand"][$i];
        $this->ed = $this->PeriodData["edag"][$i];
    }

    protected function GenerateName()
    {
        if ($this->bj == '' && $this->bm == '' && $this->bd == '' &&
            $this->ej == '' && $this->em == '' && $this->ed == '')
        // undefined interval
        {
            $ret = "Undefined";


            $this->intervalType = 0;
        }
        elseif (($this->bj == $this->ej &&
                 $this->bm == $this->em &&
                 $this->bd == $this->ed) ||
                ($this->ej == '' &&
                 $this->em == '' &&
                 $this->ed == '') )
        // Year, month or day interval
        {
            $ret = $this->bj;
            if ($this->bm != '') { $ret.= '-'.$this->bm; }
            if ($this->bd != '') { $ret.= '-'.$this->bd; }
            $this->intervalType = 1;
        }
        else
        // variable length interval
        {
            $ret = $this->bj;
            if ($this->bm != '') { $ret.= '-'.$this->bm; }
            if ($this->bd != '') { $ret.= '-'.$this->bd; }
            $ret.= 'to'.$this->ej;
            if ($this->em != '') { $ret.= '-'.$this->em; }
            if ($this->ed != '') { $ret.= '-'.$this->ed; }
            $this->intervalType = 2;
        }
        return $ret;
    }

    protected function PutDuration()
    {
        if ($this->bd != '' )
        {
            $this->w('<time:hasDurationDescription rdf:resource="#OneDay" />');
        } elseif ($this->bm != '' )
        {
            $this->w('<time:hasDurationDescription rdf:resource="#OneMonth" />');
        } else
        { $this->w('<time:hasDurationDescription rdf:resource="#OneYear" />'); }
    }

    /**
     * possible values of $instantType:
     * 1: hasBeginning tag using bj, bm, bd
     * 2: hasEnd tag using ej, em, ed
     */
    protected function PutInstant($instantType)
    {
        if ($instantType == 1)
        {
            $prefix = 'time:hasBeginning';


            $j = $this->bj;
            $m = $this->bm;
            $d = $this->bd;
        }
        elseif ($instantType == 2)
        {
            $prefix = 'time:hasEnd';
            $j = $this->ej;
            $m = $this->em;
            $d = $this->ed;
        }
        else
        {
            // without a valid type there is no prefix or date to write
            echo('Unknown instantType!');
            return;
        }
        $name = $j;
        $type = 'unitYear';
        if ($m != '')
        {
            $name.= '-'.$m;
            $type = 'unitMonth';
        }
        if ($d != '')
        {
            $name.= '-'.$d;
            $type = 'unitDay';
        }
        if (in_array($name, $this->instants) )
        {
            // instant already written: refer to it by its ID
            $this->w('<'.$prefix.' rdf:resource="#Instant'.$name.'" />');
        }
        else
        {
            $this->instants[] = $name;
            $this->w('<'.$prefix.'>');
            $this->w('<time:Instant rdf:ID="Instant'.$name.'">');
            $this->w('<time:hasDateTimeDescription>');
            $this->w('<time:DateTimeDescription rdf:ID="Date'.$name.'">');
            $this->w('<time:unitType rdf:resource="http://www.w3.org/2006/time#'.
                $type.'" />');
            $this->w('<time:year rdf:datatype="http://www.w3.org/2001/XMLSchema#int">'.
                $j.'</time:year>');
            if ($m != '')
            { $this->w('<time:month rdf:datatype="http://www.w3.org/2001/XMLSchema#int">'.
                $m.'</time:month>'); }
            if ($d != '')
            { $this->w('<time:day rdf:datatype="http://www.w3.org/2001/XMLSchema#int">'.
                $d.'</time:day>'); }
            $this->w('<time:timeZone rdf:resource="http://www.w3.org/2006/timezone-world#ATZ" />');
            $this->w('</time:DateTimeDescription>');
            $this->w('</time:hasDateTimeDescription>');
            $this->w('</time:Instant>');


            $this->w('</'.$prefix.'>');
        }
    }

    protected function PutInterval($i)
    {
        $this->CacheValues($i);
        $name = 'Interval'.$this->GenerateName();
        $this->w('<Archiefobject rdf:ID="'.$this->obj.'">');
        if (!in_array($name, $this->intervals))
        {
            $this->w('<inPeriodeSTD>');
            $this->w('<time:Interval rdf:ID="'.$name.'">');
            $label = $this->bj.'-'.$this->bm.'-'.$this->bd.','.$this->ej.'-'.
                $this->em.'-'.$this->ed;
            $this->w('<rdfs:label>['.$label.']</rdfs:label>');
            switch ($this->intervalType)
            {
                case 0:
                    $date = $this->bj.'-'.$this->bm.'-'.$this->bd.','.
                        $this->ej.'-'.$this->em.'-'.$this->ed;
                    echo("Undefined type should not be instantiated! [".$date."]\n");
                    break;
                case 1:
                    $this->PutInstant(1);
                    $this->PutDuration();
                    break;
                case 2:
                    $this->PutInstant(1);
                    $this->PutInstant(2);
                    break;
                default:
                    echo("unknown interval type! <br>");
            }
            $this->w('</time:Interval>');
            $this->w('</inPeriodeSTD>');
            $this->intervals[] = $name;
        }
        else
        { $this->w('<inPeriodeSTD rdf:resource="#'.$name.'" />'); }
        $this->w('</Archiefobject>');
    }

    protected function PutPeriods()
    {
        $i = 0;
        $length = count($this->PeriodData['obj']);
        while ($i < $length )
        {
            $this->PutInterval($i);


            $i+= 1;
        }
    }

    protected function GeneratePeriods()
    {
        $serql =
            'select obj, bjaar, bmaand, bdag, ejaar, emaand, edag '.
            'from '.
            '{obj} tha:inPeriode {period}; '.
            'tha:archiefobjectID {name}, '.
            '{period} rdf:type {tha:Periode}; '.
            '[tha:periodeDatumVroegste {date1} tha:datumJaar {bjaar}; '.
            '[tha:datumMaand {bmaand}]; '.
            '[tha:datumDag {bdag}]]; '.
            '[tha:periodeDatumLaatste {date2} tha:datumJaar {ejaar}; '.
            '[tha:datumMaand {emaand}]; '.
            '[tha:datumDag {edag}]] '.
            'using namespace '.
            'tha = <http://www.rhc-eindhoven.nl/tha#>';
        DoQuery($serql);
        $results = QueryResults();
        foreach ($results as $result)
        {
            foreach ($result as $key => $value)
            {
                $this->PeriodData[$key][] = $value;
            }
        }
    }

    public function GenerateFile($fname)
    {
        if ($fname != '')
        {
            $this->GeneratePeriods();
            $this->file = fopen($fname, "wt");
            $this->w('<rdf:RDF');
            $this->w('xmlns="http://www.rhc-eindhoven.nl/tha#"');
            $this->w('xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"');
            $this->w('xmlns:xsd="http://www.w3.org/2001/XMLSchema#"');
            $this->w('xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"');
            $this->w('xmlns:owl="http://www.w3.org/2002/07/owl#"');
            $this->w('xmlns:time="http://www.w3.org/2006/time#"');
            $this->w('xml:base="http://www.rhc-eindhoven.nl/tha">');
            $this->AddStandardNodes();
            $this->PutPeriods();
            $this->w('</rdf:RDF>');


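            fclose($this->file);
            $this->file = false;
        }
    }

    public function PrettyPrint()
    {
        // print_r() prints the array itself; echoing its return value
        // would append a stray "1" to the output
        print_r($this->PeriodData);
    }
}
?>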
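
The naming rule in GenerateName decides both the rdf:ID of an interval and which of the three interval types it is. The sketch below restates that rule as a standalone function so it can be run without a Sesame repository. The function name intervalName and the sample dates are hypothetical and not part of the thesis code; the six arguments are assumed to be strings, with '' meaning that the field is unknown.

```php
<?php
// Standalone restatement of the naming rule; not the thesis code.
// Returns array(name, intervalType) for the six date fields.
function intervalName($bj, $bm, $bd, $ej, $em, $ed)
{
    // Join the non-empty parts of one date with dashes
    $date = function ($j, $m, $d) {
        $s = $j;
        if ($m != '') { $s .= '-'.$m; }
        if ($d != '') { $s .= '-'.$d; }
        return $s;
    };

    $noBegin = ($bj == '' && $bm == '' && $bd == '');
    $noEnd   = ($ej == '' && $em == '' && $ed == '');

    if ($noBegin && $noEnd) {
        // type 0: nothing is known about the period
        return array('Undefined', 0);
    }
    if (($bj == $ej && $bm == $em && $bd == $ed) || $noEnd) {
        // type 1: a single year, month or day
        return array($date($bj, $bm, $bd), 1);
    }
    // type 2: begin and end differ
    return array($date($bj, $bm, $bd).'to'.$date($ej, $em, $ed), 2);
}

// Made-up sample periods
$samples = array(
    array('', '', '', '', '', ''),
    array('1957', '', '', '', '', ''),
    array('1957', '03', '', '1957', '03', ''),
    array('1957', '03', '', '1960', '', ''),
);
foreach ($samples as $fields) {
    list($name, $type) = call_user_func_array('intervalName', $fields);
    echo 'Interval'.$name.' (type '.$type.")\n";
}
```

Because the name is derived only from the dates, two archive objects with the same period produce the same interval ID, which is what lets PutInterval write the interval once and refer to it by rdf:resource afterwards.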

A.3 VideoLAN Server Script

Server Script

# VLC media player VLM command batch
# http://www.videolan.org/vlc/
new 61281 vod enabled
setup 61281 input "F:\Movies\Stadsjournaal_Eindhoven\Stadsjournaal_Eindhoven_1957.mp4"
new 61301 vod enabled
setup 61301 input "F:\Movies\Stadsjournaal_Eindhoven\Stadsjournaal_Eindhoven_1977.mp4"
new 61308 vod enabled
setup 61308 input "F:\Movies\Stadsjournaal_Eindhoven\Stadsjournaal_Eindhoven_1984.mp4"
new 61318 vod enabled
setup 61318 input "F:\Movies\Stadsjournaal_Eindhoven\Stadsjournaal_Eindhoven_1994.mp4"
new 61319 vod enabled
setup 61319 input "F:\Movies\Stadsjournaal_Eindhoven\Stadsjournaal_Eindhoven_1995.mp4"

Batch file

"C:\Program Files\VideoLAN\VLC\vlc.exe" --ttl 12 -vvv --color -I telnet --telnet-password RHCe --rtsp-host 0.0.0.0:5554 --vlm-conf RHCe.conf
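
The server script repeats the same two VLM commands for every video, so a file of this shape can be generated from a list of media names and paths. The following is only a sketch under that assumption: the function buildVlmConfig is hypothetical, and the thesis does not state how the original file was produced. The two entries in the usage part are copied from the listing above.

```php
<?php
// Hypothetical generator for a VLM command batch like the one shown above.
// $videos maps a media name (here the numeric id) to the path of the video file.
function buildVlmConfig(array $videos)
{
    $lines = array(
        '# VLC media player VLM command batch',
        '# http://www.videolan.org/vlc/',
    );
    foreach ($videos as $name => $path) {
        // one video-on-demand entry per file: create it, then set its input
        $lines[] = 'new '.$name.' vod enabled';
        $lines[] = 'setup '.$name.' input "'.$path.'"';
    }
    return implode("\n", $lines)."\n";
}

$dir = 'F:\Movies\Stadsjournaal_Eindhoven';
echo buildVlmConfig(array(
    61281 => $dir.'\Stadsjournaal_Eindhoven_1957.mp4',
    61301 => $dir.'\Stadsjournaal_Eindhoven_1977.mp4',
));
```

Writing the returned string to RHCe.conf would give a file that the batch command above can pass to --vlm-conf.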


List of Symbols and Abbreviations

Abbreviation  Description                                               Definition

CATCH         Continuous Access To Cultural Heritage                    page 2
CHI           Cultureel Historische Informatie                          page 2
CMS           Content Management System                                 page 22
GABOS         Gemeentelijk Atlas Beschrijvings- en Ontsluitingssysteem  page 13
ISO           International Organization for Standardization            page 21
RHCe          Regionaal Historisch Centrum Eindhoven                    page 1
SKOS          Simple Knowledge Organization System                      page 51
SMF           Simple Machines Forum                                     page 36
TU/e          Eindhoven University of Technology                        page 2
VLC           VideoLAN Client                                           page 21
VLM           VideoLAN Manager                                          page 21
VoD           Video on Demand                                           page 12
W3C           World Wide Web Consortium                                 page 46


List of Figures

3.1 Data structure of Bookmark and Tijd objects
3.2 Data structure of Comment object
3.3 Standard CHI Explorer search and result interface
3.4 CHI Explorer search cloud
3.5 CHI Explorer time line view
3.6 CHI Explorer Map view
3.7 CHI Explorer Video view
3.8 CHI Explorer Tag forms (Location, Period, Tag)

4.1 Board hierarchy in CHI Explorer forums
4.2 Data structure of W3C Time ontology
4.3 Structure of Google Maps XML data file
4.4 Time line without clustering
4.5 Time line with clustering

5.1 RHCe home page in Joomla!
5.2 CHI Explorer main interface
5.3 result list for the query vehicle
5.4 result map for the query vehicle
5.5 result timeline for the query vehicle
5.6 Detail view of a video fragment
5.7 User has proposed the Eindhoven tag to the left, and the added Eindhoven tag to the right
5.8 Forum discussion thread
5.9 New locations in moderator interface
5.10 Placement choice interface of a city within the location hierarchy
5.11 Cities in moderator interface


List of Tables

3.1 Comparison of codecs
