A hybrid cloud-P2P architecture for multimedia information retrieval on VoD services



Computing
DOI 10.1007/s00607-014-0428-3

A hybrid cloud-P2P architecture for multimedia information retrieval on VoD services

Vladimir Rocha · Fabio Kon · Raphael Cobe · Renata Wassermann

Received: 31 October 2013 / Accepted: 9 September 2014
© Springer-Verlag Wien 2014

V. Rocha (B) · F. Kon · R. Cobe · R. Wassermann
University of São Paulo, São Paulo, Brazil
e-mail: [email protected]

F. Kon
e-mail: [email protected]

R. Cobe
e-mail: [email protected]

R. Wassermann
e-mail: [email protected]

Abstract Recent research in Cloud Computing and Peer-to-Peer systems for Video-on-Demand (VoD) has focused on multimedia information retrieval, using cloud nodes as video streaming servers and peers as a way to distribute and share the video segments. A key challenge faced by these systems is providing an efficient way to retrieve the information segments descriptor, composed of its metadata and video segments, distributed among the cloud nodes and the Peer-to-Peer (P2P) network. In this paper, we propose a novel Cloud Computing and P2P hybrid architecture for multimedia information retrieval on VoD services that supports random seeking while providing scalability and efficiency. The architecture comprises Cloud and P2P layers. The Cloud layer is responsible for video segment metadata retrieval, using ontologies to improve the relevance of the retrieved information, and for distributing the metadata structures among cloud nodes. The P2P layer is responsible for finding peers that have the physical location of a segment. In this layer, we use trackers, which manage and collect the segments shared among other peers. We also use two Distributed Hash Tables, one to find these trackers and the other to store the information collected in case the tracker leaves the network and another peer needs to replace it. Unlike previous work, our architecture separates the responsibilities of cloud nodes and peers, which manage the video metadata and its segments, respectively. We also show, via simulations, that any peer can be converted to act as a tracker while maintaining system scalability and performance, which avoids the need for centralized and powerful servers.

Keywords Cloud computing · Peer-to-peer · Video-on-demand · DHT

Mathematics Subject Classification 68M14

1 Introduction

With the advent of multimedia devices, such as mobile phones and tablets, and the increase of Internet access, Video-on-Demand (VoD) streaming services have become very popular since they allow users to play a video while it is being downloaded. Examples of this kind of streaming service are Massive Open Online Course (MOOC) systems, which reach millions of students worldwide [24], and YouTube [48], a video-sharing system that has reached 1 billion users and more than 4 billion daily views [49].

One of the main characteristics of VoD streaming services is allowing users to jump to different positions in the video, whether to watch scenes containing more useful information, to skip boring parts, or to reprise exciting ones [50]. Attractive scene guides [17], personalized guides [46], and searches over video metadata such as tags, summaries, and speech transcriptions [28] help users to find this information.

Many educational VoD tools use the metadata search approach [11,39] as it allows students to find more relevant video information. Using structures that store the relations between video metadata descriptions can improve the search results even further. One way to represent these metadata relationships is to use ontologies.

After obtaining the video information, it is necessary to discover its physical location. To do so, several strategies are currently used, including Content Distribution Networks (CDN) [7], Cloud Computing [27], and Peer-to-Peer Networks (P2P) [16].

In a VoD P2P network, a video is divided into segments, which are distributed and shared across the network peers. Peers act as both client and server, downloading video segments from other peers (client) and uploading them to other peers that request them (server). In this environment, one of the most important requirements is to discover the requested video segments efficiently and scalably [32], so that the user's quality of service is maintained while the segment is played [15].

As the video segments are distributed among some peers of the P2P network, many researchers have proposed finding them via structured and unstructured architectural models. In the structured model, peers are interconnected to create distributed structures such as AVL Trees [23] and Distributed Hash Tables (DHT) [47], using efficient algorithms to find the segments. In the unstructured model, peers are able to connect to any other peer in the network [20], possibly coordinated by centralized servers (or trackers). Trackers collect information from a set of peers, such as video metadata and the segments that are downloaded and uploaded, and forward a segment request to those peers [34]. However, regardless of the model used, it is still a challenge to find, in an efficient and scalable way, peers that own and share the video metadata and its segments.

In this paper, we propose a novel Cloud Computing and P2P hybrid architecture for multimedia information retrieval on VoD services. In our architecture, the separation between cloud nodes and the scalable trackers of the P2P network is clear. The former are responsible for the management of the video metadata, while the latter are responsible for the management of the video segments. The Cloud Computing layer allows for scalable video segment metadata retrieval, using the ontology-aided information retrieval activity (OnAir) proposed by Paz-Trillo et al. [28] to improve the relevance of retrieved information. The Cloud Computing layer is also responsible for distributing structures and searches over cloud nodes, thus increasing the scalability of the retrieval activity. The P2P network layer provides scalable and efficient video segment discovery by using two DHTs and a scalable tracker implementation. The first DHT allows discovery of the global trackers, which are responsible for answering segment requests and for monitoring a set of peers that are sharing segments of a certain video. The second DHT stores the information collected from monitored peers in case the responsible tracker leaves the network and another one has to replace it. Finally, the architecture also provides interaction between the two layers to retrieve the video metadata and its segments in a scalable way.

This paper is organized as follows. First, in Sect. 2, we review related work. Section 3 introduces the Multimedia Information Retrieval problem. In Sect. 4, we show the architecture, composed of a Cloud layer and a P2P layer, which retrieves multimedia information. In Sects. 5 and 6, we give a detailed description of the Cloud and P2P implementations. Next, our experimental results are presented in Sect. 7 and, finally, in Sect. 8, we conclude the paper.

2 Related work

In this section, we first review works that use a combination of Cloud Computing and P2P networks for information retrieval in video streaming services. Next, we review works that use Distributed Hash Tables to discover peers that have downloaded video segments distributed over the P2P network. We also discuss how our study differs from previous works.

2.1 Cloud–P2P for metadata and segment discovery

The use of Cloud Computing for video streaming has emerged in academia and in commercial products [1,44] as a promising alternative to previous solutions, such as CDN and P2P networks. Research has mainly focused on how to organize the nodes in the cloud infrastructure to deliver video segments to peers in the P2P network, maximizing the total upload bandwidth or minimizing infrastructure cost. Our work differs from these efforts and focuses on how the cloud nodes should store the structures to retrieve video metadata and how to interact with the P2P network to discover peers that physically store the segments described in the metadata.

Among the works that focus on the organization of the cloud nodes, Trajkovska et al. [36] propose an architecture that allows P2P users to use cloud nodes for streaming videos, based on the quality of service they want to receive. A user can choose a video stream among different types of nodes. Each of them has a cost model associated with the characteristics the node has at the moment, such as bandwidth, jitter, and latency. To discover these nodes, the system provides Web services deployed on centralized servers, with plans to distribute them across the cloud nodes.

On top of the aforementioned cost model, Cloudmedia [45] adds a model that represents user behavior, estimating the demand placed on the cloud. To fulfill this demand and maintain the same quality of service for all users, it implements algorithms that automatically add and remove cloud nodes. This approach maximizes the total bandwidth provided to users while minimizing the cost paid to the cloud infrastructure provider. To discover the video information, the system provides trackers that collect and maintain the available videos, the nodes, and the segments they share.

Liu et al. propose Novasky [22], a cloud system focused on sharing video segments among interconnected peers within a high-speed network and at high-resolution transmission rates of 1–2 Mbps. To achieve these rates, Novasky uses replication of the most popular segments and load balancing strategies based on Reed-Solomon coding [29], which is widely adopted in traditional mass storage systems (e.g., RAID 6). To discover the video and segment information, the system provides a Management Center, consisting of a set of servers that collect information, such as the video metadata and the popularity of each video segment, from cloud nodes.

Clive [27] is a system architecture that organizes cloud nodes to provide a predefined and constant quality of service, maximizing the upload bandwidth of the overall system. For that, the system calculates an ideal number of cloud nodes, called helpers, which support the distribution of multimedia segments across peers of the P2P network. Cloud nodes can be either passive (providing persistent segment storage) or active (providing temporary segment storage) as needed to maintain a predefined quality of service. To discover the information shared among helpers and peers of the P2P network, the system offers the Clive Manager, a central server that uses epidemic algorithms to distribute information among nodes.

2.2 DHT for segment discovery

Recent research on P2P video streaming services has focused on the use of Distributed Hash Tables (DHT) to efficiently discover information segments distributed among the P2P network peers. In a DHT, which acts as a regular hash table storing (key, value) pairs, researchers try to solve two problems: (1) how to stabilize the structure under continuous reference updates, and (2) how to efficiently discover references to peers that own a specific video segment.

The first problem, structure stabilization, arises when a peer enters or leaves the network or when it downloads or deletes a segment. In such cases, the DHT must perform a time-consuming update process to consolidate this information. VMesh [47] was the first alternative using the DHT as a peer repository. Each peer creates connections to other peers that own a predecessor and a successor segment, reducing the stabilization problem since it is possible to discover the next segment using these connections instead of asking the DHT. However, discovering a segment beyond the successor needs a new search on the DHT, which returns inconsistent data if the search is performed before or during the update process. To overcome this problem, Bhattacharya et al. [4] propose the Temporal-DHT structure, which stores two different kinds of reference in the DHT, static and dynamic. A static reference (a segment downloaded by a peer) is not removed from the DHT until the peer leaves the network, so there is no inconsistent data before or during the update process. A dynamic reference represents a position of the video watched by the user. Even with this position varying in time, Temporal-DHT executes the update process only at predefined time intervals, returning inconsistencies in the search process only in these intervals. Another alternative that stores dynamic references is Time-Driven Mesh (TDM) [9]. The difference from Temporal-DHT is that TDM adds the peer's geographic location to the dynamic reference, retrieving peers that are watching the same position and are closer in terms of latency.

The main difference of our proposal is that our DHT stores only references to peers responsible for managing a complete video, instead of storing references to any peer as other works do. Compared with previous alternatives, our proposal has only one range of dynamic references (the full video range). This approach decreases the number of reference updates performed on the DHT during the update process, reducing inconsistencies in the search process and maintaining the scalability of the system.

The second problem, segment discovery performance, occurs when the number of peers in the network increases, and the time to discover the segment and its successors also increases. BulletMedia [42] attempts to reduce the search time by relying on the altruism of peers, replicating consecutive segments among them, even if they do not use them. Thus, a search for a segment s returns a reference to a node that stores s and some successor segments, so no new search is needed to find the successor of s. To avoid storing unnecessary replicas, Noh et al. [26] propose the Pseudo-DHT, which provides algorithms for DHT insertion and search, focusing on cases when collisions occur in the stored (key, value) pair. Thus, if a segment identifier (key) is added to the DHT and it already has a reference to a node that owns the segment (value), the algorithm modifies the key until it finds one that does not have an associated reference in the DHT. When a search is performed, the same key modifications are needed to find the reference to the peer that has the segment. The authors demonstrate that these modifications improve the search performance compared to insertion and search algorithms used in the DHT without the collision treatment.

In [18], the closest research to ours in terms of the use of trackers, Jimenez et al. demonstrate that a small change in the DHT structure makes it possible to obtain the references (i.e., trackers) in less than a second, regardless of the number of peers in the network. In our work, the main differences are the behavior of the peers, which can be converted to trackers, and the addition of a secondary DHT (called Support DHT) to back up the information collected by the tracker, maintaining the system performance when finding a segment and its successors.

3 Multimedia information retrieval

Multimedia Information Retrieval (MIR) is defined as the problem of finding multimedia material in large collections of information sources (mostly known as documents). It also includes finding the physical location of the segments that compose the multimedia document that satisfies an information need.

Each multimedia document (or metadata composed of tags, summary, and segment identifiers) might contain thousands of words (mostly known as terms). To find those documents, MIR systems take a query as input and return answers to it. To simplify the user's task, some works accept natural language as input, because human knowledge is expressed in this form and it does not require extra learning effort from the end-user. However, this approach tends to bring irrelevant results, affecting system recall (the measure used to assess an IR system's ability to retrieve all documents in its collection that are relevant to a user query, tolerating false positives) and precision (the measure used to assess an IR system's ability to identify only relevant documents in its collection, without false positives).
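To make the two measures concrete, the following minimal Python sketch computes precision and recall for a single query from sets of document identifiers. The function name `precision_recall` and the example identifiers are illustrative assumptions and do not come from the system described in this paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 relevant ones exist, 2 overlap.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
print(p, r)
```

In this example half of the returned documents are relevant (precision), while two of the three relevant documents were found (recall).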

In order to deal with such problems, several works use the query expansion process, which adds to a copy of the original query terms that have the same meaning as the ones already present, improving the chances of retrieving a relevant answer. To expand a query, some approaches use a synonym dictionary (i.e., a thesaurus), a dictionary of synonyms and related terms (e.g., Wordnet [19]), or a formal specification of a domain vocabulary using ontologies [2,12]. Voorhees [41] has studied the impact of using query expansion approaches, such as ontologies, on the performance of information retrieval systems. The results show that these structures help to improve the quality of the query answers.

In this section, we first show how to find a relevant multimedia document (metadata) using ontologies. Next, we show how to find the physical location of the segments distributed on the P2P network.

3.1 Metadata video retrieval using ontologies

As presented before, ontologies represent a certain domain of knowledge by means of concepts and properties, where concepts model classes of individuals, and properties model the relationships between them. To automatically infer these relationships between concepts, some reasoning mechanisms have been created, using a certain language to represent the ontology. In the last few years, description logics (DL) have been used to formalize ontologies. More specifically, the Web Ontology Language (OWL, see footnote 1) is built on top of a DL fragment.

Relevant metadata is retrieved by the query expansion process. It defines how semantically close two terms are (called the similarity degree) in order to decide whether a term is included during the expansion. Paz-Trillo et al. [28] extend the idea from [3,21], defining TermSim as the similarity between two terms, which is high when a large number of ontology properties link them. Their algorithm first calculates the similarity using the common part of both terms, i.e., their closest common super-concept. Then it calculates the number of properties that interconnect both concepts and the total number of properties for both.

During this process, the term weighting activity is responsible for constructing a query vector, which is a representation of the query. Each vector element is obtained by counting the number of times each index term appears in the query. Afterwards, the query goes through the recalculation of the term weights. Terms that have the larger number of similar terms in the index gain more weight, since these terms are found to be main topics of the query and match the main topics of the document collection, stored in the ontology.

1 http://www.w3.org/TR/owl-features/

After the query has been expanded, the system is able to retrieve the results. It uses a vector space model [31], in which every document is represented as a vector built in the same way as the query vector: each element stores the number of times the corresponding index term occurs in the document. The retrieval activity finds the document vectors that are closest to the query vector, using the formula proposed in [31]. After retrieving the documents closest to the query, i.e., most similar to it, the system is able to find the most relevant videos and their segments.
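The retrieval step can be illustrated with a small Python sketch. It assumes cosine similarity as the closeness measure, which is the usual choice in the vector space model; the text above only refers to the formula in [31], so this is an assumption. The helper names (`term_vector`, `cosine`, `rank`) and the toy documents are hypothetical, and no stemming or term weighting is applied.

```python
import math
from collections import Counter


def term_vector(text):
    """Map each term to its number of occurrences (no stemming here)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents):
    """Return (doc_id, score) pairs sorted by decreasing similarity."""
    q = term_vector(query)
    scores = {doc_id: cosine(q, term_vector(text))
              for doc_id, text in documents.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical metadata texts for two videos.
docs = {
    "v1": "hash table lookup hash",
    "v2": "cloud storage nodes",
}
ranking = rank("hash lookup", docs)
print(ranking)
```

Documents sharing no term with the query receive a score of zero, so a threshold on the score can be used to cut the result list.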

3.2 Video segment retrieval in P2P networks

As mentioned, research on P2P networks uses DHTs as a structured architectural model to discover video segments distributed in the P2P network [30]. One of the main DHT characteristics is its scalable structure, distributed and replicated among the peers of a P2P network. In this structure, a search for a certain key is efficient and always returns a value (if this value exists) using O(log n) messages, where n is the number of nodes in the DHT [13,33].

In a VoD service context, the DHT reduces the overhead placed on centralized collector nodes (super peers or trackers) by storing the collected information in the structure instead of in centralized nodes. The information stored in the DHT depends on the type of search provided by the structure: search for peers owning all segments of a video, search for peers owning a specific segment of a video, or search for the raw segment data. To find peers that own all video segments, a search on the DHT is performed using the key “Video Identifier”, and the value retrieved is a list of IP addresses of peers that have the requested resource [37]. To access this resource (i.e., to download it), a new message is sent to any peer in this list. Likewise, finding peers that own a specific segment works like the previous search, except that the key is now the “Segment Identifier” [43]. In a search for the raw segment data, the key is also the “Segment Identifier”, but the value retrieved from the DHT is the raw data that compose the segment and not the reference (IP address) of the peer that owns it [10].
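The three kinds of search can be illustrated with a toy, single-process stand-in for a DHT, sketched below in Python. It only shows how a key is mapped to a responsible node and what each kind of value looks like; it does not implement the O(log n) routing of a real DHT. The class `ToyDHT`, the node names, the key prefixes, and the addresses are all assumptions made for illustration; in particular, the prefixes stand in for what would normally be separate DHTs.

```python
import hashlib
from bisect import bisect_left


class ToyDHT:
    """Single-process stand-in for a DHT: keys are hashed onto a ring of nodes.

    Real DHTs route a lookup in O(log n) messages; here the responsible node
    is found locally, so only the key-to-node mapping is illustrated.
    """

    def __init__(self, node_names):
        self.ring = sorted((self._hash(n), n) for n in node_names)
        self.store = {name: {} for name in node_names}

    @staticmethod
    def _hash(text):
        return int(hashlib.sha1(text.encode()).hexdigest(), 16)

    def responsible_node(self, key):
        """First node clockwise from the key's position on the ring."""
        positions = [h for h, _ in self.ring]
        i = bisect_left(positions, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

    def put(self, key, value):
        self.store[self.responsible_node(key)][key] = value

    def get(self, key):
        return self.store[self.responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
# Search type 1: video identifier -> peers holding every segment.
dht.put("video:42", ["10.0.0.1", "10.0.0.2"])
# Search type 2: segment identifier -> peers holding that segment.
dht.put("segment:42:7", ["10.0.0.2"])
# Search type 3: segment identifier -> the raw segment bytes themselves.
dht.put("raw:42:7", b"\x00\x01\x02")

print(dht.get("video:42"), dht.get("segment:42:7"), dht.get("raw:42:7"))
```

In the first two cases the caller still has to contact one of the returned peers to download the data, whereas in the third case the lookup itself returns the segment content.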

However, using a DHT as an information collector in VoD services creates two problems: (1) increasing the number of participants in the network also increases the time to find a segment, decreasing the user's quality of service; (2) with intermittent node joins and leaves, the DHT needs to update its structure constantly, returning inconsistencies (i.e., false positives or negatives) in search results (see footnote 2).

2 While the structure performs its stabilization process [33].

4 Solution architecture

The proposed architecture, composed of a Cloud Computing layer and a P2P layer, allows for efficient and scalable multimedia information retrieval, minimizing the time to find information, maintaining system scalability, and avoiding inconsistencies when a search is performed. The cloud layer is responsible for finding multimedia metadata and redirecting segment requests to the P2P layer. The P2P layer is responsible for receiving the cloud request and finding the peers that physically own the requested segment.

The cloud layer, shown in Fig. 1a, consists of interconnected nodes that share their computational resources to allow efficient video metadata storage and retrieval. Cloud nodes are classified as Index Nodes, Repository Nodes, and Query Nodes. Index Nodes are responsible for the indexing process, which includes the creation of the index structures (i.e., vector indexes and ontology) and the term weighting activity. The index structure created by the Index Nodes is sent to the Repository Nodes, which are responsible for storing and replicating the index structures. Finally, the Query Nodes are responsible for the metadata search and retrieval process. This process includes processing the user's search phrase, connecting to the Repository Nodes to obtain a list of video metadata close to the phrase, and retrieving the most relevant entries.

The P2P layer, shown in Fig. 1b, is composed of two structures. The first structure, which we call InterCVI (Inter-Cluster Video Index), uses a DHT to store the list of peers owning the N segments of a video, called trackers (a video file, represented by a unique identifier VIDEO_ID, is divided into N segments, each one represented by a unique identifier SEGMENT_ID). The second structure, which we call IntraCSI (Intra-Cluster Segment Index), uses a DHT to store the list of peers sharing the segments of a video.

Fig. 1 Multimedia information retrieval with cloud and P2P layers. a Cloud layer (Index Nodes, Repository Nodes, Query Nodes), b P2P layer (Trackers and Peers, Intra-Clusters 1 to 4, InterCluster)

In our architecture, the steps to find multimedia information are shown in the algorithm of Fig. 2. The algorithm starts by receiving a phrase in natural language sent by the user (Line 1). Then, we send a search message to the cloud to find the videos that best match the user's search phrase (Line 2). Once the user chooses a video from the list and the position he or she wants to watch (Lines 3 and 4), we send a search message to the cloud to find the metadata of the selected video (Line 5). The video identifier and segment information are then extracted so that they can be searched in the P2P layer (Lines 6 and 7). Next, we retrieve from the P2P layer, specifically from the InterCVI, the list of trackers that own the video (Line 8). A tracker from the list (Line 9) is then asked to find the peers owning the searched segment. The tracker checks in its IntraCSI which peers have the requested segment and whether they are available to upload it (Line 10). Finally, the tracker returns the video's metadata and the list of peers available to provide the segment (Line 11).

 1: SEARCH(Phrase phrase)
 2:   videoList ← Cloud.obtainVideos(phrase)
 3:   video ← User.chooseVideo(videoList)
 4:   position ← User.choosePosition(video)
 5:   metadata ← Cloud.obtainMetadata(video, position)
 6:   VIDEO_ID ← metadata.obtainVideoIdentifier
 7:   segment ← metadata.obtainSegment
 8:   trackers ← InterCVI.trackers(VIDEO_ID)
 9:   chosen ← trackers.obtainTracker
10:   peers ← chosen.obtainPeers in IntraCSI(segment)
11:   returns metadata & peers

Fig. 2 Metadata and video segments search algorithm
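The flow of Fig. 2 can be sketched in Python with in-memory stand-ins for both layers. This is only an illustration under strong assumptions: the classes `Cloud`, `Tracker`, and `Metadata`, the substring matching used in place of the ontology-based search, the dictionary used in place of the InterCVI DHT, and the choice of the first tracker in the list are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    video_id: str
    segment_id: str


@dataclass
class Tracker:
    # IntraCSI stand-in: segment id -> list of (peer address, can_upload)
    intra_csi: dict = field(default_factory=dict)

    def obtain_peers(self, segment_id):
        """Peers that hold the segment and are available to upload it."""
        return [addr for addr, can_upload in self.intra_csi.get(segment_id, [])
                if can_upload]


class Cloud:
    """Hypothetical cloud layer holding titles and per-position segment ids."""

    def __init__(self, catalog):
        self.catalog = catalog  # video id -> {"title": str, "segments": [ids]}

    def obtain_videos(self, phrase):
        words = phrase.lower().split()
        return [vid for vid, info in self.catalog.items()
                if any(w in info["title"].lower() for w in words)]

    def obtain_metadata(self, video_id, position):
        return Metadata(video_id, self.catalog[video_id]["segments"][position])


def search(phrase, cloud, inter_cvi, choose_video, choose_position):
    video_list = cloud.obtain_videos(phrase)            # Line 2
    video = choose_video(video_list)                    # Line 3
    position = choose_position(video)                   # Line 4
    metadata = cloud.obtain_metadata(video, position)   # Line 5
    trackers = inter_cvi[metadata.video_id]             # Lines 6 and 8
    chosen = trackers[0]                                # Line 9 (first one, an assumption)
    peers = chosen.obtain_peers(metadata.segment_id)    # Lines 7 and 10
    return metadata, peers                              # Line 11


cloud = Cloud({"v1": {"title": "Intro to DHTs", "segments": ["v1-s0", "v1-s1"]}})
tracker = Tracker({"v1-s1": [("10.0.0.5", True), ("10.0.0.6", False)]})
inter_cvi = {"v1": [tracker]}  # InterCVI stand-in: video id -> trackers
metadata, peers = search("dhts", cloud, inter_cvi,
                         choose_video=lambda videos: videos[0],
                         choose_position=lambda video: 1)
print(metadata, peers)
```

The two callbacks stand for the user's choices in Lines 3 and 4; only the peer that is marked as able to upload is returned for the requested segment.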

5 Cloud layer implementation

According to the National Institute of Standards and Technology (NIST), Cloud Computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources, such as networks, servers, storage, applications, and services [25]. In our model, the cloud layer is composed of interconnected nodes that run in a cloud infrastructure. These nodes share their computational resources to allow efficient indexing, storage, and retrieval of metadata information. As shown in Fig. 1a, these nodes are: Index Nodes, Repository Nodes, and Query Nodes.

The Index Nodes are responsible for creating the index structures that will store the video metadata and its associated ontology. In the cloud layer, these nodes are not interconnected, since the process of building index structures does not require interaction with other nodes. However, they maintain connections to the Repository Nodes, where the created structures will be stored.

The steps for creating the index structures are shown in the algorithm of Fig. 3a, executed when a user wants to provide a video and the associated ontology (Line 1). First, we insert the video metadata in the cloud to create the index structure (Line 2) and, if the ontology is not yet in the cloud, we add the ontology and a dictionary composed of the terms contained in it (Lines 3–5). Then, for each word contained in the metadata, the summary, the keywords, and the video transcript (Line 6), we obtain a token from the stem of the word (Line 7), count the occurrences of this token in the cloud (Line 8), and add a term entry for the token if it does not yet exist in the cloud (Lines 9 and 10). Finally, it is necessary to recalculate the weights of the inserted tokens present in the cloud (Line 11).

The Repository Nodes are responsible for storing the index structures created by the Index Nodes and for answering the requests of the Query Nodes searching for video metadata. To keep the structures always available and avoid a single point of failure, we distribute the index structures across the cloud nodes using the open source Cassandra distributed database (see footnote 3), based on the BigTable concept [8].

(a)
 1: CREATE INDEX(meta, onto)
 2:   doc ← Cloud.addDoc(meta)
 3:   if onto is not in Cloud
 4:     Cloud.addOntology(onto)
 5:     Cloud.addDictionary(onto.terms)
 6:   for each word in meta.fullText
 7:     tok ← stem(word)
 8:     Cloud.occurrences(meta, tok, doc)
 9:     if tok is not in Cloud.termEntries
10:       Cloud.addTermEntry(tok)
11:   Cloud.computeWeightTerms

(b)
 1: EXPAND QUERY(phrase)
 2:   qryW ← 0
 3:   o ← Cloud.getOntology
 4:   for each Term u in phrase
 5:     qryW += u.weight
 6:   for each Term t in Cloud.termEntries
 7:     for each Term u in phrase
 8:       sim(t) += qryW * TermsSim(t, u, o)
 9:     wex(t).weight ← sim(t) / qryW
10:   for each Term u in phrase
11:     if wex(t) is one of maxExp terms
12:       phrase += u
13:       qryW += query(u.weight)
14:   returns phrase

Fig. 3 Cloud layer algorithms. a Create index algorithm, b expansion query algorithm
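The index creation steps of Fig. 3a can be illustrated with the following minimal Python sketch, which keeps everything in memory instead of in a distributed database. The class `CloudIndex`, the crude `stem` function, and the representation of an ontology as a set of terms are assumptions made for illustration. The weight recalculation of Line 11 is left out because its formula is not given in this section.

```python
from collections import defaultdict


def stem(word):
    """Crude stand-in for a real stemmer: lowercase and strip a plural 's'."""
    word = word.lower().strip(".,;:!?")
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


class CloudIndex:
    """In-memory stand-in for the index structures kept in the cloud."""

    def __init__(self):
        self.docs = []                         # stored metadata texts
        self.ontologies = []                   # ontologies already inserted
        self.dictionary = set()                # terms taken from the ontologies
        self.term_entries = defaultdict(dict)  # token -> {doc id: occurrences}

    def create_index(self, full_text, ontology_terms):
        doc_id = len(self.docs)                         # Line 2
        self.docs.append(full_text)
        if ontology_terms not in self.ontologies:       # Lines 3-5
            self.ontologies.append(ontology_terms)
            self.dictionary.update(ontology_terms)
        for word in full_text.split():                  # Line 6
            tok = stem(word)                            # Line 7
            if not tok:
                continue
            counts = self.term_entries[tok]             # Lines 9-10: entry created on demand
            counts[doc_id] = counts.get(doc_id, 0) + 1  # Line 8
        return doc_id


index = CloudIndex()
doc = index.create_index("Peers share segments. Peers upload.",
                         frozenset({"peer", "segment"}))
print(dict(index.term_entries))
```

Each term entry maps a stemmed token to the number of times it occurs in each document, which is the information needed to build the document vectors used at retrieval time.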

The Query Nodes allow a user to send a phrase in natural language and return a list of the video metadata most relevant to the phrase (Line 2 of the algorithm in Fig. 2). The sequential activities of this process are:

1. Pre-Processor: This activity applies spelling corrections to the phrase using theJazzy API4 to access a general dictionary (in our experiments, we used a Brazil-ian Portuguese dictionary, br.ispell [6]) and a domain dictionary automaticallyextracted during the index creation process. In addition to these corrections, weeliminate stop words and apply a stemming process to each word of the phrase.Finally, weights are assigned to the terms based on the presence (or absence) inthe ontology and their frequency in the index.

2. Query Expansion: This activity adds to the phrase produced by the Pre-Processor some closely related terms from the ontology. To do that, the phrase is split into words. Then, for each word, we calculate its similarity with the terms stored in the index and the ontology. The steps executed in this activity are shown in the algorithm in Fig. 3b. In this algorithm, we first calculate the weight of the entire phrase by summing the weights of all its terms (Lines 4 and 5). For each term stored in the cloud (Line 6), we calculate its degree of similarity with the terms contained in the phrase, as described in Sect. 3.1 (Lines 7–9). Then, for each term contained in the phrase (Line 10), we verify whether the term is one of the maxExp terms with the greatest similarity (Line 11). If so, we add it to the original phrase (Line 12) and recalculate the weight of the phrase with the added term, thereby generating the weights of the expanded query (Line 13). Finally, we return the expanded phrase containing the original terms and the added terms (Line 14). A rough complexity analysis of the algorithm suggests that, if there are n terms in Cloud.termEntries from a certain domain and k terms in the phrase, with k significantly smaller than n, the nested loops (Lines 6 and 7) run O(k × n) iterations, which is O(n) when k is treated as a small constant. Inside the nested loops, the TermsSim method is called (Line 8). As explained in Sect. 3.1, this method first calculates the similarity of two terms using their common part. This operation traverses the ontology tree in O(log n) for each term. Then, TermsSim calculates the number of common properties that connect both terms, traversing the ontology tree again in O(log n). Thus, the query algorithm runs in O(n log n).

Footnote 3: http://cassandra.apache.org
Footnote 4: http://jazzy.sourceforge.net

3. Retrieval: This activity compares the phrase produced by the Query Expansion with the video metadata stored in the cloud and returns the entries that best match the query, using the vector space model implementation. The steps taken in this activity are as follows. First, a video metadata list is initialized to an empty list. Second, for each metadata entry stored in the cloud, we calculate the degree of similarity between the query terms and the metadata terms, using the TermsSim algorithm described in Sect. 3.1. If the similarity is greater than a value predetermined by the user, we add the metadata to the video metadata list. Finally, we return the list to the user.
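The Retrieval activity can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the scoring rule (a weighted average of each query term's best match in a metadata entry), the function names, and the `exact_match` stand-in for TermsSim are all assumptions made for illustration. The defaults of 0.1 and 7 mirror the minimum relevance and maximum result count reported in the evaluation.

```python
from typing import Callable, Dict, List, Tuple


def retrieve(query: Dict[str, float],
             catalog: Dict[str, List[str]],
             terms_sim: Callable[[str, str], float],
             min_relevance: float = 0.1,
             max_results: int = 7) -> List[Tuple[str, float]]:
    """Return (video_id, score) pairs whose score exceeds min_relevance."""
    total_weight = sum(query.values())
    results: List[Tuple[str, float]] = []
    for video_id, terms in catalog.items():
        # Assumed scoring rule: each query term contributes its weight times
        # its best similarity to any term of this metadata entry.
        score = sum(weight * max((terms_sim(q, t) for t in terms), default=0.0)
                    for q, weight in query.items())
        score = score / total_weight if total_weight else 0.0
        if score > min_relevance:
            results.append((video_id, score))
    # Most relevant entries first, truncated to the allowed maximum.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:max_results]


def exact_match(a: str, b: str) -> float:
    """Hypothetical stand-in for TermsSim: 1.0 for identical terms, else 0.0."""
    return 1.0 if a == b else 0.0


catalog = {
    "v1": ["cloud", "stream"],
    "v2": ["ontology", "query"],
    "v3": ["cloud", "query"],
}
query = {"cloud": 2.0, "query": 1.0}  # term -> weight from the Pre-Processor
hits = retrieve(query, catalog, exact_match)
```

Replacing `exact_match` with an ontology-based similarity would let related but non-identical terms contribute to the score, which is the role TermsSim plays in the paper.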

6 P2P layer implementation

The P2P layer is composed of interconnected peers that allow for scalable and efficient video segment discovery. First, we describe the Intra-Cluster Segment Index, which is used to discover segments (Sect. 6.1). Then, we detail the Inter-Cluster Video Index, which is used to discover videos globally as well as their corresponding trackers (Sect. 6.2). We end with the details of the layer construction steps (Sect. 6.3).

6.1 Intra-cluster segment index—IntraCSI

The IntraCSI is responsible for discovering peers that own segments of a video. As shown in Fig. 1b, each IntraCSI is composed of three elements: the peers connection, formed by a set of peers that are sharing segments; the tracker, responsible for answering segment requests and for collecting information from the peers; and a support DHT, necessary to persist the collected information.

Peers Connection. The connections formed by peers who share video segments are random, since a peer can potentially connect to any other peer in its IntraCSI. Each peer is autonomous in managing its connections with other peers and in choosing from where to download or upload a segment. It is responsible for informing the IntraCSI about the state of its resources, i.e., the segments that have been downloaded and the available upload bandwidth. To communicate its current state to the structure, the peer executes (at specific time intervals) the update information algorithm, shown in Fig. 4a. The first step of the algorithm is to obtain the downloaded segments (Line 2). For each of these segments, the peer sends a finalized message to the tracker (Line 4) and to the support DHT (Line 5). The second step is to obtain the currently used upload bandwidth (Line 6). Depending on whether the bandwidth is below or above a certain value, predefined by the user, the peer sends an available or unavailable message to the tracker (Lines 7–11).

(a)
 1: UPDATE STATE()
 2:   segments ← getDownloadedSegments
 3:   for each segment s in segments
 4:     IntraCSI.updateTracker(s, peer)
 5:     IntraCSI.updateDHT(s, peer)
 6:   bandwidth ← obtainUploadBandwidth
 7:   if bandwidth < UPLOAD_LIMIT
 8:     available ← true
 9:   else
10:     available ← false
11:   IntraCSI.updateTracker(available, peer)

(b)
 1: SEARCH(Segment s, Peer p)
 2:   member ← verifyMembership(p)
 3:   accept_new ← verifyIntraLimit
 4:   if member or accept_new
 5:     peers ← obtainAvailablePeers(s)
 6:   returns peers

Fig. 4 Intra-cluster segment index algorithms. a Update peer state, b video segment retrieval
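The update routine of Fig. 4a can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: `TrackerState`, the plain dictionary standing in for the support DHT, and the 800 kbps threshold are hypothetical, since the paper leaves the bandwidth limit to the user.

```python
from typing import Dict, Iterable, Set

UPLOAD_LIMIT_KBPS = 800.0  # hypothetical threshold; the paper leaves it user-defined


class TrackerState:
    """In-memory stand-in for the state a tracker collects from its peers."""

    def __init__(self) -> None:
        self.owners: Dict[str, Set[str]] = {}   # segment id -> IPs holding it
        self.available: Dict[str, bool] = {}    # peer IP -> can it upload now?


def update_state(peer_ip: str,
                 downloaded: Iterable[str],
                 used_upload_kbps: float,
                 tracker: TrackerState,
                 support_dht: Dict[str, Set[str]]) -> bool:
    """Report downloaded segments and upload availability (Fig. 4a)."""
    for segment_id in downloaded:
        # Lines 4-5: announce each finished segment to tracker and support DHT.
        tracker.owners.setdefault(segment_id, set()).add(peer_ip)
        support_dht.setdefault(segment_id, set()).add(peer_ip)
    # Lines 6-11: the peer is available while it uses less than the limit.
    available = used_upload_kbps < UPLOAD_LIMIT_KBPS
    tracker.available[peer_ip] = available
    return available


tracker = TrackerState()
support_dht: Dict[str, Set[str]] = {}
first = update_state("10.0.0.1", ["seg-0", "seg-1"], 300.0, tracker, support_dht)
second = update_state("10.0.0.2", ["seg-0"], 950.0, tracker, support_dht)
```

Writing the same ownership record to both structures is what later lets a replacement tracker recover its state from the support DHT.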

Tracker. The tracker is a peer that has downloaded all the segments of a video. It is responsible for managing a video, i.e., for answering segment requests and for collecting the state of the peers that are sharing segments. It is important to note that trackers also behave as seeders (footnote 5), i.e., peers that own all the segments. We use those peers because of their stability: they tend to stay in the network longer instead of just downloading a segment and leaving. To avoid overloading the tracker with information received from peers, and thus to maintain the system's scalability, the number of peers in the IntraCSI must be limited. To do that, when the number reaches a certain value (called INTRA_LIMIT), no additional peer is allowed to join the IntraCSI, as will be explained in Sect. 6.3. This value depends on the number of segment requests that the tracker can answer without affecting its performance.

The tracker answers a segment request using the algorithm shown in Fig. 4b. Once the request arrives, the tracker verifies whether the requesting peer is already a member of the IntraCSI (Line 2) and checks the number of members of the structure (Line 3). If the peer is already a member, or if the IntraCSI supports a new peer, i.e., if it has not reached INTRA_LIMIT (footnote 6) (Line 4), the algorithm checks, in the data collected from the monitored peers, whether any peer has downloaded the segment and is available to upload it (Line 5). If there is at least one available peer (including the tracker itself), it returns a list of the IP addresses of these peers (Line 6).
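A minimal Python sketch of the request handling in Fig. 4b is given below. The class layout and attribute names are hypothetical, and returning an empty list when the request is rejected is an assumption, because the figure does not state what is returned in that case.

```python
from typing import Dict, List, Set

INTRA_LIMIT = 500  # value used in the paper's simulations (Sect. 7)


class Tracker:
    """Illustrative tracker that answers segment requests for one video."""

    def __init__(self, ip: str, intra_limit: int = INTRA_LIMIT) -> None:
        self.ip = ip
        self.intra_limit = intra_limit
        self.members: Set[str] = set()           # peers in this IntraCSI
        self.owners: Dict[str, Set[str]] = {}    # segment id -> peer IPs
        self.available: Dict[str, bool] = {ip: True}

    def search(self, segment_id: str, peer_ip: str) -> List[str]:
        member = peer_ip in self.members                     # Line 2
        accept_new = len(self.members) < self.intra_limit    # Line 3
        if not (member or accept_new):                       # Line 4
            return []  # assumption: a rejected request yields no sources
        # The tracker is a seeder, so it is always a candidate source.
        candidates = self.owners.get(segment_id, set()) | {self.ip}
        # Line 5: keep only peers currently marked as able to upload.
        return sorted(ip for ip in candidates
                      if ip != peer_ip and self.available.get(ip, False))


tracker = Tracker("10.0.0.1", intra_limit=2)
tracker.members = {"10.0.0.2", "10.0.0.3"}
tracker.owners = {"seg-0": {"10.0.0.2", "10.0.0.3"}}
tracker.available.update({"10.0.0.2": True, "10.0.0.3": False})
```

With the limit set to 2 and two members registered, an existing member is still served while an unknown peer is turned away, which is the admission rule described above.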

An important issue to mention is the peer-to-tracker conversion process, which reduces the overhead on existing trackers, distributes the segment requests, and maintains system scalability. This process occurs when a tracker leaves the network or when a peer has downloaded all the video segments. When a tracker leaves the network, it is necessary to create a new tracker to manage the video, collect the peer states, and answer segment requests. For this process to be performed as fast as possible, and without re-asking all participants for their states, the new tracker loads into its memory the information collected by the old tracker that has just left, using the support DHT to retrieve such information. This information can be outdated, but it will be updated again when the peers send their states to the new tracker, using the algorithm depicted in Fig. 4a, which is executed every 30 s. When a peer has downloaded all segments, the conversion process described in Sect. 6.2 takes place.

Footnote 5: http://www.bittorrent.com/help/manual/glossary#seed
Footnote 6: The request implies downloading the segment, so the requesting node will become part of the IntraCSI.

Support DHT. This structure acts as a distributed repository of peer resources (downloaded video segments). The DHT stores the SEGMENT_ID as key, and the value is a list of the IP addresses of the peers that have downloaded this segment (information sent by the update algorithm in Fig. 4a). It is important to mention that our support DHT is bounded by the number of peers monitored by the tracker, and that it stores references to the peers that hold a certain segment, not the raw data that compose it.
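To make the role of the support DHT concrete, the following Python sketch shows how a replacement tracker might rebuild its in-memory state from the SEGMENT_ID to IP-list mapping. A plain dictionary stands in for the DHT, and the function name and segment identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def rebuild_tracker_state(support_dht: Dict[str, List[str]],
                          segment_ids: Iterable[str]
                          ) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Load segment ownership for one video from the support DHT.

    Returns the per-segment owner sets and the set of all known members.
    The result may be stale until peers send their next periodic update.
    """
    owners: Dict[str, Set[str]] = {}
    members: Set[str] = set()
    for segment_id in segment_ids:
        # One DHT lookup per segment; missing keys mean no known owner yet.
        ips = set(support_dht.get(segment_id, []))
        owners[segment_id] = ips
        members |= ips
    return owners, members


support_dht = {
    "video-1:0": ["10.0.0.2", "10.0.0.3"],
    "video-1:1": ["10.0.0.3"],
}
owners, members = rebuild_tracker_state(
    support_dht, ["video-1:0", "video-1:1", "video-1:2"])
```

Only peer references are read here; the segment data itself never passes through the DHT.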

6.2 Inter-cluster video index—InterCVI

The InterCVI enables an efficient discovery of trackers that have downloaded all segments of a video. The structure consists of a DHT that stores a VIDEO_ID as key and, as value, a list of tracker IP addresses (each with the number of peers it monitors).

To join the InterCVI, a peer must have downloaded all segments of a video and be converted to a tracker. The conversion algorithm is shown in Fig. 5. The process begins by verifying whether the peer has downloaded all segments of a video (Line 2). Then, it verifies whether the number of peers in its IntraCSI has reached the INTRA_LIMIT value (Lines 4–5). If the limit has been reached, the peer sends a join message to the InterCVI, requesting to be added as a new tracker (Line 6), and sends an unsubscribe message to the tracker and to the Support DHT, requesting to be removed as a peer (Lines 7–8).

We should emphasize that the limit verification before performing the conversion (Line 5) is important to maintain the structure's good performance. When a conversion to tracker is performed, the InterCVI gains a member, but the IntraCSI loses a very important source, because that peer has all the segments.
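The conversion rule can be sketched in Python as shown below. The data structures are simplified assumptions: the InterCVI is modeled as a dictionary from video identifier to a list of tracker IPs (omitting the per-tracker peer counts), and the support DHT as a dictionary from segment identifier to a set of IPs.

```python
from typing import Dict, List, Set


def maybe_convert_to_tracker(peer_ip: str,
                             video_id: str,
                             downloaded: Set[str],
                             total_segments: int,
                             cluster_members: Set[str],
                             support_dht: Dict[str, Set[str]],
                             inter_cvi: Dict[str, List[str]],
                             intra_limit: int = 500) -> bool:
    """Promote a peer to tracker only if it is a seeder in a full cluster."""
    if len(downloaded) < total_segments:       # Lines 2-3: not a full copy yet
        return False
    if len(cluster_members) < intra_limit:     # Lines 4-5: cluster still has room
        return False
    inter_cvi.setdefault(video_id, []).append(peer_ip)   # Line 6: join InterCVI
    cluster_members.discard(peer_ip)                     # Line 7: leave old tracker
    for ips in support_dht.values():                     # Line 8: leave support DHT
        ips.discard(peer_ip)
    return True


members = {"a", "b", "c"}
dht = {"s0": {"a", "c"}, "s1": {"c"}}
cvi = {"v": ["t0"]}
converted = maybe_convert_to_tracker("c", "v", {"s0", "s1"}, 2,
                                     members, dht, cvi, intra_limit=3)
```

Keeping the seeder inside its cluster until the limit is reached preserves a complete source for the other members, which is the trade-off the limit check is meant to protect.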

As the InterCVI and the Support DHT have their own consolidation cycles, dealing with the data inconsistencies that occur between these cycles is an important point to be considered in the system. To maintain consistency, our architecture first minimizes the information shared by both DHTs, reducing it to just video and segment identifiers, which are a small subset of all the data stored. Second, the scalable trackers keep this information in memory, avoiding the use of the Support DHT as much as possible; it is only used when a tracker leaves the network and another one comes to replace it, as mentioned in Sect. 6.1.

 1: PEER TRACKER CONVERSION()
 2:   downloaded ← verifyDownloadAllSegments(video)
 3:   if downloaded == true
 4:     current_qty ← IntraCSI.obtainCurrentPeers
 5:     if current_qty ≥ IntraCSI.INTRA_LIMIT
 6:       InterCVI.addTracker(peer, video)
 7:       tracker.removePeer(peer)
 8:       SupportDHT.removePeer(peer)

Fig. 5 Peer-tracker conversion algorithm


(a)
 1: PEER AVAIL VIDEO(Video v)
 2:   tracks ← InterCVI.trackers(v.VIDEO_ID)
 3:   if tracks is null
 4:     InterCVI.addTracker(peer, v)
 5:   else
 6:     chosen ← tracks.obtainTracker
 7:     qty ← chosen.obtainCurrentPeers
 8:     if qty < IntraCSI.INTRA_LIMIT
 9:       chosen.addPeer(peer)
10:     else
11:       InterCVI.addTracker(peer, v)

(b)
 1: PEER SEGMENT(Segment s)
 2:   VID ← s.getVideoIdentifier
 3:   tracks ← InterCVI.trackers(VID)
 4:   chosen ← tracks.getAlmostLimit
 5:   chosen.SEARCH(s, peer)
 6:   chosen.addPeer(peer)

Fig. 6 P2P layer construction algorithms. a Adding a video, b downloading a segment

6.3 P2P layer construction

The layer can be constructed in two ways: with peers that provide a video (with all its segments) to be downloaded by others, or with peers that need to download one or more segments of a video.

The algorithm in Fig. 6a shows the first case, adding a peer that will provide a video in the P2P layer. The first step is to search the InterCVI for trackers of this video, using the video identifier as key (Line 2). If there are no trackers, the peer joins the InterCVI, becoming a new tracker for the video (Line 4). If there are trackers, the peer chooses one and verifies whether its number of members has reached INTRA_LIMIT (Lines 6–7). If the number is less than the limit, the peer is inserted as a new member of the IntraCSI (Line 9); otherwise, the peer is added to the InterCVI, becoming a tracker for the video (Line 11).
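The first construction case can be sketched in Python as follows. The nested dictionary used for the InterCVI is a simplification, and choosing the least loaded tracker is an assumption: the paper only states that the peer chooses one.

```python
from typing import Dict, Set

# Simplified InterCVI: video id -> {tracker IP -> set of member IPs}
InterCVI = Dict[str, Dict[str, Set[str]]]


def add_video_provider(peer_ip: str, video_id: str,
                       inter_cvi: InterCVI, intra_limit: int = 500) -> str:
    """Place a peer that owns a whole video; returns its resulting role."""
    trackers = inter_cvi.setdefault(video_id, {})         # Line 2
    if not trackers:                                       # Lines 3-4
        trackers[peer_ip] = set()
        return "tracker"
    # Line 6: the selection policy is not specified in the paper;
    # picking the least loaded tracker is an assumption of this sketch.
    chosen = min(trackers, key=lambda ip: (len(trackers[ip]), ip))
    if len(trackers[chosen]) < intra_limit:                # Lines 7-9
        trackers[chosen].add(peer_ip)
        return "member"
    trackers[peer_ip] = set()                              # Lines 10-11
    return "tracker"


cvi: InterCVI = {}
roles = [add_video_provider(ip, "video-1", cvi, intra_limit=1)
         for ip in ("a", "b", "c")]
```

With a limit of 1, the first peer becomes a tracker, the second joins its cluster, and the third finds the cluster full and becomes a second tracker.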

The algorithm in Fig. 6b shows the second case, adding a peer that will download a segment. The first step is to search the InterCVI for trackers of this video, using the video identifier as key (Lines 2–3). The peer chooses the tracker with the greatest number of members that is still below INTRA_LIMIT (Line 4). Then, the peer obtains a list of peers from the tracker (Line 5), as shown in the algorithm in Fig. 4b, and joins the IntraCSI as a new member (Line 6).
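The tracker selection in Line 4 of Fig. 6b (getAlmostLimit) can be sketched in Python as below. The function name, the tie-breaking rule, and returning `None` when every tracker is full are assumptions, since the paper does not describe those cases.

```python
from typing import Dict, Optional


def choose_tracker(member_counts: Dict[str, int],
                   intra_limit: int = 500) -> Optional[str]:
    """Pick the fullest tracker that can still accept a new member.

    member_counts maps a tracker IP to the number of peers it monitors,
    which is the information the InterCVI stores next to each tracker.
    """
    open_trackers = {ip: n for ip, n in member_counts.items()
                     if n < intra_limit}
    if not open_trackers:
        return None  # assumption: no tracker below the limit is available
    # Prefer the highest member count; ties are broken by IP for determinism.
    return max(open_trackers, key=lambda ip: (open_trackers[ip], ip))


chosen = choose_tracker({"t1": 499, "t2": 500, "t3": 120}, intra_limit=500)
```

Filling the busiest non-full cluster first concentrates peers where the most segment sources are already known.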

7 Experimental results

In this section, we evaluate the efficiency and scalability of the proposed architecture through simulations. Efficiency was measured as the time, in seconds, that the layers take to retrieve a piece of information, with a target close to 1 s, as proposed by Jimenez et al. [18]. Scalability was analyzed in terms of how the retrieval time changed as the number of members in the layers increased. To carry out the simulations, we developed our strategies in Java, using Cassandra (cassandra.apache.org) as the distributed storage of the Cloud layer, and TomP2P (tomp2p.net) as the DHT implementation used by the P2P layer. For Cassandra and TomP2P, we used the configurations mentioned in [38] and [35], respectively. The main system configurations were as follows: a Query Node performed a request to one tracker at a time, with no parallel searches; INTRA_LIMIT was set to 500 (10 times the number of neighbours a peer manages in the BitTorrent protocol), without influencing scalable tracker performance; and, finally, each peer informs the tracker of its downloaded segments and of whether it is available to upload every 30 s.

Table 1 Configurations for the tests

Configuration   Engine   With keywords   With transcriptions
NoExp           –        Yes             No
J-Kw            Jena     Yes             No
O-Kw            Our      Yes             No
J-Both          Jena     Yes             Yes
O-Both          Our      Yes             Yes

We used PeerSim (peersim.sourceforge.net) as the simulation framework to evaluate the protocols developed for the Cloud and P2P layers. To simulate node connectivity in PeerSim, we used a network topology based on King [14]. This topology, widely used in scientific research, represents a realistic situation of Internet hosts with their bandwidth and latency constraints. In this topology, the average end-to-end latency among all the nodes is approximately 200 ms, with a peak of 300 ms. The packet loss probability was set to 50 %. The searches requested on the system, over TCP, follow a power-law distribution in which half of them are for the top 5 % of the keys, constituting a realistic workload for P2P applications [5]. All nodes were set with 2 Mbps inbound and 1 Mbps outbound access link bandwidth.

7.1 Cloud layer evaluation

The cloud layer was simulated on the PeerSim engine. In each simulation, we had a network with 100 nodes: 20 Index Nodes, 20 Repository Nodes, and 60 Query Nodes. The video collection used consists of a set of short clips, associated with manually assigned keywords and speech transcriptions. Tests were executed over five configurations, as shown in Table 1. The first represents the system with no query expansion, using just keywords. The second and third configurations represent the system with query expansion and keywords, using the Jena engine and our engine, respectively. Jena (jena.sourceforge.net) is a Java API for manipulating OWL ontologies that can make use of several different inference engines. The fourth and fifth configurations represent the system with query expansion, keywords, and transcriptions of the multimedia files, again using the Jena engine and our engine, respectively.

We measured the quality of the retrieved information using the F-measure formula [40], which combines precision and recall. We used the F-measure in the video-on-demand domain because it balances the precision and recall of the retrieved results. The advantage of this measure is that its value is high only when both recall and precision are high. Such a measure is better suited to settings in which a user poses queries in natural language in a multimedia retrieval context. In this experiment, we used the values described as optimal in [28] for the maximum number of video metadata entries and the minimum relevance acceptance level. These values were empirically observed to be optimal in a series of tests. We executed 100 queries over the five configurations with a fixed maximum of 7 video metadata entries to retrieve and a fixed minimum relevance acceptance value of 0.1. Figure 7a shows the average and standard deviation of the F-measure for each configuration. In agreement with previous results [28], the configurations using query expansion, keywords, and transcriptions performed better on average, whereas the configuration without expansion showed the lowest average.

[Figure 7 plots: (a) F-measure (axis from 0.5 to 0.8) for the configurations NoExp, J-Kw, O-Kw, J-Both, and O-Both; (b) time in s (axis from 0.2 to 0.8) versus number of terms (1,000 to 100,000) for O-Kw and O-Both; (c) time in s (axis from 0.02 to 0.07) versus number of terms (1,000 to 100,000) for Three and Quorum]

Fig. 7 Cloud layer evaluation. a F-measure for metadata retrieval, b metadata retrieval time, c metadata dissemination time
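For reference, the balanced F-measure (the harmonic mean of precision and recall) can be computed as in the Python sketch below. The paper does not state which weighting of precision and recall it uses, so the balanced form is an assumption, and the video identifiers are made up for the example.

```python
from typing import Iterable


def f_measure(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    if hits == 0:
        return 0.0  # no relevant item retrieved (also covers empty inputs)
    precision = hits / len(retrieved_set)   # fraction of results that are relevant
    recall = hits / len(relevant_set)       # fraction of relevant items found
    return 2 * precision * recall / (precision + recall)


# Four results returned, two of them among the three relevant videos.
score = f_measure(["v1", "v2", "v3", "v4"], ["v1", "v2", "v5"])
```

In this example precision is 1/2 and recall is 2/3, so the score is 4/7, roughly 0.57; because the harmonic mean is dominated by the smaller value, neither metric alone can raise it much.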

We measured the time required for a Query Node to perform the query expansion and retrieval activities described in Sect. 5 (returning a list of video metadata), verifying whether the time was influenced by the index size. Tests were executed over 100 queries using the third and fifth configurations mentioned previously. In Fig. 7b, we can see that the average response time grows more slowly than linearly with the index size, indicating that the cloud layer remains scalable even as the index grows. For example, when the index size increases fivefold (from 20,000 to 100,000 terms), the average response time only doubles (from 0.3 to 0.6 s).
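As a rough worked example, the two data points quoted above can be turned into an empirical scaling exponent. The Python sketch below assumes a power-law relation between index size and response time, which is an illustrative simplification; two points cannot confirm the actual shape of the curve in Fig. 7b.

```python
import math


def scaling_exponent(size_a: float, time_a: float,
                     size_b: float, time_b: float) -> float:
    """Exponent alpha such that time grows as size**alpha between two points."""
    return math.log(time_b / time_a) / math.log(size_b / size_a)


# Index grows from 20,000 to 100,000 terms; time grows from 0.3 s to 0.6 s.
alpha = scaling_exponent(20_000, 0.3, 100_000, 0.6)
```

The exponent equals log 2 / log 5, about 0.43, well below the value of 1 that linear growth would give.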

We measured the time required for an Index Node to insert (or remove) the metadata for a single video, and for the Repository Nodes to disseminate this information to three and to half of the cloud nodes (called the Three and Quorum configurations in Cassandra), verifying whether the time was influenced by the index size. Figure 7c shows that the average insertion and dissemination time is almost constant, even when the index size increases. This behavior is due to the distributed storage used by the nodes, which disseminates the information asynchronously and in parallel.

7.2 P2P layer evaluation

In the P2P simulations, we used the PeerSim engine to compare our solution with the VMesh and Time-Driven Mesh (TDM) systems, presented in the related work section. In each simulation, peers joined and left the network at arbitrary times, independently of each other, following a Poisson distribution. As is usual on the Internet, and as VMesh does, we modeled peers as leaving ungracefully, i.e., without informing the DHT or their neighbors of their departure. Finally, according to the statistics shown in [50], in each simulation each peer issued seven random segment queries.

We measured the inconsistencies in the segment retrieval process. As mentioned in Sect. 2.2, the continuous element updates in a DHT (a peer that downloads or deletes a segment) can result in search inconsistencies during the stabilization process. To understand this behavior, we divided the nodes into two groups: one that inserts keys and another that searches for those keys. The first group inserts the video and segment identifiers as keys, with its IP address as value. The other group searches for those keys and verifies whether the peer (referenced in the value) still owns the segment, counting an inconsistency if it does not. In our P2P layer, the requested key was always the same (the video identifier); for VMesh and TDM, the requested key was the identifier of one of the segments into which the video was divided. Figure 8a shows the number of inconsistencies that VMesh, TDM, and our proposal obtained from the DHT. We can observe that, as the number of network nodes increases, our strategy (i.e., inserting only nodes that have all the segments of a video) returns fewer inconsistencies than VMesh and TDM, improving the quality of the search.

We measured the time consumed by the segment retrieval process. As mentioned in Sect. 3.2, the search time for a segment is a critical point in VoD services. In VMesh, when the search returns a value, we immediately have a list of peers that hold the requested segment. The search in TDM is similar to VMesh, but the list contains peers that are watching a given time interval. In our proposal, however, we first obtain the reference of a tracker, and then we ask it for the peer list. In Fig. 8b, we observe that, in a network of 5,000 peers, segment retrieval in our structure takes 816 ms (143 ms slower than in VMesh); the difference is due to the extra message that must be sent to the tracker. Interestingly, the total time remains close to 1 s, in agreement with the tests performed in [18]. Finally, it is important to note that, when performing a new search for a segment, VMesh and TDM require another 673 ms, whereas our approach requires only 143 ms because the tracker is already known by the requesting peer.
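The timings above imply a simple cumulative cost model, sketched in Python below. It assumes that all searches target segments of the same video, that the tracker stays online between searches, and that the 673 ms and 143 ms figures hold for every request; these are simplifications for illustration, not results from the paper.

```python
DHT_LOOKUP_MS = 673      # one DHT lookup, as reported for VMesh
TRACKER_QUERY_MS = 143   # one extra message to an already known tracker


def tracker_based_total_ms(searches: int) -> int:
    """One DHT lookup to find the tracker, then one tracker query per search."""
    if searches == 0:
        return 0
    return DHT_LOOKUP_MS + searches * TRACKER_QUERY_MS


def per_segment_dht_total_ms(searches: int) -> int:
    """A fresh DHT lookup for every segment search (VMesh-style)."""
    return searches * DHT_LOOKUP_MS


first_search = tracker_based_total_ms(1)        # 816 ms, as in Fig. 8b
seven_tracker = tracker_based_total_ms(7)       # seven queries per peer
seven_dht = per_segment_dht_total_ms(7)
```

Under these assumptions the tracker-based approach is slower for a single search (816 ms versus 673 ms) but already faster from the second search onward (959 ms versus 1,346 ms), and for the seven queries per peer used in the simulations it totals 1,674 ms against 4,711 ms.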

We measured the load applied on the streaming servers, which is directly related to system scalability. The purpose of this simulation was to determine how overloaded our streaming servers would be compared to fully distributed systems such as VMesh and TDM. We measured the load as the number of media streams the server had to provide when a segment was not found in the P2P network, using the same peer population as VMesh and TDM. In our proposal, all searches for a segment are distributed to the trackers stored in the DHT under the “video identifier” key. In VMesh and TDM, however, searches for a segment are distributed over the peers stored under the “segment identifier” and “interval identifier” keys, respectively. Figure 8c shows that the load on the streaming servers is slightly lower in VMesh and TDM, but our system has a similar curve, which indicates that the proposed system is also scalable.

[Figure 8 plots, each versus number of peers (1,000 to 50,000) for Cloud+P2P, VMesh, and TDM: (a) inconsistencies (axis from 0 to 70); (b) time in s (axis from 0.6 to 1.6); (c) server stress in number of streams (axis from 0 to 10,000)]

Fig. 8 P2P layer evaluation. a Inconsistencies on retrieval, b time for segment retrieval, c stress on streaming servers

8 Conclusions

In this paper, we have proposed a novel combination of Cloud Computing and a P2P network for multimedia information retrieval in Video-on-Demand services. Our architecture uses a Cloud Computing layer to retrieve video metadata and a P2P layer to discover the peers that own the video segments. The cloud layer uses ontologies to improve the relevance of the retrieved information, distributing the indexing and searching structures among its nodes. The P2P layer uses a scalable implementation of trackers, which monitor a set of peers sharing segments of a video, and two DHTs: one to find the trackers, and the other to store the states of the monitored peers in case a tracker leaves the network and another has to replace it.


According to the experimental results, we believe the proposed architecture could be used in systems in which the most important factors are the number of inconsistencies and the response time to search for a random segment. We observed that our architecture reduces the number of inconsistencies compared with known alternatives and also reduces the time to search for a segment when the tracker is already known (i.e., a search was already performed). However, our solution imposes an additional overhead on the streaming servers, causing scalability problems if the number of trackers for a video is not well managed. Currently, we are studying alternatives to automatically manage the creation of trackers, focusing on system scalability. In addition, we are working on the P2P layer structure to retrieve the most frequently downloaded segments, reducing the number of messages exchanged in the DHT stabilization process.

Acknowledgments This research is part of FAPESP project OnAIR grant number 2010/19111-9.

References

1. Amazon Cloudfront (2013). http://aws.amazon.com/cloudfront
2. Andreasen T, Nilsson J, Thomsen H (2000) Ontology-based querying. In: Proceedings of the international conference on flexible query-answering systems, pp 15–26
3. Aslam J, Frost M (2003) An information-theoretic measure for document similarity. In: Proceedings of the international ACM SIGIR conference, pp 449–450
4. Bhattacharya A et al (2010) Temporal-DHT and its application in P2P-VoD systems. In: Proceedings of the IEEE ISM, pp 81–88
5. Bianchi S et al (2006) Adaptive load balancing for DHT lookups. In: Proceedings of the international conference on computer communications and networks, pp 411–418
6. Br.ispell (2013). http://www.ime.usp.br/~ueda/br.ispell/
7. Buyya R et al (eds) (2008) Content delivery networks. Springer Gmbh, New York
8. Chang F et al (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(2):4:1–4:26. doi:10.1145/1365815.1365816
9. Choi H et al (2011) TDM: time-driven mesh overlay network for peer-to-peer video-on-demand services. In: Proceedings of CYBERC, pp 100–106
10. da Silva AB (2011) THOR: a P2P distribution video system based on D1HT technique (in portuguese). Master’s thesis, Universidade Federal do Rio de Janeiro
11. EDX (2013). http://www.edx.org
12. Guarino N, Masolo C, Vetere G (1999) Ontoseek: content-based access to the web. IEEE Intell Syst 14(3):70–80
13. Gummadi K et al (2003) The impact of DHT routing geometry on resilience and proximity. In: Proceeding of the ACM SIGCOMM conference, pp 381–394
14. Gummadi KP, Saroiu S, Gribble SD (2002) King: estimating latency between arbitrary internet end hosts. In: Proceedings of the ACM SIGCOMM workshop on internet measurment, pp 5–18
15. Hareesh KDM (2013) Quality of service in peer to peer video on demand system using V chaining mechanism. J Comput Inf Technol 2(1)
16. He Y, Guan L (2010) Peer-to-peer streaming systems. In: Intelligent multimedia communication, pp 195–215
17. He Y, Shen G, Xiong Y, Guan L (2009) Optimal prefetching scheme in p2p vod applications with guided seeks. IEEE Trans Multimedia 11(1):138–151
18. Jimenez R, Osmani F, Knutsson B (2011) Sub-second lookups on a large-scale Kademlia-based overlay. In: 11th IEEE conference on peer-to-peer computing
19. Jing Y, Croft WB (1994) An association thesaurus for information retrieval. In: RIAO 94 conference proceedings, pp 146–160
20. Leuf B (2002) Peer to peer. Addison-Wesley, Reading
21. Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of the fifteenth international conference on machine learning, pp 296–304
22. Liu F et al (2011) Novasky: cinematic-quality vod in a p2p storage cloud. In: INFOCOM. IEEE, pp 936–944
23. Liu J, Zhou M (2006) Tree-assisted gossiping for overlay video distribution. Multimedia Tools Appl 29(3):211–232. doi:10.1007/s11042-006-0013-7
24. Matkin GW (2013) Open educational resources in the post mooc era. eLearn 2013(4)
25. Mell P, Grance T (2011) The NIST definition of cloud computing. Tech. Rep. 800-145, National Institute of Standards and Technology (NIST). http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
26. Noh J, Deshpande S (2008) Pseudo-DHT: distributed search algorithm for P2P video streaming. In: Proceedings of the IEEE ISM, pp 348–355. doi:10.1109/ISM.2008.57
27. Payberah A et al (2012) Clive: cloud-assisted p2p live streaming. In: IEEE 12th international conference on P2P, pp 79–90. doi:10.1109/P2P.2012.6335820
28. Paz-Trillo C, Braga P, Wassermann R (2005) An information retrieval application using ontologies. J Br Comput Soc 11(2):17–31
29. Plank JS (1997) A tutorial on reed-solomon coding for fault-tolerance in raid-like systems. Softw Pract Exper 27(9):995–1012
30. Rowstron A, Druschel P (2001) Pastry: scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In: Proceedings of middleware, pp 329–350
31. Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620. doi:10.1145/361219.361220
32. Shen Z, Luo J, Zimmermann R, Vasilakos AV (2011) Peer-to-peer media streaming: insights and new developments. Proc IEEE 99(12):2089–2109
33. Stoica I et al (2001) Chord: a scalable peer-to-peer lookup service for internet applications. SIGCOMM Comput Commun Rev 31(4):149–160
34. Talaei S, Abhari A (2010) Adding multimedia streaming to BitTorrent. In: Proceedings of the 2010 spring simulation multiconference, pp 235:1–235:6
35. TomP2P Configurations (2013). http://tomp2p.net/doc/advanced/
36. Trajkovska I, Salvachua Rodriguez J, Mozo Velasco A (2010) A novel P2P and cloud computing hybrid architecture for multimedia streaming with QoS cost functions. In: Proceedings of the international conference on multimedia, pp 1227–1230
37. Traversat B, Abdelaziz M, Pouyoul E (2003) Project JXTA: a loosely-consistent DHT rendezvous walker. http://www.jxta.org/docs/jxta-dht.pdf
38. Tuning Cassandra (2013). http://www.datastax.com/docs/1.0/operations/tuning
39. Udacity (2013). http://www.udacity.com
40. van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworths
41. Voorhees EM (1994) Query expansion using lexical-semantic relations. In: Proceedings of the 17th annual international ACM SIGIR conference, pp 61–69
42. Vratonjic N et al (2007) Enabling DVD-like features in P2P video-on-demand systems. In: Proceedings of the workshop on peer-to-peer streaming and IP-TV. ACM, pp 329–334
43. Wolchok S, Halderman JA (2010) Crawling BitTorrent DHTs for fun and profit. In: Proceedings of the 4th USENIX conference on offensive technologies, pp 1–8
44. Wowza Media Server (2013). http://www.wowza.com/media-server
45. Wu Y et al (2011) Cloudmedia: when cloud on demand meets video on demand. In: Proceedings of the international conference on distributed computing systems, pp 268–277
46. Xu T et al (2010) APEX: a personalization framework to improve quality of experience for DVD-like functions in P2P VoD applications. In: IWQoS, pp 1–9
47. Yiu WP, Jin X, Chan SH (2007) VMesh: distributed segment storage for peer-to-peer interactive video streaming. IEEE J Sel A Commun 25(9):1717–1731
48. Youtube (2013). http://www.youtube.com
49. Youtube Statistics (2013). http://www.youtube.com/yt/press/statistics.html
50. Zheng C, Shen G, Li S (2005) Distributed prefetching scheme for random seek support in p2p streaming applications. In: Proceedings of ACM P2PMMS workshop, pp 29–38
