
Website file download acceleration using WebRTC

João Luís Gonçalves Domingos

Thesis to obtain the Master of Science Degree in

Information Systems and Computer Engineering

Supervisor: Prof. Ricardo Jorge Feliciano Lopes Pereira

Examination Committee

Chairperson: Prof. José Manuel da Costa Alves Marques

Supervisor: Prof. Ricardo Jorge Feliciano Lopes Pereira

Member of the Committee: Prof. Rodrigo Seromenho Miragaia Rodrigues

October 2015


Acknowledgments

Many thanks to my supervisor, Prof. Ricardo Pereira, for the great guidance, support and dedication.

To Brígida, Sofia, my parents, brother and family, a big thanks for the motivation and support.

Thanks to my friends Nuno Nobre, Pedro Pexirra, Ricardo Lourenço, Tiago Patrocínio and João Santos, colleagues at Instituto Superior Técnico.

Thanks to Bruno Graça, Helena Arco, Ana Viegas and Rui Bandarra, for the friendship.


Abstract

The Internet, at its birth, was always thought of as a platform for communication, sharing and collaboration. Everything has evolved around those concepts.

As the size and quantity of data continue to increase, owners of websites with popular content have to respond to the high demand of their users. For this they expand their infrastructure, reserve cloud computing resources and resort to Content delivery network (CDN) services. This is costly and sometimes too much during low-traffic periods, yet not enough at times of high demand and popularity.

In response to this challenge, this document presents a distributed, collaborative content distribution solution using Web Real Time Communication (WebRTC), allowing the creation of scalable websites capable of hosting popular content without the need to invest in extensive hosting resources.

This allows for self-scalability of the system and better file transfer performance through the selection of the best and closest peers to share the content. Infrastructural costs for the owner of the website are lower as fewer resources are required, since the clients share the work, easing the load on the server.

A prototype was built that recruits the users of a website and makes them share the content they have already gathered from the server, distributing it to the other simultaneous visitors through Peer-to-peer (P2P) connections. Using geographically distributed browser simulators, test cases were executed to assess the capacity of the prototype and the potential of the solution, with positive results where some clients experienced a reduction of almost 80% in download times.

Keywords: web, WebRTC, P2P, CDN, performance, resources


Resumo

The Internet has, from the beginning, always been thought of as a platform for communicating, sharing and collaborating. Everything evolves around these concepts.

As the size and quantity of content keep growing, the owners of websites with popular resources have to be able to respond to their high demand. To do so, they have to expand their infrastructures, rent cloud resources and resort to services using Content delivery networks (CDNs). This is costly and may be too much during less demanding periods, yet not enough during periods of higher traffic.

To address these problems, this document presents a collaborative, distributed solution using Web Real Time Communication (WebRTC) that allows the creation of scalable websites, capable of serving popular content without the need to invest in large hosting resources.

This enables a self-scalable system with better file transfer results through the selection of peers that are better and closer to share the content. This reduces the infrastructure costs of the website owner, and the clients share the work, relieving the load on the web server.

A prototype was created that recruits the users of a website to share content they have already obtained from the server and distribute it to the simultaneous visitors through Peer-to-peer (P2P) connections. Using geographically distributed browser simulators, test cases were executed to evaluate the capacity and the potential of the solution, providing positive indicators where some clients experienced reductions in download time of around 80%.

Keywords: Web, WebRTC, P2P, CDN, performance, resources


Contents

Acknowledgments
Abstract
Resumo
List of Figures
List of Tables
List of Acronyms

1 Introduction
  1.1 Background
  1.2 Proposed Solution
  1.3 Thesis Contribution
  1.4 Outline

2 Related Work
  2.1 Web Applications requirements
  2.2 Client-server
    2.2.1 Client-server Architecture
    2.2.2 Evaluation
  2.3 P2P
    2.3.1 P2P Architecture
    2.3.2 Example
    2.3.3 Evaluation
  2.4 CDN
    2.4.1 Architecture of a CDN
    2.4.2 Hybrid P2P CDNs
    2.4.3 Examples
    2.4.4 Evaluation
  2.5 WebRTC
    2.5.1 What is WebRTC
    2.5.2 How WebRTC works
    2.5.3 Evaluation
  2.6 Technology comparison
  2.7 Existing contributions
    2.7.1 Maygh
    2.7.2 PeerCDN
    2.7.3 Swarmify
    2.7.4 Viblast
    2.7.5 PeerJS
    2.7.6 Peer5 / Sharefest
  2.8 Comparative study

3 Architecture
  3.1 Non-functional requirements
  3.2 Functional requirements
  3.3 Architecture design
  3.4 Components
    3.4.1 Server side components
    3.4.2 Client side components

4 Implementation
  4.1 Introduction
  4.2 Implementation Options
    4.2.1 Environment
  4.3 Architecture
    4.3.1 Server side components
    4.3.2 Client side components

5 Evaluation
  5.1 Tests Objectives
  5.2 Deployment and hardware
  5.3 Tests Scenarios
    5.3.1 Page types
    5.3.2 Scenario 1 - Close peers
    5.3.3 Scenario 2 - Distant peers
    5.3.4 Scenario 3 - More than two peers
  5.4 Test Results
    5.4.1 Scenario 1 - Close peers
    5.4.2 Scenario 2 - Distant peers
    5.4.3 Scenario 3 - More than two peers

6 Conclusions
  6.1 Summary
  6.2 Achievements
  6.3 Future Work
    6.3.1 Issues
    6.3.2 Evolution

Bibliography


List of Figures

1.1 Proposed solution
2.1 Client-Server architecture
2.2 Multiple server architecture
2.3 Hybrid P2P system
2.4 Structured P2P system
2.5 CDN architecture
2.6 WebRTC architecture
2.7 WebRTC DataChannel protocol stack
3.1 Proposed solution architecture
3.2 Solution components interaction
3.3 P2P CDN participants interaction
3.4 Server side architecture
3.5 Client side architecture
3.6 WebRTC connection establishment
5.1 Scenario 1 - Test 1 - Peer 1 - Download time comparison between WebRTC inactive and active
5.2 Scenario 1 - Test 1 - Peer 2 - Download times between server GET and peer GET
5.3 Scenario 1 - Test 3 - Peer 1 - Download time comparison between WebRTC inactive and active
5.4 Scenario 1 - Test 3 - Peer 2 - Download times comparison between server GET and peer GET


List of Tables

2.1 Comparison between solutions
2.2 Comparison between WebRTC solutions
3.1 Coordinator - Example table that maps files to peers that have them
4.1 Added REST functionalities on the Coordinator
5.1 Scenario 1, Page 1 results
5.2 Scenario 1, Page 2 results
5.3 Scenario 1, Page 3 results
5.4 Scenario 2, Page 1 results
5.5 Scenario 2, Page 2 results
5.6 Scenario 3, Page 1 results part 1
5.7 Scenario 3, Page 1 results part 2


List of Acronyms

API application programming interface

CAN Content Addressable Network

CDN Content delivery network

DB Database

DHT Distributed Hash Table

DNS Domain Name System

DOM Document Object Model

DTLS Datagram Transport Layer Security

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

ICE Interactive Connectivity Establishment

IEEE Institute of Electrical and Electronics Engineers

IETF Internet Engineering Task Force

IP Internet Protocol

ISP Internet Service Provider

kb kilobit

kB kilobyte

Mbps megabits per second

mb megabit

MB megabyte

NAT Network Address Translation

OS Operating System

P2P Peer-to-peer

QoE Quality of Experience

QoS Quality of Service


REST Representational state transfer

RTT round-trip time

SCTP Stream Control Transmission Protocol

SDP Session Description Protocol

STUN Session Traversal Utilities for NAT

TURN Traversal Using Relays around NAT

TCP Transmission Control Protocol

UDP User Datagram Protocol

URL Uniform Resource Locator

VoD Video on Demand

W3C World Wide Web Consortium

WebRTC Web Real Time Communication

WWW World Wide Web


Chapter 1

Introduction

1.1 Background

The Internet, at its birth, was always thought of as a platform for communication, sharing and collaboration. Everything has evolved around those concepts. The creation of documents that could be accessed through the network, the exchange of content through email messages and file sharing were developments made in order to make information available to the general public or to specific individuals who are simply not in the same physical space.

When Tim Berners-Lee presented [1] the proposal for creating a "small" Web at CERN, it was the beginning of something that has changed human life forever. The World Wide Web (WWW) has evolved so much since the start, and is still evolving, that nowadays it is not possible to imagine a world without the Web. Our planet is now dependent on the exchange of enormous amounts of content, which happens every day, every hour, every second. Some of this content is exchanged automatically and some of it by users who intentionally want to share and transfer it. The WWW is humanity's distributed library, being the first place many people go to get knowledge and answers, as its servers hold vast amounts of stored information.

The size and quantity of these data will keep increasing [2], keeping up with the growth of human knowledge, improved bandwidth and network capabilities. New content is now created at an enormous rate by millions of users placing videos and photos on social media, for example. The evolution of the available hardware also allowed for a dramatic increase in the quality of the data each individual can produce, such as high definition videos and pictures. Different ways of consuming content also contribute to the increase in the exchange of information, including the streaming of movies and TV shows on websites like NetFlix (https://www.netflix.com) or Youtube (http://www.youtube.com).

There are several methods to serve content online, but the simplest and most used one is through a web server. Logically structured text documents are made available that include other documents, like images, videos and scripts, to enrich the text they provide. This is the preferred way to present formatted, user-oriented data, which usually serves the purpose of sharing or announcing something to somebody. In this type of content presentation, the owner is responsible for creating and, sometimes, serving the content and maintaining the infrastructure behind it.

Since the creator of the content is also the only party that distributes it, it obviously has to adapt its response capacity to the popularity of what it is serving. The scale of the infrastructure required for serving the content of a website depends on the demand that content generates. Some providers, like Amazon, adapt the infrastructure automatically according to the number of requests, but in many other cases the owners are responsible for handling the popularity of their information. This demands heavy or elastic infrastructure to respond to user requests, even more so when serving large content. The infrastructure also needs some capacity margin to "survive" unpredictable events like network anomalies [3] or an eventual slashdot effect (http://en.wikipedia.org/wiki/Slashdot_effect). Video on Demand (VoD) websites, for instance, attract many users a day who intend to watch movies, which are large-sized content. To respond to this demand, the infrastructure of the website has to be massive, so it can fulfill every request at any given time from anywhere on the globe.

Another way of sharing data through the Internet is using Peer-to-peer (P2P) networks, which work differently from a website. While typical web content is served in a client-server architecture, P2P, as the name hints, is a distributed network that involves direct connections between peers in order to exchange information. The more peers there are downloading the content, the more upload capacity is available, as each peer acts as both consumer and provider of the content. Even though this is an efficient way to collaborate and exchange information, it is only about exchanging files and not about serving structured, logical content like a website does. It is in fact very useful for sharing single files or collections of files that are usable on their own.

Delivering data to thousands of globally distributed users, millions in the case of popular websites, is a tricky task and surely no job for a single web server. To solve that problem, a lot of websites adopted the use of Content delivery networks (CDNs). A CDN is an efficient and less costly way of distributing content to users. CDNs have replica servers that cache the content served from the origin and serve it themselves [4]. These replicas are scattered over the Internet and respond to requests that originate in locations closest to each of them. One of the goals is to achieve a smaller round-trip time (RTT), which reduces the latency when, for instance, loading a web page. CDNs also help ease the load on the origin server by making frequently requested static files available on those endpoints. The files are replicated across a collection of geographically dispersed servers [5] that answer nearby user requests using load balancing mechanisms. This avoids resource and bandwidth usage on the original server, which undoubtedly leads to lower costs. This approach is preferred for popular websites. Instead of having a big infrastructure that needs to be solid enough to withstand a flash crowd, websites can use the shared servers of CDNs, which provide advantages from a performance and reliability standpoint at a fraction of the cost of deploying their own infrastructure. There has also been some testing of hybrid solutions [6, 7], using client-server and P2P architectures in order to provide the best service to the end-user. But CDNs do not fully solve the problem, since costs still increase with user traffic and their scalability is limited by their capacity.


This is where a relatively new technology comes into play.

Web Real Time Communication (WebRTC) [8] is being adopted and supported by all the major web browsers (http://www.webrtc.org/) that support HTML5, and allows for real-time device-to-device communication, mobile or not, without the need for plug-ins, just by using the browser itself. What if a user, when accessing a website, securely shared the contents retrieved with a nearby user who also needs those same images, videos and files? This would be a self-scalable solution, because the more users retrieve the content, the more servers a website has. Bandwidth costs would decrease because the web server would only have to deliver the content when no other user is connected.

The challenges of this approach are to make it accessible to everybody without any effort from the user, to provide a fall-back mechanism for older browsers, and to guarantee security and consistency for the users, as they have to download the exact same file the website is providing.

Another challenge is keeping an acceptable and constant response time. Servers usually have more bandwidth than website users, and serving part of the content through a user might result in a poor experience, since the capacity of that personal computer will not be as high as that of a server.

There are also challenges on the development side, as the solution must be simple enough for a developer to implement on a new or existing website and take advantage of without significant effort.

Overcoming these challenges and evaluating the actual impact of these changes are the objectives

of this work.

1.2 Proposed Solution

The purpose of this project is to merge some advantages of P2P networks with the WWW, transforming users that were previously only consumers into content providers, helping to spread the contents of the website they have already accessed, while they are accessing it.

A proposed solution (Figure 1.1) is to create a coordinator of peers that is placed side by side with, or integrated into, the traditional Hypertext Transfer Protocol (HTTP) server. This coordinator knows which clients are connected to the website and who has which file.

Figure 1.1: Proposed solution

The clients of that website download an additional module file that allows them to connect to the coordinator of peers and eventually to the peers that are visiting that same page. When a peer gets a file of that page, e.g. an image, it announces to the coordinator that it is now a server of that same file, so other peers can find it and request that same file. Then, based on some logical decisions, the peers decide whether they will download the files included in the HyperText Markup Language (HTML) document from the server or get them from another visitor of that page.
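To make this flow concrete, the following minimal sketch (in TypeScript; the Coordinator interface and the fetchFromPeer, findPeersFor and announceFile names are hypothetical illustrations, not the API of the prototype described later) shows how such a client module might choose between a server download and a peer download:

    // Hypothetical client-side module: illustrates the coordinator-assisted
    // decision between a server download and a peer download.
    interface Coordinator {
      // Returns the ids of peers that announced they hold this file.
      findPeersFor(url: string): Promise<string[]>;
      // Announces that this client can now serve the file to others.
      announceFile(url: string): Promise<void>;
    }

    async function fetchResource(
      url: string,
      coordinator: Coordinator,
      fetchFromPeer: (peerId: string, url: string) => Promise<Blob>
    ): Promise<Blob> {
      const peers = await coordinator.findPeersFor(url);
      let data: Blob;
      if (peers.length > 0) {
        try {
          // Prefer a peer that already visited the page and holds the file.
          data = await fetchFromPeer(peers[0], url);
        } catch {
          // Fall back to the origin server if the peer transfer fails.
          data = await (await fetch(url)).blob();
        }
      } else {
        data = await (await fetch(url)).blob();
      }
      // Either way, this client becomes a provider of the file.
      await coordinator.announceFile(url);
      return data;
    }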

1.3 Thesis Contribution

The contributions of this work are:



• Proposal of an architecture and a solution for a P2P CDN that allows peers to serve static contents of a website to other peers while browsing its web pages. This is intended to fit traditional web servers and work under the hood to help reduce server costs and improve download times.

• Implementation of a prototype to demonstrate the validity of the proposed solution.

• Experimental evaluation of the prototype implemented.

1.4 Outline

The remainder of this document describes the research and work developed and is organized as follows:

• Chapter 2 describes the previous work in the field.

• Chapter 3 describes the system requirements and the architecture of the P2P CDN.

• Chapter 4 describes the implementation of the P2P CDN and the technologies chosen.

• Chapter 5 describes the evaluation tests performed and the corresponding results.

• Chapter 6 summarizes the work developed and future work plans.


Chapter 2

Related Work

This chapter presents some of the solutions that are currently used to share content on the Internet, more specifically through websites and through the exchange of popular files accessed by thousands of users. Some of the aspects analyzed include how this content is shared, which technologies are used and which solutions are the most used and adequate for specific purposes, as well as how to overcome the limitations of the systems and architectures that serve this content, and at what price. After that, an overview of WebRTC is provided: where it comes from, how it works and what its relevance to this work is.

2.1 Web Applications requirements

There are several criteria for assessing the quality of a web application or website. According to [9] and [10], filtering the ones that are relevant for this work, these are the quality requirements for a good web application:

Reliability: Applications that have critical functionalities simply cannot fail. The reliability of a web application is its capacity to deliver data to consumers correctly and in a timely fashion.

Scalability: This is the capacity of a system to respond and grow when needed to handle a greater number of requests or more work. The ability to cope with that growth, whether sustained or spiked, is crucial in modern systems. When a website cannot scale, it can affect reliability and the user experience and expose security issues.

Availability: A web page must be constantly available, to everybody, everywhere. When a business or service depends on it and that system is not available, losses may be incurred and clients may be lost. The Internet is always open and so must a web application be, so downtime and delays should be avoided.

Maintainability: Another important aspect of systems is maintainability. Websites nowadays are constantly changing and morphing, and need regular updates to their content. A single-server web application is easier to maintain than one with multiple servers. If more servers need to be changed or updated, more time is consumed in these tasks and thus more costs are incurred.

Quality of Service (QoS): This is a measure of the delays, errors and speed of transmission of data in web systems. It is also related to reliability and availability. If a user accesses a website and it takes a long time to load, or tries to download a file and gets a slow transfer rate, this is considered low QoS. A cause of delays or slow speeds may be that the usage of the system is too high and that it did not scale to the requirements.

2.2 Client-server

2.2.1 Client-server Architecture

Figure 2.1: Client-Server architecture

The traditional, classic website is the flagship of the client-server architecture [11]. For modern web applications, this architecture is actually only used in small systems. As shown in Figure 2.1, this architecture is based on a server that knows all the users and is accessed by them in order to get content. This server provides some kind of logical content and shares its resources with the clients. In turn, clients only act in their own interest, as they do not share any resources with the server or with the other users. Supporting services like email or websites, this type of model is how the WWW was built, and many Internet services still use it.

A centralized approach like this is usually a cheap and accessible alternative for companies or individuals that want to provide content to a limited number of users in a timely fashion, since it is easily maintained as everything is in a single place. For larger crowds, this can be a serious bottleneck. Since the processing power and bandwidth consumption are all in the host (except for the client-side scripts), every single file has to be provided by the server and all the logic processing also has to be done by the central node. If the server fails, the service becomes unavailable.

There are several techniques and solutions to mitigate the problems described in the previous paragraph.

Figure 2.2: Multiple server architecture

The establishment of multiple servers (Figure 2.2) is one of those alternatives. Having more servers providing the same services helps ease the load on each machine and split the processing tasks between all the instances. This brings performance and availability benefits over the original single-server model, as well as a backup against failures that can affect one or more servers, thus avoiding downtime. The use of multiple instances may solve these issues, at some cost, but it also comes with issues of its own. The replica servers need to be synchronized and coordinated so that all of them provide the same version of the data. A website full of static files is fairly straightforward to replicate. The problem with this approach is that it linearly increases the infrastructure cost. For instance, if the server base is increased to support the maximum number of users, it will probably be excessive for the rest of the day.

Another way to address the scalability issue of the client-server model is the implementation of a cache system, where the most recently accessed objects are obtainable from a cache. There are different types of caches. A local cache is the responsibility of the client's browser and usually has a small limit. This type of cache is machine-specific, meaning that it only serves that browser, that client. It is valid for the whole browsing history of that browser, so if the cache size is, for example, 50 MB, that is the space it has to cache all the resources for all the websites visited.

Another type of cache is the forward proxy cache, which allows the caching of resources for a wider audience, since it is not machine restricted, and offers a much larger cache. When a request from the client comes in, the forward proxy checks whether it already has the resource from previous requests. If it has, it immediately replies to the client. If it does not have the resource, it acts on behalf of the client and requests it from the origin server, gets the response, replies to the client and caches the response for use in future requests.
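To make this behaviour concrete, the following small sketch (TypeScript; the in-memory cache and the origin fetch are illustrative stand-ins, not any particular proxy implementation) shows the forward-proxy decision just described:

    // Illustrative forward-proxy lookup: serve from the cache when possible,
    // otherwise fetch from the origin on behalf of the client and cache it.
    const proxyCache = new Map<string, ArrayBuffer>();

    async function forwardProxyFetch(url: string): Promise<ArrayBuffer> {
      const cached = proxyCache.get(url);
      if (cached !== undefined) {
        // Cache hit: reply immediately without contacting the origin server.
        return cached;
      }
      // Cache miss: act on behalf of the client and request it from the origin.
      const response = await fetch(url);
      const body = await response.arrayBuffer();
      if (response.ok) {
        // Store the response body for future requests from any client.
        proxyCache.set(url, body);
      }
      return body;
    }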

The other common cache type is the reverse proxy cache. In this type of cache the proxy acts on behalf of the origin server, pre-caching files that are most likely to be requested, as well as dynamic content.

These mechanisms can help decrease the load on the main servers. They require some solid coordination with the original web servers, but they are a widely used solution to improve the performance of websites.


2.2.2 Evaluation

The client-server architecture is the architecture used in the WWW. It is a simple, cheap and convenient solution for small projects, companies and individuals that have a limited number of users, like a local pet clinic or a local restaurant.

The problems arise when it has to scale to serve a larger audience [12]. It becomes more expensive and harder to maintain, to make reliable and to keep constantly available even under a large affluence of users, as is required for almost every business.

The WWW currently works over HTTP/1.1. This is a stateless request-response protocol that responds to several methods, of which the most common are GET, POST and HEAD [13]. A client sends a request to the server using a method, typically a GET or a POST, containing a header and a body; the server responds with a success or error status in the header and with the response body, which contains, in case of success, the content requested. Because the HTTP protocol is stateless, the transaction between the client and the server ends when the response is sent to the requester. If some state about what the client is doing on the website needs to be maintained, it has to be implemented by the application.

HTTP is implemented on top of the Transmission Control Protocol (TCP), a transport layer that provides a reliable process-to-process communication service [14], and many of its requests are idempotent. It supports caching mechanisms using fields in the request and response headers, whose goal is to avoid sending full responses and, in many cases, the need to even make a request.

HTTP caching helps performance when loading web pages and resources, but the protocol's simplicity, although excellent for serving simple things, requires applications to implement extra concepts such as sessions.
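As an illustration of this header-based caching (a minimal sketch of a conditional GET; the resource URL and cache variables are purely illustrative), a client can revalidate a previously cached copy with the ETag it received and reuse it when the server answers 304 Not Modified:

    // Conditional GET sketch: revalidate a cached copy using its ETag.
    // "/images/logo.png" and the cache variables are illustrative only.
    let cachedBody: ArrayBuffer | null = null;
    let cachedEtag: string | null = null;

    async function getWithValidation(url: string): Promise<ArrayBuffer> {
      const headers: Record<string, string> = {};
      if (cachedEtag !== null) {
        headers["If-None-Match"] = cachedEtag;
      }
      const response = await fetch(url, { headers });
      if (response.status === 304 && cachedBody !== null) {
        // Not Modified: the server sent no body, so reuse the cached copy.
        return cachedBody;
      }
      // Full response: store the body and the validator for next time.
      cachedBody = await response.arrayBuffer();
      cachedEtag = response.headers.get("ETag");
      return cachedBody;
    }

    // Example: getWithValidation("/images/logo.png");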

2.3 P2P

2.3.1 P2P Architecture

Instead of giving all the responsibilities to a central entity, what if the users could contribute to the distribution of the content while obtaining the data they require?

A P2P system is composed of peers, autonomous entities, that organize themselves in a network to share resources such as computing power or network bandwidth [15]. By sharing these resources and contributing to the network, if one of the peers leaves, the other peers keep the network alive. This would be different if they relied on a central entity in a client-server architecture. Even though it was initially used to share files, the P2P architecture has been reused in many other systems throughout the Internet.

It is important here to make a distinction between static and dynamic content. Static content consists of files that are published or stored somewhere and can be accessed by P2P applications or by the simplest website methods. Every time a change is made, the whole file has to be republished so that the clients can obtain the most updated version. Static content does not adapt to a user or to the time of day; instead, it is accessed through direct locations on the network, it is a perfect candidate for caching mechanisms and, since no transformation is required, multiple users access the same data, allowing them to share it between themselves. P2P systems, like the popular BitTorrent [16], are used to share these static files by spreading the distribution costs of the content among the participants.

Dynamic content is information that is generated at the time of the user's request. The data created only exists in that moment, for that user. Dynamic content has its most natural habitat in web applications that generate it depending on the user's profile, session and permissions, and on what is present in the Database (DB) at that time. This is sometimes resource intensive and definitely requires some intelligence on the application's part to determine what to present and to whom.

Executing this in a P2P application is a bit of a challenge and sometimes it does not make sense. Would a user want to have their personalized dynamic data distributed among other users, when they are the one the content is built for?

On the other hand, static file sharing through large populations of users on the WWW is very difficult, because the WWW was built by design on a client-server architecture.

The client-server approach is very simple to deploy and maintain and can meet some of the requirements listed in 2.1. In large-scale websites, where many accesses need to be answered every minute, more servers are needed so that there is no delay in answering the users' requests. Even though client-server systems are, in some cases, easy to scale just by adding more servers, some degree of synchronization between them is often required.

The very foundation of P2P networks is to allow a looser coupling between the users, one that can respond to demand even with high amounts of requests and usage.

Initial systems using this architecture were hybrid models [10], where a central entity was needed, kept all the peers connected and orchestrated everything that was happening in the network (Figure 2.3). It also allowed for the lookup of peers within the network; the actual content, however, did not pass through this central server, which made everything more scalable [17]. The problem was reliability: if something happened to the server, all the peers would lose connection to the network and it would dissolve.

Examples of this type of network, with a lot of history, are Gnutella [18], Napster [19], eMule [20] and BitTorrent [16].

The evolution of P2P systems and protocols came to solve the problems that existed in the first generation of P2P networks and aimed to provide a decentralized system that did not rely on a central entity while still allowing content to be shared worldwide.

Structured P2P networks are this evolution, and they make use of Distributed Hash Tables (DHTs). These systems made it possible to create a network of peers without the need for a central server keeping them connected (Figure 2.4). DHTs provide distributed indexing to the system, and any peer can obtain the location of the piece of content it wants just by querying the table. Searching for content with these tables is quicker, with performance gains over the linear-complexity lookups of the first generation models. A DHT functions similarly to a hash table but in a distributed manner, according to some protocol implementation such as Chord [21], Pastry [22], Content Addressable Network (CAN) [23] or Kademlia [24].


Figure 2.3: Hybrid P2P system

Network (CAN)[23] and Kademlia [24].

Figure 2.4: Structured P2P system

2.3.2 Example

BitTorrent

Protocols like BitTorrent are very popular for sharing files quickly between users. Here is a brief description of how it works.

When a file is uploaded, it is first divided into small parts that are then easy and quick to download. A file is created, a torrent, that contains all the metadata, like the name, the file length, file integrity hashes (one for each part) and a tracker list. The user must download and install a client program in order to understand those metadata files and connect to the peers on the network. Trackers are coordinators of the system that know some of the peers that have file parts and where they are located in the network. They provide each peer with information about some other peers, so that the swarming can begin. Direct exchanges between known peers allow for the discovery of more sources of the file. Using a DHT, there is no need for a central server to tell the peer where the content is, as it can figure that out by itself using the table. This brings scalability, reliability and availability improvements to the system [25, 26].

After a peer has downloaded a part of a file, it immediately becomes an upload source for that part. This creates a so-called swarm of users, sharing parts of files among each other and using hashes to index and map the pieces required to form the full file [27]. These hashes also allow verification of the integrity of each part after it has been downloaded.
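As a small illustration of this integrity check (hashing each piece with SHA-1 follows the BitTorrent convention; the surrounding names are illustrative only), a downloaded piece can be compared against the expected hash from the torrent metadata:

    // Verify a downloaded piece against the expected hash from the torrent
    // metadata, using the Web Crypto API available in browsers.
    async function verifyPiece(
      piece: ArrayBuffer,
      expectedHashHex: string
    ): Promise<boolean> {
      const digest = await crypto.subtle.digest("SHA-1", piece);
      const actualHex = Array.from(new Uint8Array(digest))
        .map(b => b.toString(16).padStart(2, "0"))
        .join("");
      return actualHex === expectedHashHex;
    }

    // A piece that fails verification is discarded and requested again,
    // possibly from a different peer.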

Once the file is complete, the user has the content and becomes a seed, that is, a source for that content, and will upload parts of it to other users that request it.

2.3.3 Evaluation

P2P networks are very resilient to, for instance, flash crowds [28]. When a large number of users suddenly want some file, they request it from the seeders of that file. Initially some will get parts of the file and become servers of the content themselves, so the number of peers that request the content is the potential number of servers of that same content. This is a self-scalable solution [26, 29] and easy to maintain, as each peer is responsible for its own processing, storage and bandwidth.

On the other hand, one of the disadvantages of a P2P network is the inability to guarantee that some content is always available. If nobody that has the full file is online, it may be impossible to obtain the whole file if some parts are missing among the peers of the network. Another disadvantage is the above-average complexity for the regular user of obtaining content with this method. Usually standalone client software is required, along with some knowledge to operate it.

It is a good way to share files but, at least for solutions like BitTorrent, it is not appropriate for the WWW, as it does not have the logic that browsers provide, nor was it designed for that.

2.4 CDN

It is legitimate to conclude that for sharing files (static content that needs no context but itself), P2P networks are really a good solution for people to obtain content that others share. For websites the story is a bit different. Web pages represent the front face of many companies and individuals around the world and usually serve ever-changing content. Sites are served centrally by servers that contain all the data a web browser can read, and users access them through a direct link, search results or references on other websites.

If for small websites things are simple to maintain and relatively cheap, for bigger websites they are a bit more complicated. Large sites have to respond to many more user requests, and this implies a larger infrastructure. Serving content through the web has to take into account the location of the consumer, the load on the server and its hardware specifications, and the available bandwidth and storage. All in all, everyone that exposes a website has to be concerned about the QoS it provides to its customers. An unavailable or slow website might be the difference between making a sale or losing it, or lead users to choose one service, more available, over another. The capacity to be always online, reliable, fast and secure is a big advantage, but it is also a costly option in the traditional website architecture.

To fulfill all of the requirements described in 2.1 using only the traditional client-server architecture, the owner of a big website needs several servers throughout the globe, with around-the-clock maintenance, high-end hardware specifications and top-level bandwidth that can respond to the worst case scenario, a flash crowd. This is very costly (see http://www.forbes.com/sites/quora/2012/10/29/how-much-did-it-cost-youtube-to-stream-gangnam-style/) and, most of the time, excessive.

As user behavior is largely unpredictable and what has low traffic today can be popular tomorrow, the best bet is, instead of spending a lot of resources preparing for a worst case scenario, to prepare the system to be scalable. This is where CDNs can help. Traditional CDNs replicate content from the origin server and make it available through replica servers spread across pivotal locations, allowing a response from the replica server closest to the point of the request. A CDN does not serve only one content provider: the resources of the network are usually shared between the subscribers of the service, and so, so to speak, are the costs.

The service that these networks offer to website owners is the scaling of their systems [30], by taking care of all the static content for them. Besides this scaling mechanism, which CDNs can do easily considering that is what they are built for, they offer, for a relatively small price, caching of resources, dissemination of the content to several hot-spot replica locations, and response times that regular web servers cannot really match.

2.4.1 Architecture of a CDN

A generic model of a CDN is displayed in Figure 2.5, based on [31, 32, 4], with a description of the steps required to provide the service to users and website owners (a small sketch of the request-routing step follows the list):

a) When the content of the Web Server is deployed, it sends information about the data to be distributed and assigns some URI space.

b) The Web Server then publishes its content to the Distribution System.

c) The Distribution System replicates the content and places it on the Replica Servers that are going to serve the data.

d) The Distribution System interacts with the Request Routing System in order to assist in the selection of the Replica Server that the user will be using and requesting the data from.

e) The user requests some documents from the Request Routing System and receives a redirect to the most adequate replica server for his request.

f) The user requests and receives from the Replica Server the content he requested, thinking he is getting it from the origin.

g) Usage is logged for all the replicas and is forwarded to the Billing System.
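The following sketch (TypeScript) gives a simplified illustration of steps d) to f); real CDNs typically perform this selection through DNS resolution, anycast or latency measurements, so the distance-based choice and the HTTP redirect below are stand-ins for those mechanisms:

    // Simplified illustration of steps d)-f): the Request Routing System
    // picks the replica "closest" to the client and redirects the request.
    interface Replica {
      host: string;
      location: { lat: number; lon: number };
    }

    function squaredDistance(
      a: { lat: number; lon: number },
      b: { lat: number; lon: number }
    ): number {
      return (a.lat - b.lat) ** 2 + (a.lon - b.lon) ** 2;
    }

    function redirectToClosestReplica(
      requestedPath: string,
      clientLocation: { lat: number; lon: number },
      replicas: Replica[]   // assumed non-empty
    ): { status: number; location: string } {
      const closest = replicas.reduce((best, candidate) =>
        squaredDistance(candidate.location, clientLocation) <
        squaredDistance(best.location, clientLocation)
          ? candidate
          : best
      );
      // HTTP 302 pointing the client at the chosen replica (step e);
      // the client then fetches the content from it directly (step f).
      return { status: 302, location: `https://${closest.host}${requestedPath}` };
    }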


Figure 2.5: CDN architecture

2.4.2 Hybrid P2P CDNs

An evolution of traditional CDNs is the P2P CDN. This model is a mixture of the CDN architecture and the P2P architecture [33], where the user that requests the data is also a distributor of it. This is accomplished by installing some software to enable the P2P content exchange. Usually the incentive to install this program is the claim that it is going to speed up the user's downloads.

Even though this requires a software installation, this type of system has many advantages, the strongest of them all being scalability. By using this model, the whole system benefits by relieving the original content server and the replica servers from the distribution of the content, thereby lowering the running costs of the content provider and the distribution system. If there are no users that already have the data, when a new user requests the content it gets it from one of the replica servers. After getting the content, the user is itself a replica server of that content and is mapped in the Request Routing System. When yet another user arrives at the website, the application will query the Routing System, and one of the potential providers of the requested content may be the previous user or one of the replica servers.
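A minimal sketch of the routing decision in such a hybrid system is shown below (TypeScript; the RoutingSystem class and its method names are hypothetical, not part of any of the systems discussed here): it keeps, per content URL, the peers that announced they hold it and falls back to a replica server when no peer is available.

    // Hypothetical routing table for a hybrid P2P CDN: prefer peers that
    // already hold the content, fall back to a replica server otherwise.
    class RoutingSystem {
      private sources = new Map<string, Set<string>>();

      constructor(private replicaHosts: string[]) {}

      // Called when a user finishes downloading a file (it becomes a source).
      registerPeer(url: string, peerId: string): void {
        if (!this.sources.has(url)) {
          this.sources.set(url, new Set());
        }
        this.sources.get(url)!.add(peerId);
      }

      // Called when a peer leaves the page or the website.
      unregisterPeer(url: string, peerId: string): void {
        this.sources.get(url)?.delete(peerId);
      }

      // Answer a lookup: prefer a peer, otherwise point to a replica server.
      lookup(url: string): { type: "peer" | "replica"; id: string } {
        const peers = this.sources.get(url);
        if (peers && peers.size > 0) {
          return { type: "peer", id: peers.values().next().value as string };
        }
        return { type: "replica", id: this.replicaHosts[0] };
      }
    }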


2.4.3 Examples

Akamai

Akamai is a large platform that hosts and delivers large amounts of content, helping enterprises improve the performance, scalability and reliability of their web applications [34].

For distributing content, Akamai's own servers are used, similarly to what was described in 2.4.1, but it also allows peer-assisted content distribution with its own system, NetSession [35, 36]. NetSession is software that a user installs on their machine, joining a network of users with the same software that share content between them, acting as an extension of Akamai's global server network. By allowing peers to distribute content, Akamai relieves its own servers and also protects the QoS provided, by carefully selecting the peers suitable for being distributors. By offloading content delivery to the peers, Akamai can lighten the burden on its own servers by 70%-80% [36].

Akamai, by the characteristics of its service, serves notable clients which include or included Apple, Facebook, Adobe, airbnb and NASA (http://www.akamai.com/html/customers/customer_list.html).

CoDeeN

CoDeeN [37] is a web proxy and cache server that intelligently distributes requests to its servers in order to respond to a potentially large number of clients [38].

Deployed on PlanetLab (https://www.planet-lab.org/), the project's intention is to improve the performance of requests over the network by caching the data of a content provider in nearby proxy servers, so that users can take advantage of higher bandwidth, availability and network proximity. A content provider using this service can relieve its servers in terms of costs and possibly have more capacity for handling a large number of requests in a short time frame.

When a user requests a resource from the proxy, the CoDeeN system redirects that call, using the Domain Name System (DNS), to the appropriate member of the network that is in charge of that file. The element that is controlling and caching that file then responds to the proxy with the requested resource, which is forwarded to the client [38].

2.4.4 Evaluation

One of the main advantages of CDNs is that the services they provide are specialized in delivering content, and for that they offer a high-capacity infrastructure that would be very difficult for an ordinary company to have. This results in higher availability and lower latency than what a regular host can provide.

Another advantage is that the infrastructure is indeed distributed over several locations, which can produce faster downloads for users wherever they are located. CDNs also offer quick-access caches and allow for simultaneous downloads.


The downside of CDNs is the cost. In order to increase the QoS for the users, the website owner has to pay. Even though a content provider can save some money by reducing traffic on the original host server, it will still have to pay the distribution network, even if that is less than a self-maintained solution with similar performance.

Also, using CDNs makes websites rely on external entities, and some owners are not comfortable with that. Furthermore, replica servers may not be available at the locations where they are most needed, as content distributors usually place their replica servers in hot-spot locations that are likely to receive more requests, a decision that is not tailored to a specific website's population.

2.5 WebRTC

2.5.1 What is WebRTC

WebRTC [8] is a relatively new technology, supported by all the major web browsers [39], that allows real time device-to-device communication, mobile or not, without the need for plug-ins, just by using the browser. Since Google released the WebRTC open source project in 2011, its development has evolved rapidly, being acknowledged by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), which began working with Google on the standardization of the application programming interface (API), creating different channels of communication.

This specification includes getUserMedia, to obtain a user's media streams, RTCPeerConnection, to connect browsers and set up voice and/or video calls, and also a refinement of the previous channel, the RTCDataChannel, that allows browsers to share arbitrary data between them. Several publications [40, 41] have been released detailing the technology and there are already some applications developed using WebRTC, mainly taking advantage of the voice and video channels, some of which will be presented further ahead in this chapter. There are video conferencing solutions [42, 43, 44] and other applications [45] already built on this technology, even though in early stages.

WebRTC is expected to evolve, change and grow substantially in adoption by website and application development teams in the coming years, since it fits almost every device on the market, being Operating System (OS) agnostic. Although some of the functionalities still need solidification, the biggest obstacle seems to be getting all parties involved to agree, as happened with the video codec divergence. That major hurdle has to be overcome and, for that, compromises have to be made by everybody involved. One thing is certain: the potential is huge.

2.5.2 How WebRTC works

WebRTC is a set of real-time protocols (Figure 2.6) [46] tailored to support widely used actions on the Internet. These actions are: reading another endpoint's media by accessing the camera and microphone, setting up voice and/or video calls, and sharing data via P2P connections.

4 http://lists.w3.org/Archives/Public/public-webrtc/2011May/0022.html
5 http://www.tokbox.com/blog/the-eternal-webrtc-video-codec-debate-consensus-finally/


The idea is to make all of this happen without the user having to install proprietary software or plugins (like Adobe Flash) and without being aware of its complexity, so that anybody is able to use WebRTC, since everybody has a browser on their machine. Also, every connection and transfer made by WebRTC is encrypted.

As Figure 2.6 shows, the Web API is exposed in every browser and gives web developers access to the WebRTC functionalities. The other components are the WebRTC Native C++ API, for the browser makers to implement, and then three building blocks: the voice engine, the video engine and the transport layer.

Figure 2.6: WebRTC architecture - from [39]

For the purpose of this work, we will focus on the RTCDataChannel, since it is the channel that allows arbitrary content sharing and distribution between users. WebRTC is not a server-less architecture: even though it is a decentralized approach, it still needs servers for signaling and for establishing connections between peers.

Once a connection is established, this technology can send and receive either unreliable or reliable communication between browsers without interference from an external entity.

Connecting

Before going into the core of the WebRTC connection tools, it is important to explain what Network Address Translation (NAT) is and what problems it brings.

NAT operates on a router connecting two networks together [47], translating the private network addresses into a globally registered address before packets go out to the other network. This type of mechanism is used primarily to separate the internal network of endpoints from the rest of the public Internet. Client-server requests work very well under these circumstances, but that is not the case for P2P communications. In P2P networks, peers typically connect to each other using an Internet Protocol (IP) address and port pair that locates them on the Internet. But a peer that is behind a NAT may not know its public IP address and might provide the other peer with an unreachable location. Furthermore, if there is a firewall involved, the endpoint cannot know whether the port it is providing is blocked or not.

To overcome the difficulties of connecting to peers within NAT or behind firewalls, the Interactive Connectivity Establishment (ICE) framework is used, which determines whether Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) servers are needed. NAT allows a device to have a private IP address that cannot be used on the public Internet, so its traffic needs to pass through a routing device, e.g. a router. If there are several machines behind a router, how will we know where the content is and who wants to establish a peer connection? That is where STUN servers come to the rescue. STUN servers are pretty simple entities with one basic task: check the IP address and port of a request and send that same address back to inform the requester. In other words, they figure out what the IP address of that machine is, as seen from outside its network.

Most of the addresses are resolved using STUN servers and the connection is established over Stream Control Transmission Protocol (SCTP), User Datagram Protocol (UDP) or TCP. If none of those options work, the solution is to relay the data between endpoints using TURN servers. TURN servers have public addresses and relay the data between the two peers. The reason these servers are not preferred is that they consume more bandwidth than STUN servers, and that can be costly.
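As an illustration only, the minimal sketch below shows how a browser connection could be configured with STUN and TURN servers through the standard RTCPeerConnection API; the server URLs, credentials and the sendToCoordinator helper are placeholders, not part of this work's deployment.

// Minimal sketch: configure ICE servers for a peer connection.
// The URLs and credentials below are placeholders.
var configuration = {
  iceServers: [
    { urls: 'stun:stun.example.org:19302' },            // address discovery
    { urls: 'turn:turn.example.org:3478',               // relay of last resort
      username: 'user', credential: 'secret' }
  ]
};

var pc = new RTCPeerConnection(configuration);

// Each ICE candidate found locally must be sent to the other peer
// through the signaling channel (the coordinator, in this work).
pc.onicecandidate = function (event) {
  if (event.candidate) {
    sendToCoordinator({ candidate: event.candidate });  // hypothetical signaling helper
  }
};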

RTCDataChannel

The RTCDataChannel allows the establishment of reliable or unreliable data transfers directly between peers. This has many potential applications, like gaming or file transfers, and it is a feature that is ready for use in modern browsers. It allows multiple simultaneous transfers, and those transfers can be configured in terms of reliability and message ordering.

After signaling and resolving each other's IP addresses using the ICE framework, peers can connect and exchange arbitrary data with low latency and high throughput [48].

This is easy for the developer, as an API is offered that handles all those configurable details and the communication specifics. The DataChannel uses UDP as its transport layer, on top of the already referenced ICE framework, as seen in Figure 2.7.

On top of that sits a security layer, Datagram Transport Layer Security (DTLS), whose use depends on the configuration of the DataChannel. Finally, the specific transport protocol, SCTP, brings a mixture of UDP and TCP features: it handles flow and congestion control and provides reliability for the transport of messages, like TCP, but it is a message-oriented transport protocol, just like UDP. After some initial signaling for connecting, using the Session Description Protocol (SDP), a communication channel is established and data can be sent through it until the connection is closed.
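To make those configuration options concrete, the sketch below creates a data channel on an RTCPeerConnection; the ordered and maxRetransmits options are part of the standard API, but the label and the chosen values are only illustrative.

// Minimal sketch: open a data channel whose reliability and ordering
// are tuned through the standard options (values are illustrative).
var pc = new RTCPeerConnection();
var channel = pc.createDataChannel('files', {
  ordered: true          // deliver messages in order, as TCP would
  // maxRetransmits: 0   // uncommenting this would make the channel unreliable
});

channel.onopen = function () {
  channel.send('hello peer');                // arbitrary application data
};

channel.onmessage = function (event) {
  console.log('received:', event.data);
};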

Figure 2.7: WebRTC DataChannel protocol stack

2.5.3 Evaluation

This technology is revolutionizing the Internet. First of all, it is free: free for the users and free for the content providers, if they decide to use it on their websites. WebRTC does not depend on an operating system or on a type of device; it only needs a browser, which almost every machine has nowadays. Another

advantage is that there is no intermediary when sharing content. The content is exchanged only between

the ones that have requested it and that eliminates unnecessary calls or requests and maximizes transfer

rates.

The disadvantage of WebRTC is that, as a standard still in development, it may change, and for that reason it is still not widely used. The biggest hurdle in the use of WebRTC is the dependence on the resolving servers of the ICE framework: there are some free STUN servers available, but the usage of TURN servers, which consume bandwidth, may come with some costs.

2.6 Technology comparison

All the technologies presented in the previous sections serve different purposes. The client-server architecture on an HTTP server is used throughout the Internet to present websites to the public, in order to make services available, publicize companies or allow games to be played.

The P2P architecture, more specifically the relatively popular BitTorrent, is used by a different audience. People that want files that other users have resort to this technology to obtain them, downloading them faster and more reliably than from a single server. The downside is that it requires a client program and some knowledge (even though alternatives such as WebTorrent [49] are emerging).

CDNs are systems that are specialized in serving content and are used to make the website experience better for the user. This is a solution for content providers that lease shared server capacity in order to serve their files alongside other content providers, at a lower cost than serving them by themselves.

WebRTC is a new technology, available on modern and popular browsers, that allows for browser-to-browser communication. Even though some CDNs already use P2P communication, WebRTC takes decentralization of websites to a new level, where users of the same website can communicate between themselves and also help spread the content.

The table presented next provides a theoretical analysis of the requirements presented in 2.1 and of how straightforward it is to meet those conditions using the previously presented architectures.

Cost

• HTTP Server: An accessible solution. Bandwidth costs may increase with the number of users.

• BitTorrent: Free to spread the data, with many open source solutions available.

• HTTP Server + CDN: Costs also depend on traffic and bandwidth consumption.

• HTTP Server + WebRTC: Potential to relieve the server of a large part of the data transferred, without any extra cost.

Reliability & Availability

• HTTP Server: The server is a central point of failure.

• BitTorrent: Less dependent on a central entity, but the peers may not have all the parts of the whole file.

• HTTP Server + CDN: CDNs are composed of servers with almost continuous uptime and many replication mechanisms.

• HTTP Server + WebRTC: The HTTP server is the central point of failure.

Scalability

• HTTP Server: Can be scaled with replication or other alternatives, at a cost.

• BitTorrent: The more peers want the file, the more peers can serve parts of the file.

• HTTP Server + CDN: Traditional CDNs can scale and handle a large number of requests.

• HTTP Server + WebRTC: Each peer on the page has the potential to serve most of the content, so the more peers are on the page, the greater the potential to respond to requests.

Best use

• HTTP Server: Good for small solutions.

• BitTorrent: Good for spreading static content rapidly, without the logic of a website.

• HTTP Server + CDN: Good for larger websites that need to handle a large number of requests.

• HTTP Server + WebRTC: Has the potential to serve small and medium sized websites, with solid resistance to flash crowds and a good scalability/cost relation.

Table 2.1: Comparison between solutions

2.7 Existing contributions

This section will present some solutions that try to distribute content using a P2P network of users by

connecting through a browser and WebRTC.


2.7.1 Maygh

Maygh [50] is a system that allows the creation of a browser based content distribution network, using

WebRTC. Developed to be a proof of concept, it is a self scalable system that distributes the cost of

serving web content across users of a certain website.

It is available to users without any plug-in installation and is loaded when the website is loaded, only having to communicate with a central coordinator system that provides peer registry and security measures. There is a fall-back mechanism so that users with browsers that do not support WebRTC can navigate the website in the same way, with the web server providing the content in place of the peers.

According to the studies made, Maygh allows for a significant reduction of the central server bandwidth cost when applied to a website with considerable static content. Tested with data and statistics from the Etsy website, it was calculated that, in that particular case and with only one coordinating server, savings in the range of 70% of the bandwidth usage were obtained.

Currently, the project is no longer under active development.

2.7.2 PeerCDN

Another browser-based CDN is PeerCDN. PeerCDN [51] is written in pure JavaScript and was acquired by Yahoo in December 2013. It creates a peer-to-peer network with the browsers of the website users and allows, according to its documentation, a bandwidth reduction of up to 90%. It uses the WebRTC DataChannel for file transfer and storage, and a central security server for authentication of the content using hashes of the resources. It presents fallback solutions for slow or unreachable peers.

2.7.3 Swarmify

Swarmify [52] is a commercial solution for creating a network of users that shares a website's contents, with caching. It is mainly focused on video resources and streaming. Swarmify (formerly known as SwarmCDN) presents a series of extra features to speed up content download. They state that a reduction of 75% or more in bandwidth costs is possible, depending on the number of users that access the content.

2.7.4 Viblast

Also a commercial solution, and also focused on video streaming, is Viblast [53]. Viblast is very similar to Swarmify (2.7.3), presenting itself as a Peer-accelerated Delivery Network (PDN) with a proprietary, patent pending transport layer. This solution supports a variety of video codecs; to install it on a website, a few includes are needed on the page, and the video plays in Viblast's own video player.

6 https://www.etsy.com/


2.7.5 PeerJS

PeerJS [54] is an open source solution that wraps the browser's WebRTC implementation to provide a complete, configurable, and easy-to-use peer-to-peer connection API. It can be extended to add extra functionalities and comes with its own coordinator server, which also allows for configuration and personalization.

It does not create a peer network by itself but allows for an easy connection that can be used in more

complex behaviors using WebRTC.
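For illustration only, and not taken from the prototype developed in this work, a minimal PeerJS exchange could look like the sketch below; the broker host and the peer ID are placeholders, and the PeerJS client library is assumed to be loaded in the page.

// Minimal sketch of the PeerJS abstraction (host and IDs are placeholders).
var peer = new Peer({ host: 'broker.example.org', port: 9000, path: '/peerjs' });

// Connect to another peer by the ID the broker assigned to it.
var conn = peer.connect('some-other-peer-id');
conn.on('open', function () {
  conn.send('hi!');                        // arbitrary data over the DataChannel
});

// Accept incoming connections as well.
peer.on('connection', function (incoming) {
  incoming.on('data', function (data) {
    console.log('received', data);
  });
});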

2.7.6 Peer5 / Sharefest

Peer5 [55] and its spin-off product Sharefest [56] are open source software. Sharefest was created by Peer5 at the SV DevFest 2012 hackathon (San Jose) and is a one-to-many file sharing application that works in a browser tab. The user places a file on a page and shares a link, generated on the fly, with other users so that they can download the file, connecting via P2P to the original user's browser tab, at maximum speed and without the need to upload the file to any server. It takes advantage of WebRTC, so it needs an HTML5-ready browser to work.

Peer5 is a more generic approach. It is a serverless CDN that uses WebRTC to share and serve content (videos, images and other files) between the current users of a website. It also falls back to traditional servers when there are no peers or the technology is not available to a user. File verification and pause/resume mechanisms provide reliability to the transfer, and the security of sensitive data is provided by end-to-end encryption. Gains in speed and scalability are offered, as the peer connections are only used when they improve the speed in comparison with the web server.

2.8 Comparative study

The table below compares the products described above in some features that are considered important

for this work.

The features are:

• Open Source - open development and allows for a better understanding of the architecture of the

solution. It also allows for royalty-free usage and enables improvements.

• WebRTC - uses WebRTC to accomplish the goals.

• Single File Share - allows sharing of a single file between users.

• Video Stream - streams video content to peers.

• P2P CDN - allows for usage on website to create a network of peers to distribute the content of

the page.

• Stable - is in a state that allows for production usage.


• In development? - in active development and improvement.

Product | Open Source | WebRTC | Single File Share | Video Stream | P2P CDN | Stable? | In development?

Maygh: X X X
Swarmify: X X X ?
Viblast: X X X ?
PeerCDN: X X
PeerJS: X X X X
Peer5: X X ? X ?
Sharefest: X X X ?

Table 2.2: Comparison between WebRTC solutions

Some of these solutions have very interesting traits and already make use of the recent WebRTC technology. Peer5 seems to achieve some degree of content sharing over websites, even though it is difficult to analyze because it is a commercial product. Maygh would also be a possibility, but because its development has stopped, it has already been outpaced by the ever evolving WebRTC. PeerCDN could also be an option to test, but its acquisition by Yahoo locked access to the solution.

The conclusion of this analysis is that no solution presented has all the characteristics required for testing in this work. Because of that, a new solution is required, if possible integrating some of the products described above, which already handle the WebRTC side but lack the logical components.


Chapter 3

Architecture

This chapter describes the P2P CDN architecture. It begins with the description of the functional and

non-functional requirements that we intend to meet and then the description of the general architecture

and the details on each of the modules.

3.1 Non-functional requirements

Self scalable: The solution must perform adequately and maintain Quality of Experience (QoE) even

when it is accessed by many users with WebRTC capabilities.

Reasonable transfer times: Have an acceptable setup time for the WebRTC mechanism and transfer times between clients that do not exceed twice the time required to download directly from the server.

Choose the peers wisely: Related to the previous requirement, the choice between requesting content from the peers or from the server has to be intelligent, so that it does not significantly affect the QoS.

Reduce server load: The intention is that the server handles fewer content requests by taking advantage of the peer network.

Security: Making sure the users download the original files, both for correctness of the web page and

the security of the users, is crucial. A file verification mechanism is needed to ensure this.

Up to date files: It is important that the files are up to date. The solution requires mechanisms that ensure that outdated content in the network is ignored, even before it is transferred to other peers.

3.2 Functional requirements

Allow for P2P resource transfer: The main functionality of the prototype is to allow for a client that

reaches a certain web page to gather the resources included in that page, like images, videos or


additional scripts, through a WebRTC connection with another client that is on the same page and

already has the resources.

Only peers on the page are available for collaboration: Only the peers that are on the same page can collaborate and exchange resources. Widening that option to a whole domain and all the paths of a website is a possibility for the future.

Client transparency: The client must benefit from the technology, to its own advantage or that of others, without installing anything and without perceiving that it is obtaining or sending information to other clients.

Fall-back mechanism: The solution must have a fall-back mechanism for browsers that do not support WebRTC. If the client's browser does not support the technology, then regular HTTP methods take over automatically.

Failed connection or transfer tolerance: If, for some reason, a connection between peers or a data transfer fails, the application must silently request the resources from the original server.

3.3 Architecture design

The architecture of the solution was designed to be easily integrable with any web stack in the Internet.

It is composed of two major parts:

• Server side components

• Client side components

The architecture of the P2P CDN, displayed in Figure 3.1, is composed of client nodes, which can be either providing or obtaining content, a peer coordinator and a web server. The two nodes on the server side, the Coordinator and the HTTP server, can be deployed together or separately, depending on the type of relationship intended for them; detaching the coordinator from the HTTP server allows it to provide its service to a variety of websites. On the client side, the most important parts are the local Content Loader and Peer Connection API modules. The Peer Connection API module controls the connections between peers, while the Content Loader module handles everything that has to do with the Document Object Model (DOM) and decides whether to use the WebRTC capabilities or simply request resources from the origin server.

This first design of the architecture was achieved taking into account the usage of the WebRTC technology. This included the interaction, shown in Figure 3.2, between the various participants in the solution. The simplest interaction between two users is represented in the figure and described in the list below:

1. Peer 1 requests and gets the HTML page from the Web Server and the script files (for peer

connection).


Figure 3.1: Proposed solution architecture

2. Then Peer 1 registers with the Coordinator, getting an assigned ID.

3. For each resource it needs, Peer 1 asks the Coordinator for the list of peers that contain that

resource. Since it is the first/only one in the page, an empty list is returned.

4. Peer 1 then requests the resources from the static file server.

5. After obtaining each resource, Peer 1 announces to the Coordinator that it now can serve that

specific file.

6. Later, Peer 2 makes a request for the same page as Peer 1, obtaining the HTML page and the

initial JavaScript files.

7. Peer 2 registers with the Coordinator, getting an assigned ID.

8. For each existing resource that it does not yet have, Peer 2 asks the Coordinator for the list of peers that contain that resource. It obtains a list with one entry, Peer 1.

9. Peer 2 connects with Peer 1 and requests the files it needs. Peer 1 sends the files through a WebRTC Data Channel. Peer 2 inserts those resources, e.g. images, into the page.

10. Peer 2 announces to the Coordinator that it now can also serve the files it received from Peer 1.

Figure 3.2: Solution components interaction

The second part of the interaction, a more detailed one, in which a client comes into the scene and connects to Peer 1, is represented in Figure 3.3. The Content Loader handles all logic related to the resources that are already loaded and the ones that still need to be fetched and communicates, when it needs to connect to a peer, with the Local Connection API, which handles everything related to the WebRTC connections and data transfers. The example displayed assumes that the client is using a web browser with WebRTC capabilities and that no network errors occur. If, for some reason, anything involving the connection or the transmission of data between peers fails, the request for the image immediately falls back to a server request.

3.4 Components

3.4.1 Server side components

The server side, represented in Figure 3.4, includes the two components described: the traditional file request handling part, which distributes HTML and other static files, and the coordinator part, which acts as a connection broker for the peers.

The static file handling component would serve just the files listed in the public directory. In a more

advanced web server, additional features could bring dynamic responses to the client but, for the sake

of this work, a regular and commonly used web server will be used.

Peer Coordinator

The peer coordinator must record which peers are available in the network, identify a peer when another peer asks for it, and answer with the details needed to establish a connection between them. Without the coordinator, clients cannot know which peers are available for connection. A central entity is necessary for the handshaking of connections and to avoid distributing sensitive information, like IP addresses, between the peers. If the list of peers were distributed, it would bring reliability issues, as part of that information could disappear with the disconnection of peers.

Figure 3.3: P2P CDN participants interaction

Figure 3.4: Server side architecture

The coordinator is responsible for providing simple functions to orchestrate the connections and allow for a reliable, up-to-date list of connected peers. Also, the coordinator needs to know and update every time a user disconnects and is no longer available for connections.

Besides guiding the clients, and in order to make the peer network functional, the Coordinator needs to link content to the peers that have it, and vice versa.

Finally, peers that have just connected or need to get a certain resource have to ask the connection broker for the list of peers that have that file.

In summary, modifying the coordinator involved the creation of the following functionalities:

• Register a peer


• Register peer as a source of a certain file

• Peer connection handshake

• List peers with a specific file

• Remove peer as a source of a certain file

When a user reaches a web page, it registers itself with the coordinator, which gives it an ID. This identification is valid for the session the user has on the web page; it is how the peer identifies itself with the coordinator and also how the other peers can reach it.

When a client wants to connect to another one, it sends a signal, using SDP, to the coordinator with the ID of the peer it wants to connect to and its own ICE information. The Coordinator then notifies the other peer by forwarding the signal received. The other peer replies to the Coordinator with its own network information, which is forwarded to the original peer.

For organizing the peers, some structures are needed in the coordinator so that it can channel peers

to the source of the files they request.

When a peer registers with the coordinator, initial structures are created in three tables. One table holds all the connection information needed, like the peer's ID, its IP address and the context to which it belongs (in the case of different websites), which helps the peers get connected to each other. An entry is added to this table when a peer registers with the coordinator, ideally when it initially loads the web page.

Another very important table is the one that maps the existing files that can be shared between the peers to the actual peers that have them. When a peer announces to the broker that it has become a server of a specific file, it is added to that file's list of peers. In the same way, when a peer exits the network or can no longer serve the files, it is removed from all the file lists. When a client leaves the page, an event is triggered warning the Coordinator that the peer is leaving; if a peer does not respond to any message for some time, it is considered disconnected and removed from the network.

A simplified example of this is represented in Table 3.1, where the four existing resources are images, each represented by a unique ID, or hash, that distinguishes it from the other files. Associated with each file is a list of the peers that are sources for that file, identified by their IDs.

File          Hash      Peers
image1.png    iCwidI    [ Peer 1 ]
image2.png    wbZapD    [ Peer 1, Peer 2 ]
image3.png    Wf9MPs    [ Peer 2 ]
image4.png    8M2MFa    [ ]

Table 3.1: Coordinator - Example table that maps files to peers that have them

The third table needed is the reverse table, that functions as a helper for removing peers from the

network. This table maps all the peers present in the network with the resources they carry. When a peer

registers with the coordinator an entry on this table is created with an empty list. As the peer announces


that it received some files that it can share, they are added to the list. At the time that the peer exits

the network, the coordinator gets the list of files that was associated with it and uses it to remove the

peer as a source for every file that was there in the File-to-Peer table explained earlier. Then the entry

is removed from the table.

All these tables are synchronized within the coordinator and evolve as the peers come and go or

when resource sources change.
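To make the three structures concrete, the following sketch shows how they could be represented in a simple in-memory Node.js coordinator; the variable and function names are illustrative and not taken from the implementation.

// Illustrative in-memory structures for the coordinator (names are assumptions).
var peers = {            // connection information per peer
  'peer1': { id: 'peer1', ip: '203.0.113.10', context: 'example-site' }
};

var filesToPeers = {     // file hash -> peers that can serve it
  'iCwidI': ['peer1'],
  'wbZapD': ['peer1', 'peer2']
};

var peersToFiles = {     // reverse table, used when a peer leaves
  'peer1': ['iCwidI', 'wbZapD'],
  'peer2': ['wbZapD']
};

// Removing a peer uses the reverse table to clean the file lists.
function removePeer(id) {
  (peersToFiles[id] || []).forEach(function (hash) {
    filesToPeers[hash] = filesToPeers[hash].filter(function (p) { return p !== id; });
  });
  delete peersToFiles[id];
  delete peers[id];
}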

Additional information related to the peers can be saved on the coordinator to improve the quality of the transfers between users, thus improving the QoS of the website. One example is the location of each peer, which can be used to match peers that are closer to each other and thereby reduce the RTT. Another detail the coordinator can store is the bandwidth each peer has available; this is especially important for prioritizing the recommended peer for a data transfer. Even if a source peer is geographically farther away than another potential source, it might still be a better option if it has more bandwidth available to transmit the files.

3.4.2 Client side components

On the client side, the browser must be able to contact the coordinator, connect to peers and, of course, send data to and receive data from others. But this has to be done according to some logic that determines which resources need to be fetched and decides what the target of each request is, either the server or another peer. So, for the client side, there are two main modules (Figure 3.5) with distinct purposes:

• A connection API for the WebRTC functionalities that wraps all the complexity of it and solidifies

it in simple functions.

• A Content Loader that interprets the DOM, decides what resources are going to be loaded and

cooperates with the connection API when required.

Both of them will be client side modules executable on the browser and will be described in the

following sections.

Connection API module

It is essential to have a solid module that communicates with the coordinator and also wraps the creation of a WebRTC connection between two peers in an easy-to-use function. Being a recent API, still in development, different actions need to be taken depending on the browser.

First of all, it is necessary to decide on a STUN server to use. Google provides a selection of free STUN servers that can be used.

If the STUN server is not able to resolve the address of a peer, as explained in 2.5.2, there are two options: either the application resorts to a TURN server or it falls back, eventually, to an HTTP GET from the web server. Since using a TURN server incurs some costs, it was decided that the fall-back to the web server is used whenever IP address resolution fails.

1 https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/WebRTC_basics#Shims
2 https://code.google.com/p/natvpn/source/browse/trunk/stun_server_list

Figure 3.5: Client side architecture

Assuming the address was resolved, the peer now needs to, as depicted in Figure 3.6, signal the coordinator that it wants to connect to another peer, providing its own ID and the ID of the target peer. This signaling is the action in which the coordinator is most needed, both for functional and for security reasons. To signal the participants in the network, SDP is used. SDP is a session description protocol that carries parameters that help set up connections, announcing the intentions of the sender. The offer is made by sending a message that documents the intentions and the information needed to establish a session on a WebRTC DataChannel, like the example shown in Code 3.1. The answer to that request carries similar information, but transports the details of the destination peer that received the first offer, as shown in Code 3.2.

Once the two peers know each other's information, they can connect and start sending content through the freshly opened DataChannel, using the type determined by the contract they have established.

Figure 3.6: WebRTC connection establishment

{
  "type": "OFFER",
  "payload": {
    "sdp": {
      "type": "offer",
      "sdp": "v=0\r\no=- 2225868466582648380 2 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\na=msid-semantic: WMS\r\nm=application 9 DTLS/SCTP 5000\r\nc=IN IP4 0.0.0.0\r\na=ice-ufrag:atmcE9WAr/It9sD+\r\na=ice-pwd:K3p70kYwiOabtt2XMM0q1/TC\r\na=fingerprint:sha-256 3A:FE:35:24:44:62:6F:A2:8B:89:13:5B:08:8F:34:5D:BA:38:2D:AB:84:C8:E7:DD:4B:11:42:D5:28:A8:D1:20\r\na=setup:actpass\r\na=mid:data\r\na=sctpmap:5000 webrtc-datachannel 1024\r\n"
    },
    "type": "data",
    "label": "dc_al9f3luyf46g8pvi",
    "connectionId": "dc_al9f3luyf46g8pvi",
    "reliable": true,
    "serialization": "binary",
    "browser": "Chrome"
  },
  "dst": "kav7yzq12mjnstt9"
}

Code 3.1: Example of a SDP offer message for a WebRTC DataChannel connection

{
  "type": "ANSWER",
  "src": "kav7yzq12mjnstt9",
  "dst": "04ex8xmw572pgb90",
  "payload": {
    "sdp": {
      "type": "answer",
      "sdp": "v=0\r\no=- 698775726372932312 2 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\na=msid-semantic: WMS\r\nm=application 9 DTLS/SCTP 5000\r\nc=IN IP4 0.0.0.0\r\nb=AS:30\r\na=ice-ufrag:9bq1nA4OLF5IqJ8Y\r\na=ice-pwd:TqCoyRbgSTSb5cbDlDl2+vrE\r\na=fingerprint:sha-256 3A:FE:35:24:44:62:6F:A2:8B:89:13:5B:08:8F:34:5D:BA:38:2D:AB:84:C8:E7:DD:4B:11:42:D5:28:A8:D1:20\r\na=setup:active\r\na=mid:data\r\na=sctpmap:5000 webrtc-datachannel 1024\r\n"
    },
    "type": "data",
    "connectionId": "dc_al9f3luyf46g8pvi",
    "browser": "Chrome"
  }
}

Code 3.2: Example of a SDP answer message for a WebRTC DataChannel connection

Content Loader

The content loader is the module that implements all the logic: it determines whether a resource is already loaded and where it is going to be loaded from, and it opens and manages the peer connections.

Assuming that the content loader, and the whole P2P CDN solution, only loads images from peers, the action flow when a user loads the web page is as follows.

First, the module needs to prevent the browser from fetching any of the images in the document. Only images that are already cached in the browser can load, since that is a local request. For each image that did not load from cache, the peer asks the coordinator which seeders exist in the network for that file. If there are no seeders in the network besides the original server, then a regular HTTP request is made to the static file server and the image is loaded into the HTML document. But if suitable peers that can provide the content are found, an attempt to connect is made. Once the connection is established, depending on the size of the image, a single request can be made to receive the whole image from one peer, or the data can be split into parts and requested in parallel from several sources. When the data arrives it is reassembled, if needed, and its correctness and authenticity are verified with the coordinator, using checksums over the content. If the image is the same as the original, it is placed in the DOM and displayed to the user. After loading each resource, the coordinator is notified that this peer now belongs to the list of seeders for each successfully loaded image.

Some mechanisms are included to avoid failure when the connection between peers experiences errors. The client requests the resource from the server (a minimal sketch of this decision logic follows the list) when:

• The browser has no support for WebRTC.

• It was not able to connect to the source peer.

• The content was requested from the source peer but takes too long to start arriving.

• The data transfer between the peers is too slow.

• The data received has errors or is not the same as that served by the web server.
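The sketch below illustrates one possible way to wrap a peer request with a timeout so that any of the failures above silently falls back to the web server; requestFromPeer and insertIntoDOM are hypothetical helpers, getImageFromServer mirrors the prototype's naming, and the timeout value is only illustrative.

// Illustrative fallback wrapper: try a peer first, silently fall back to the
// web server on error or timeout (requestFromPeer is a hypothetical helper).
function loadResource(image, sourcePeerId) {
  var done = false;
  var timer = setTimeout(function () {       // peer took too long to answer
    if (!done) { done = true; getImageFromServer(image); }
  }, 5000);                                  // timeout value is illustrative

  requestFromPeer(sourcePeerId, image.hash, function (err, data) {
    if (done) { return; }                    // server fallback already fired
    done = true;
    clearTimeout(timer);
    if (err) {
      getImageFromServer(image);             // connection or transfer failed
    } else {
      insertIntoDOM(image, data);            // hypothetical DOM helper
    }
  });
}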

Finally, there are listeners that, when the browser window closes, warn the coordinator that the peer is no longer available for connection. In case of a browser or computer crash, or connectivity issues, the peer is removed from the list of connected entities when it does not reply to a message for more than 5 seconds, having to register again with the coordinator to rejoin the network.

3 https://en.wikipedia.org/wiki/Checksum


Chapter 4

Implementation

4.1 Introduction

This chapter describes the steps taken to implement the prototype of a P2P CDN using WebRTC. The

decisions about the programming language, development environment and requirements to implement

will be explained in the next sections.

4.2 Implementation Options

In order to implement the prototype, two development scenarios were considered.

Since there was not, at the time this work was created, a solution or framework that could create a network of peers for a certain website, the first scenario was to build everything from scratch. This option would add, to the development of the proof of concept itself, considerable implementation effort in the areas of WebRTC connection establishment and data exchange between peers.

The second option was to use and extend an existing WebRTC connection API, PeerJS, which already handles the core functions of the framework. PeerJS is an open source solution, presented in 2.7.5, that wraps the details of WebRTC into well-built functionalities that simplify the connection process. This solution also comes with an optional connection broker, also developed in JavaScript, that can be deployed on a server as an alternative to using the project's own cloud hosted version.

Since the goal of this work is to prove and test the viability of a peer network for distributing the contents of a web page, abstracting away the core functions of WebRTC is acceptable. Because PeerJS, being open source, allows the coordinating server to be modified to extend its functionalities, it seemed the obvious choice for creating a solution worth testing.

According to some studies [57, 58], images are the most predominant resource type on the average web page, accounting for 50% to 70% of the whole web page content. With this in mind, the prototype was implemented supporting only images, as this provided the best return on investment.

1 http://httparchive.org/interesting.php, http://www.soasta.com/blog/page-bloat-average-web-page-2-mb/


It was decided that whenever a peer that has a certain resource is found, the P2P transfer is always preferred. So no selection of peers is performed, since all are considered equally suitable for connecting. If there is more than one peer available to serve the requested file, one is chosen at random. The idea is to force the occurrence of data transfers between peers so that, when testing the prototype, geographical proximity can be one of the variables.

In this prototype, the unique identifier of an image is derived from its Uniform Resource Locator (URL) or path: a hash is generated based on the location of the image and stored in the coordinator as its ID. No authentication mechanism for the files was implemented and no verification of data integrity is done; it is assumed that all peers are well intentioned and that no tampering with the data is performed. No verification of the modification date of the file is made; the HTTP server is assumed to provide files that do not change over time.

PeerJS

PeerJS is composed, as presented, of two parts. The first is the local JavaScript file that implements the local functions for each peer, so that it can communicate with the connection broker and establish links with other peers. The second is the connection broker, which registers peer information and lets peers know who they can connect to. This server does not transfer or proxy any data between peers; it only helps to establish the connection.

Since both these parts are developed in JavaScript, and some modifications were needed to implement the logic of the peer content network, the choice of programming language was simple.

Node.js

Node.js is an open-source environment for running server-side applications that is steadily gaining users and popularity. It is a JavaScript runtime that uses the Google Chrome V8 JavaScript engine, and its goal is to provide a way of building scalable and lightweight applications.

Express.js

Express.js is a web framework for Node.js that allows developers to build custom web applications with ease. Since the PeerJS server already uses Express.js, it is easier to serve the static files using this framework instead of having two servers running separately.

4.2.1 Environment

For building the web server that would serve static pages and images for testing purposes, the choice could be any out of the box server, like Apache or NGINX. If the choice fell on one of those two, the connection broker would have to run separately from it. Because the connection broker is basically an Express.js server running on Node.js, it was decided that it would also be simple to create an Express.js server that serves the static resources as well as the paths reserved for PeerJS and WebRTC related actions. This decision also took into account that all the development would be done in JavaScript, including the server side in the form of Node.js, an increasingly popular approach for building web applications.

2 https://nodejs.org/en/
3 http://apmblog.dynatrace.com/2015/04/09/node-js-is-hitting-the-big-time-in-enterprise-markets/
4 http://expressjs.com/
5 https://httpd.apache.org/
6 http://nginx.org/

4.3 Architecture

4.3.1 Server side components

The HTTP server was very easy to build. A single line in the Express.js framework, see Code 4.1, is enough to serve the static files in a directory. Anticipating the testing, all cache related headers were removed so that the images would be loaded every single time they appeared in the HTML file.

app.use(express.static(__dirname + '/public', {
  etag: false, lastModified: false
}));

Code 4.1: HTTP server serving all the files in the public directory

Coordinator

The coordinator was a little more complex. For its implementation, a Representational State Transfer (REST) [59] architecture was used, which is fairly native to Express.js, and each path leads to an action on the coordinator side. First of all, a path within the HTTP server was created, Code 4.2, and everything that accesses that path, in this case '/peerjs', is handled by the coordinator. The allow_discovery flag makes peer list requests available to all peers.

var options = { allow_discovery: true };
app.use('/peerjs', ExpressPeerServer(server, options));

Code 4.2: Coordinator with a reserved path

The coordinator works with a key that is set so that one coordinator can handle connections of different peer networks. In this case, only one basic key is used, as only one peer network is needed.

In order to make the original PeerJS coordinator meet the requirements established for the coordinator in Chapter 3, it needed extra functionalities that allowed content location management.

For that, two new functions were added to the connection broker:

• Get Peers with content


• Announce Peer as content source

As shown in Table 4.1, to ask for the list of peers with a certain content, the client simply does an HTTP GET on the path /peerjs/key/peers/iCwidI, for an image with hash iCwidI. As a response, the function returns a list of IDs of the available peers. These IDs identify the peers with the coordinator, which acts as a guide to establish the links.

Path: /:key/peers/:contenthash
Function: Lists the peers with a certain content
Parameters: :key - ID of the peer network; :contenthash - hash of the file

Path: /:key/content/:id/:contenthash
Function: Announces the peer with ID :id as a source of a certain content
Parameters: :key - ID of the peer network; :id - ID of the peer that is announcing; :contenthash - hash of the file

Table 4.1: Added REST functionalities on the Coordinator

A peer that sends the request '/peerjs/key/content/:id/iCwidI' to the coordinator, as defined in Table 4.1, is announcing that it is now a provider of the resource 'iCwidI'. The response to this request does not need to be interpreted.
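A minimal sketch of how these two routes could be exposed on an Express application is shown below, assuming an in-memory file-to-peers map like the one described in Chapter 3; the variable names are illustrative and not the exact ones used in the prototype.

// Illustrative Express routes for the two added coordinator functions.
var express = require('express');
var app = express();
var filesToPeers = {};                       // assumed in-memory file-to-peers map

// List the peers that can serve a given content hash.
app.get('/peerjs/:key/peers/:contenthash', function (req, res) {
  res.json(filesToPeers[req.params.contenthash] || []);
});

// Announce a peer as a source of a given content hash.
app.get('/peerjs/:key/content/:id/:contenthash', function (req, res) {
  var hash = req.params.contenthash;
  filesToPeers[hash] = filesToPeers[hash] || [];
  if (filesToPeers[hash].indexOf(req.params.id) === -1) {
    filesToPeers[hash].push(req.params.id);
  }
  res.end();                                 // the response body is not interpreted
});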

4.3.2 Client side components

Connection API

To make the new functions of the coordinator available to the clients, the PeerJS client script had to be extended to handle those two entry points. Basically, the connection API has to do an HTTP request to the path described earlier, handle any errors that might occur and cleanly return a definitive answer to the caller. The function made publicly available to the Content Loader is shown in Code 4.3.

peer.listAllPeersWithContent(contenthash, callback);

Code 4.3: List all peers - PeerJS

If something goes wrong, either no response or an erroneous reply, the function always returns an empty list, which makes the peer resort to the origin server.
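A possible implementation of this wrapper, using the browser's XMLHttpRequest and always falling back to an empty list on error, is sketched below; it assumes the PeerJS client library is loaded and that the connection options are reachable through this.options, which is an assumption about PeerJS internals.

// Illustrative extension of the PeerJS client: query the coordinator for the
// peers that hold a given content hash, returning [] on any error.
Peer.prototype.listAllPeersWithContent = function (contenthash, callback) {
  var url = 'http://' + this.options.host + ':' + this.options.port +
            this.options.path + '/' + this.options.key +
            '/peers/' + contenthash;                    // option names are assumptions
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.onload = function () {
    try {
      callback(JSON.parse(xhr.responseText) || []);
    } catch (e) {
      callback([]);                                     // erroneous reply
    }
  };
  xhr.onerror = function () { callback([]); };          // no response
  xhr.send();
};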

Content Loader

The content loader is a script written in JavaScript that acts when the DOM is loaded. It starts by registering the peer with the coordinator, Code 4.4, resorting to the connection API. The host can be an IP, a URL, or blank to use the same host as the HTTP server.

var peer = new Peer({
  host: '52.18.xx.xx',
  port: 80,
  path: '/peerjs'
});

Code 4.4: Register peer with the coordinator

Then, when the objects in the HTML are accessible, the script cycles through the images and gathers information about them. A selector attribute has to be placed on the images that are to be part of the distribution system. This selector can be a different class or an override of the src attribute of the images, as in Code 4.5, in this case so that the browser does not automatically fetch the resources in its default operation.

<img data-src="/img/test300k_1.png" />

Code 4.5: Image selector attribute

When deciding which resources will be requested from the other peers in the network, the browser cache is ignored. Ideally, the local cache should be taken into account, because a local request is better than a network request and the peer already has the content.

The script then generates the hashes of the images (based on their URLs) and uses PeerJS to request the peers that have each image (Code 4.6).

peer.listAllPeersWithContent(imgOptions.hash, function (list) {
  if (list && list.length === 0) {
    getImageFromServer(imgOptions);
  } else {
    connectAndRequest(imgOptions, list);
  }
});

Code 4.6: Get the list of peers

Since asynchronous behavior is expected from JavaScript when handling web requests, some connection configuration is needed for whenever a solicitation comes in.

A peer can either receive a request for content or receive a file that it has requested. So, for every connection that is established, when data arrives at the peer it has to be prepared for it to be one of two things:

• Some other peer in the network has just requested some data.

• The data that this peer requested earlier for itself has just arrived.

If it is another peer asking for collaboration and this peer has the content, it simply sends it through the DataChannel.

If the information arriving is instead the response to a request this peer made to fetch content from another peer, then it has to process the data and prepare to integrate it into the DOM.
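One way to express this dispatch, under the assumption that request messages carry a request flag and a hash while responses carry the content itself, is sketched below; localStorageOfImages and insertIntoDOM are illustrative helpers, not the prototype's exact names.

// Illustrative handler for incoming DataChannel messages: either answer a
// request with the stored content or integrate a response into the page.
var localStorageOfImages = {};                            // illustrative in-memory store

function insertIntoDOM(hash, content) {                   // illustrative DOM helper
  document.querySelector('[data-hash="' + hash + '"]').src = content;
}

peer.on('connection', function (conn) {
  conn.on('data', function (message) {
    if (message.request) {
      var stored = localStorageOfImages[message.hash];
      if (stored) {
        conn.send({ hash: message.hash, content: stored.content });
      }
    } else {
      insertIntoDOM(message.hash, message.content);
    }
  });
});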


In Code 4.7, the peer tries to connect to the chosen source of the data. When a connection is opened, it sends the request for the image with that hash. In this prototype, the peer tries to connect 3 times at 500 millisecond intervals; if no success is attained, it gets the image from the server.

conn = peer.connect(chosenPeer);
(...)
conn.on('open', function () {
  conn.send({ request: true, hash: image.hash });
});
var tryToConnect = setInterval(function () {
  (...)
  if (timesTried === 3) {
    clearTimeout(tryToConnect);
    getImageFromServer(image);
  }
}, 500);

Code 4.7: Connect and request some data

Once a resource is obtained, either from the server or from a fellow peer, the content is always saved in a structure for easy access, should another peer request it.

Ideally it would use the local storage of the browser, but that usually has a limit that could fill up quickly; for the sake of this work, a local variable was selected to do the job, as shown in the example in Code 4.8.

The downside of storing the gathered elements in in-memory storage is that it consumes more resources from the client, growing linearly with the number of images stored.

"iCwidI": {
  "id": "iCwidI",
  "datasrc": "/img/1.png",
  "hash": "iCwidI",
  "fullurl": "/img/1.png",
  "content": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABKgAAAN+CAYAA...81Lt+asWzfrtfW32r9k//8PV+Imwn3Ulx4AAAAASUVORK5CYII="
}

Code 4.8: Image in storage

The content of the image itself is saved in base64 so it can be sent immediately after a request is

received, losing no time in the preparation of the data.
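For reference, a fetched image can be turned into such a base64 data URL before being stored; the sketch below shows one way to do it with XMLHttpRequest and FileReader, with storeImage standing in as a hypothetical name for the prototype's storage structure.

// Illustrative conversion of a fetched image into a base64 data URL,
// ready to be stored and sent to other peers (storeImage is hypothetical).
function getImageAsDataUrl(url, done) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.responseType = 'blob';
  xhr.onload = function () {
    var reader = new FileReader();
    reader.onloadend = function () {
      done(reader.result);                   // e.g. "data:image/png;base64,iVBOR..."
    };
    reader.readAsDataURL(xhr.response);
  };
  xhr.send();
}

getImageAsDataUrl('/img/1.png', function (dataUrl) {
  storeImage('iCwidI', dataUrl);             // hypothetical storage helper
});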


Chapter 5

Evaluation

This chapter presents the tests performed to evaluate the fulfillment of the goals set for this project.

5.1 Tests Objectives

The tests performed on the prototype have several objectives:

1. Assess if a WebRTC solution can lower the load on the website's main server.

2. Determine the impact on download times caused by the usage of the P2P transfers.

3. Evaluate the overhead of having extra network requests for the users.

4. Inspect the effects of having transfers between close peers and distant peers.

5. Appraise the behavior of the network of peers when the number of clients is higher.

6. Examine the performance of the network with resources with different sizes to share.

5.2 Deployment and hardware

The deployment of this prototype was planned for an Amazon EC2 instance located in Dublin. It is a common server type for websites, allows free usage for one year, and is an elastic computing server whose network capacity adapts according to demand. This is a T2.micro instance with one dedicated CPU core and 1 GB of memory, running a Debian Linux OS. These specifications are more than enough to deploy a simple web server that serves static files, and the extra features of the coordinator do not consume too many resources. Being a UNIX OS, it is easy to configure, manage and prepare the environment in which the server has to run.

5.3 Tests Scenarios

For the server, an Amazon EC2 T2.micro instance was used.


For the clients, servers from Digital Ocean were used, in different locations: New York, San Francisco, Toronto, Singapore, London and Amsterdam. They were used to avoid having local peers, on the same network or data center, which would result in very fast download and upload performance and would remove the impact of distance, both peer-to-peer and peer-to-server.

For every peer that accessed the prototype and joined the peer network, a connection limit of 10 megabits per second (Mbps) download and 1 Mbps upload was established. This was done to simulate the connection of a typical user on a home network.

To simulate a full-featured browser on a headless machine, Selenium WebDriver was used. The simulated environment ran Mozilla Firefox version 41.0.1 on a virtual display server, Xvfb¹, with Python scripting (see Code 5.1) used to send commands to the virtual browser.

#!/usr/bin/python
# coding=utf-8
import time
import sys

from selenium import webdriver

availablescenarios = ['S1', 'S2', 'S3', 'S4', 'S5', 'S6']

print 'Running scenario:', sys.argv[1]
scenario = sys.argv[1]

# Dwell on the page for 300 seconds by default; an optional second
# argument overrides this value.
sleepy = 300
if len(sys.argv) > 2:
    sleepy = int(sys.argv[2])

if scenario in availablescenarios:
    url = 'http://jdpeer.no-ip.org/' + scenario + '.html'
    print url
    driver = webdriver.Firefox()
    driver.get(url)
    time.sleep(sleepy)
    driver.quit()
else:
    print 'Scenario not available. Select one from these: ' + ', '.join(availablescenarios)

Code 5.1: Script to run WebDriver

Coordination of the servers was achieved using a scheduling command, the at command, which worked because the machine clocks were synchronized within a one second error margin. This error was considered irrelevant to the success of the tests.

All the results presented, except where otherwise stated, are the outcome of 50 tests run under the same conditions, apart from network conditions, which cannot be predicted.

¹http://www.x.org/archive/X11R7.6/doc/man/man1/Xvfb.1.xhtml


All the following page types were accessed both with a traditional version of the website, downloading the resources from the server every time, and with the P2P CDN prototype activated. These two types of access were executed by the same peers at alternating, sequential times, as shown in the example, until each type of access reached 50 iterations:

• Access page with P2P deactivated.

• Access page with P2P activated.

• Access page with P2P deactivated.

• Access page with P2P activated.

• (...)

The resulting download times were captured by the peers locally and sent to a central logging server

that saved the information collected.
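As an illustrative sketch only (the thesis does not specify the logging protocol; the endpoint URL, field names and use of XMLHttpRequest below are assumptions), each peer could report its measurements along these lines:

// Post the download times measured in one run to the central logging server.
function reportTimes(scenario, peerId, p2pEnabled, times) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', 'http://logger.example.org/log', true); // hypothetical endpoint
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.send(JSON.stringify({
        scenario: scenario,      // e.g. 'S1'
        peer: peerId,
        p2p: p2pEnabled,
        times: times             // e.g. { 'File 1': 1415, 'File 2': 3033 }
    }));
}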

5.3.1 Page types

Three types of web pages with images were created, to be accessed by the peers in the different scenarios:

1. Page with four images of 300 kilobytes (kB) each, totaling 1.2 megabytes (MB), referenced as P1.

2. Page with ten images, nine of 10 kB and one of 300 kB, totaling 390 kB, referenced as P2.

3. Page with two images, one of 1.5 MB and another of 1.8 MB, totaling 3.3 MB, referenced as P3.

The first web page is based on studies indicating that the average web page weighs 1 to 1.5 MB [57, 58].

Three scenarios were defined.

5.3.2 Scenario 1 - Close peers

The first scenario is an interaction between peers that are geographically close to each other. Peer 1 was a computer in New York and Peer 2 was a computer in Toronto. Peer 1 accesses the page first and stays on it; one minute later, Peer 2 requests the HTML page. This scenario intends to demonstrate the benefit of a wise P2P connection where, in a future version, the choice of peers will matter: if a nearby peer already has the data, there is no need to fetch it from across the globe.


5.3.3 Scenario 2 - Distant peers

In this scenario two clients are paired together with the same timing as before, but the peers are now far apart, both network-wise and geographically. Peer 1, in New York, accesses the page first and is followed, a minute later, by Peer 2, in Singapore. The goal is to compare with the previous scenario in order to assess the importance of distance in WebRTC connections.

5.3.4 Scenario 3 - More than two peers

In the third and last scenario a page is accessed by more than two peers, in bursts. The first three of the six peers request the page, and the remaining three clients request it a minute later. The objective of this test is to show that the prototype behaves well in situations where there are several connections between peers. The peers were located, in order of appearance in the results table, in New York, Singapore, San Francisco, London, Amsterdam and Toronto.

5.4 Test Results

5.4.1 Scenario 1 - Close peers

Test page 1

Tests done with close peers, though still somewhat distant from the server, represent the best case scenario. For these tests, on web page 1, which is composed of 4 images of 300 kB each, the results show (Table 5.1) that the seeder client suffered increased download times when WebRTC transfers are active. Even though this client still gathers the files from the origin server, it still has to ask the coordinator whether there are any peers, and this is accounted for in the times recorded. This adds a little overhead, as can be seen in Figure 5.1, even for clients that do not resort to the network of peers to get their files.

The graph also shows the error interval of the mean of the download times for each image, using a 95% confidence interval.
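For reference, assuming the usual Student t-based interval over the n = 50 runs (the thesis does not state the exact estimator used), the plotted interval corresponds to

\[
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}},
\]

where \(\bar{x}\) is the sample mean and \(s\) the sample standard deviation of the recorded download times.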

Looking at the setup and download times of the second peer (Figure 5.2), compared to the test rounds where P2P was off and Peer 2 got the images from the origin server, some improvement can be seen. The biggest improvement is seen in File 1, which can be explained by it being the first piece of data sent, since it is placed before the others in the HTML document. Because File 1 starts fetching first and the other transfers are launched some milliseconds later, File 1 gets almost all the bandwidth and finishes faster, hurting the download times of the other resources.

In conclusion, for this test our solution brings an overall improvement in the download times of the second peer when the P2P network is activated.

It is also observed that Peer 2 is faster than Peer 1 when obtaining files directly from the server. This is explained by the location of the peers, since different network routes produce distinct results, and by the Toronto data center being less congested than the one in New York.


                    Peer 1                          Peer 2
         P2P off   P2P on      Var.      P2P off    P2P on      Var.
File 1      7035     7219    +2.62%         3612      1415   -60.83%
File 2      7199     7129    -0.97%         3651      3033   -16.93%
File 3      6251     6296    +0.72%         3918      3974    +1.43%
File 4      7864     8214    +4.45%         4412      4767    +8.05%
avg.     7087.25   7214.5    +1.70%      3898.25   3297.25   -17.07%

Table 5.1: Scenario 1, Page 1 results (download times in milliseconds)

Figure 5.1: Scenario 1 - Test 1 - Peer 1 - Download time comparison between WebRTC inactive and active

Figure 5.2: Scenario 1 - Test 1 - Peer 2 - Download times between server GET and peer GET

Test Page 2

On web page 2 a slightly different HTML file structure is used. More images means more send/request

transfers between the two peers. On the first peer, it can be seen (Table 5.2) that the overhead of asking


the coordinator for peers in the network now causes a larger relative increase in the times, mainly because the images are smaller while the round trip of asking the coordinator stays the same. In other words, an increase of 200 milliseconds has more relative impact on a 500 millisecond transfer than on a 5 second transfer.

As for Peer 2, an effect similar to that observed with web page 1 can be seen. Concurrency on the same connection hurt all the transfer times except the first one, which escapes the contention by being the first to start. The coordinator requests, which add some milliseconds for every resource, also contribute to the increase.

So in this case fetching the resources from the peer is harmful, and the overall download time for the items on the page is roughly 1.5 times that of downloading them from the server.

This can be solved by limiting simultaneous transfers on peers and getting the files from mixed

sources, like other peers or even the server.

                    Peer 1                          Peer 2
          P2P off   P2P on      Var.      P2P off    P2P on      Var.
File 1        559      705   +26.12%         362       271    -25.14%
File 2        640      874   +36.56%         368       408    +10.87%
File 3        635      867   +36.54%         390       655    +67.95%
File 4        582      787   +35.22%         361       674    +86.70%
File 5        568      808   +42.25%         431       742    +72.16%
File 6        725      958   +32.14%         466       842    +80.69%
File 7        630      874   +38.73%         453       923   +103.75%
File 8        785      998   +27.13%         521       609    +16.89%
File 9        739      988   +33.69%         529       573     +8.32%
File 10      6515     5862   -10.02%        2822      4558    +61.52%
avg.       1237.8   1372.1   +29.84%       670.3    1025.5    +48.37%

Table 5.2: Scenario 1, Page 2 results (download times in milliseconds)

Test Page 3

On this page, which contains bigger resources, the results, presented in Table 5.3, are brighter. For the first client an increase in the times for each file is observed, which can be explained, as said earlier, by the added requests to the coordinator for the list of peers for each resource. Figure 5.3 depicts the average download times for Peer 1.

                     Peer 1                          Peer 2
          P2P off    P2P on      Var.      P2P off    P2P on      Var.
File 1      26545     26985    +1.66%       11876      2061    -82.65%
File 2      31932     33302    +4.29%       12604      2906    -76.94%
avg.      29238.5   31443.5    +2.97%       12240    2483.5    -79.79%

Table 5.3: Scenario 1, Page 3 results (download times in milliseconds)

Figure 5.3: Scenario 1 - Test 3 - Peer 1 - Download time comparison between WebRTC inactive and active

As for Peer 2, savings of almost 80% (see Figure 5.4) indicate that for larger content and with fewer concurrent transfers, the P2P data exchange thrives. The smaller RTT and the nature of the WebRTC DataChannel explain the big improvements. This is especially interesting for big file downloads or video streaming, where larger contents are involved. The web server is an elastic instance that allocates resources when it needs them most; its upload capacity is variable and, at the start of a transfer, possibly lower than the 1 Mbps of the peers, before it adapts and speeds up.

Figure 5.4: Scenario 1 - Test 3 - Peer 2 - Download times comparison between server GET and peer GET

Summary

In this scenario the results obtained validate objectives 1, 2, 3 and 6 described in 5.1. It is possible to conclude that the prototype lets a user avoid requesting content from the server when a peer is present, although it brings some communication overhead with the Coordinator. Improvements in the download time can be observed in this scenario, especially on Test Page 3, and they come from the shorter distance between the peers.

Some problems involving simultaneous downloads of content from the same peer were detected, which on Test Page 2 actually increased the download times.

5.4.2 Scenario 2 - Distant peers

Test page 1

The results for Peer 1 are very similar to those of the first scenario, since this peer is in the same location, New York. The times with P2P activated are slightly higher than with it deactivated, as explained before, and this overhead amounts to 200 to 300 milliseconds in this case.


                    Peer 1                           Peer 2
          P2P off   P2P on      Var.      P2P off     P2P on      Var.
File 1       6618     6785    +2.52%        22959       2023    -91.19%
File 2       6461     6846    +5.96%        21272       2041    -90.41%
File 3       5653     6181    +9.34%        20442       3032    -85.17%
File 4       7496     7348    -1.97%        22772       3807    -83.28%
avg.         6557     6790    +3.96%     21861.25    2725.75    -87.51%

Table 5.4: Scenario 2, Page 1 results (download times in milliseconds)

As for Peer 2, located in Singapore and therefore significantly distant from the server (Dublin), increased server request times can be observed. What a client in New York takes 6 seconds to download takes the client in Singapore more than 3 times as long. When the peer-to-peer solution was activated, large improvements were obtained. Singapore is also network-wise distant from New York, but the path does not require as many hops, since there is a direct link to the San Francisco node. The nature of the WebRTC transfers, especially running in unreliable mode, also makes them faster. Even with unreliable mode, only runs in which the image arrived whole were considered for these tests.

Test page 2

The results for page 2, in Table 5.5, are interesting. For Peer 1 they corroborate the results of the first scenario for the same page: overhead of a similar degree is added, and it can be explained by the same reason.

As for the second client, the one that obtains the files via the P2P connection, a major benefit in the times taken to obtain the resources can be observed. Even though the files are small and the overhead of checking the list of peers is relevant, it is still better to fetch the files using WebRTC than directly from the server. Although obtaining resources from a faraway peer is theoretically something to avoid, the tests show that it is actually beneficial here. Reducing load on the origin server is always good, but if it also comes with a benefit for the client, then it is something worth deploying.

                    Peer 1                          Peer 2
          P2P off   P2P on      Var.      P2P off    P2P on      Var.
File 1        486      755   +55.35%         767       323    -57.89%
File 2        545      918   +68.44%         948       752    -20.68%
File 3        543      983   +81.03%        1081       862    -20.26%
File 4        670      924   +37.91%         805       878     +9.07%
File 5        602      914   +51.83%         958       953     -0.52%
File 6        584     1101   +88.53%        1440      1048    -27.22%
File 7        531     1037   +95.29%        1190      1140     -4.20%
File 8        698     1044   +49.57%        1681      1231    -26.77%
File 9        634      874   +37.85%        1777      1699     -4.39%
File 10      5985     6864   +14.69%       18136      4896    -73.00%
avg.       1127.8   1581.4   +64.21%      2878.3    1378.2    -22.59%

Table 5.5: Scenario 2, Page 2 results (download times in milliseconds)


Summary

This test scenario was used mainly to validate, by comparison with Scenario 1, objective 4 in 5.1. Transfers are indeed slower between two distant peers, but in this scenario they were still beneficial because the server times were even higher.

For this scenario, tests on Test Page 3 were deemed unnecessary and were not included in the test package.

5.4.3 Scenario 3 - More than two peers

Test page 1

This test is a bit different and involves six clients in total. The intention was to create three seeder peers by accessing the page simultaneously, and then access it with three other peers that would obtain the content from the seeds. It did not work exactly as expected, and the results reflect that. The first peer (New York) was closer than the other peers (Singapore and San Francisco), got the data quickly and made it available by announcing it to the coordinator. Peers 2 and 3 then requested the HTML file from the server at the same time; because they are farther away, they received it after the first peer. So, when they requested the peer list for the content, Peer 1 was already available. This led to great improvements (Table 5.6) in the times for obtaining the files, even before the second batch of clients accessed the network.

                  Peer 1                         Peer 2                           Peer 3
        P2P off   P2P on     Var.     P2P off   P2P on      Var.     P2P off    P2P on      Var.
File 1     1667     1539   -7.68%       24529     1634   -93.34%       22706      2099   -90.76%
File 2     1798     1881   +4.62%       24179     3290   -86.39%       22568      2102   -90.69%
File 3     1612     1723   +6.89%       19576     3888   -80.14%       19950      3061   -84.66%
File 4     1952     2152  +10.25%       25887     4526   -82.52%       23759      3543   -85.09%
avg.    1757.25  1823.75   +3.52%    23542.75   3334.5   -85.60%    22245.75   2701.25   -87.80%

Table 5.6: Scenario 3, Page 1 results part 1 (download times in milliseconds)

When the second run of peers arrived, there were already three peers on the network, and the newcomers got their images through peer connections. Benefits were obtained by all except Peer 4. Peer 4 was located in London, which is actually close to Dublin, so it is cheaper for that peer to get the content from the server than from some random peer which, in the absence of a smart peer selection mechanism, may be in Singapore. The majority of the requests made by that peer therefore took longer, because all of the other peers were farther away than the HTTP server.

                  Peer 4                         Peer 5                        Peer 6
        P2P off   P2P on     Var.     P2P off   P2P on     Var.     P2P off   P2P on      Var.
File 1     2413     2241   -7.13%        3272     2578  -21.21%        6532     2453   -62.45%
File 2     2638     3242  +22.90%        3757     3357  -10.65%        6475     2737   -57.73%
File 3     2311     3519  +52.27%        3632     3711   +2.18%        5764     2597   -54.94%
File 4     2750     4581  +66.58%        4148     4024   -2.99%        6608     3296   -50.12%
avg.       2528  3395.75  +33.66%     3702.25   3417.5   -8.17%     6344.75  2770.75   -56.31%

Table 5.7: Scenario 3, Page 1 results part 2 (download times in milliseconds)


Summary

The intention of this test scenario was to validate objective 5 in 5.1.

It shows that the system behaves very well with more users and that the setup of the peer network is so fast that even peers that receive the page 1 to 2 seconds later than the first client already have peers to download content from.

It also showed that a correct selection of the peers to download from would bring benefits. Since all selection is random, some peers saw increases in the transfer times, which could be avoided by connecting to closer peers or by going directly to the server.


Chapter 6

Conclusions

6.1 Summary

The goal of this work is to show a solution that lets website owners and developers transform their systems into self-scalable ones using the power of their own clients. In today's websites, server replication and CDN subscriptions are the mechanisms used to support large numbers of users. The more users a website has, the more infrastructure support it has to have. This is costly, and it only works for websites that have some way of predicting what the near future will look like in terms of popularity.

Smaller websites, or companies trying to take their web page to the next level, have to scale accordingly so that they can respond to the demands of heavy usage or flash crowds, yet startups usually tend not to spend many resources on infrastructure because they are small.

If a user next door is browsing a website, does it make sense that another user right beside him, or even in the same city or country, who wants to see it as well, makes the same request to the server of that website? That is where WebRTC comes in. Browser-to-browser communication, while still in development, is already a reality. This work intended to make available an open-source solution, accessible to developers and bringing benefits to both content providers and users. It allows load reduction on the server and creates a community of clients of a particular website who are viewing it at the same time and sharing its resources.

In theory, and for small files, if clients are closer to each other than they are to the server, a P2P connection would also improve data transfer performance, thus benefiting the users as well. For larger files, where upload capacity matters more, it is possible to download them using several peers to obtain parts of the file, much like the BitTorrent protocol but in the browser.

To prove that this is in fact possible, a prototype of a system that can support it was built. Composed of a regular HTTP server extended with a coordinator that acts as a connection broker and as a reference to where resources are located within the network, this prototype uses the very modern WebRTC. When a user opens his browser and inserts a URL, the browser contacts the server to fetch the initial files. Then, available in all modern browsers, the HTML5 WebRTC kicks in and, with the help of the coordinator, determines whether there are peers that can help the server in serving its resources. The


coordinator is a simple security and lookup element that can be deployed using few resources and that helps peers find each other and establish a connection. Once the peers are connected, any data exchanged between them does not cross any server. It is a direct connection, which means fewer network hops, if the peers are closer to each other than to the server, and better performance.

In the scope of this work only a prototype was implemented, and it did not apply everything defined in the architecture, such as the smart selection of peers and parallel downloads of file parts, which are left as future work.

6.2 Achievements

The conclusion of this work is that a solution like the one presented is a very good one to implement on all websites.

To accomplish what was identified in Chapter 2, an architecture proposal was presented with the requirements considered necessary for this type of system.

Following the architecture, a prototype with a subset of the requirements was implemented to prove the concept. This prototype was also used to test the concept and the system in the scenarios created for the evaluation.

As the tests have shown, performance improvements were obtained, and even larger ones can be expected once even a simple peer selection mechanism is in place. This would apply in many cases, especially to global websites, which would not need to extend their servers whenever or wherever users aggregate. Also, being a self-scalable solution, the servers would be much more resistant to sudden popularity bursts while keeping the same infrastructure costs.

It was demonstrated that, with a solution like this, a website would only serve part of the content it serves now and could therefore remove or downgrade infrastructure nodes and reduce costs.

6.3 Future Work

6.3.1 Issues

This solution is not free of problems, and some of them still need to be solved. First of all, WebRTC technology needs to stabilize and its API needs to become consistent across every browser that supports it.

Connecting some peers might raise issues. Being behind complex NAT systems can make it difficult to connect users to elements of the P2P network created by the prototype.

There are several solutions for this, which the owner can select from, such as falling back to the original web server or renting a TURN server. Alternatively, the system could try to invert the link with the peers behind NAT and make them initiate the connection with the peers that are outside the NAT. The website owner then has to select which option is cheaper and provides better performance and results for that part of the peer population.
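As a minimal sketch of the TURN option, assuming the PeerJS wrapper used in Code 4.7 forwards an ICE configuration to the underlying RTCPeerConnection (the server URLs and credentials below are placeholders, not part of the prototype):

// Offer the peers a STUN server plus a rented TURN relay through the ICE configuration.
var peer = new Peer({
    config: {
        iceServers: [
            { urls: 'stun:stun.example.org:3478' },                 // hypothetical STUN server
            { urls: 'turn:turn.example.org:3478',                   // hypothetical rented TURN relay
              username: 'webuser', credential: 'secret' }
        ]
    }
});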

Another problem that can arise is the bandwidth of the peers. Home Internet Service Provider (ISP) connections are evolving in terms of speed, but they are not yet as fast as the bandwidth available to web servers today. The advantage is that the users always outnumber the servers of a website. Peers can collaborate and each send part of a file to another peer, maximizing their bandwidth usage for that part, and in aggregate surpass the upload capabilities of a web server. Also, the capacity of a user is not comparable to that of a server and can only handle a fraction of the requests that a server can handle.

This requires some kind of mechanism to select the users that provide better bandwidth and better conditions for sharing the resources without hurting the response times of the website. With some load balancing between the peers, the system can prevent a user from being swamped with requests, and the coordinator can distribute the demand fairly among the peers.

Big files can be requested in parts simultaneously from several peers, maximizing performance, as sketched below. After receiving the file, peers could submit their download time statistics and source peers to the Coordinator, which could then intelligently assess who the fastest peers are, who sends corrupt data, and who has uploaded more data.
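A minimal sketch of such a chunked, multi-peer fetch, assuming the PeerJS-style API of Code 4.7; the message fields (offset, length) and helper names are illustrative and not part of the prototype:

// Ask each available peer for a different slice of the same resource and
// reassemble the slices once every part has arrived.
function fetchInParts(peers, hash, totalLength, onComplete) {
    var chunk = Math.ceil(totalLength / peers.length);
    var parts = new Array(peers.length);
    var remaining = peers.length;

    peers.forEach(function (remoteId, i) {
        var conn = peer.connect(remoteId);
        conn.on('open', function () {
            // Request one slice of the base64 content held by the remote peer.
            conn.send({ request: true, hash: hash, offset: i * chunk, length: chunk });
        });
        conn.on('data', function (msg) {
            parts[i] = msg.content;                  // slice of the base64 string
            if (--remaining === 0) {
                onComplete(parts.join(''));          // full base64 content, ready to verify
            }
        });
    });
}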

There are already some browser plug-ins that allow users to block the WebRTC DataChannel¹. This would prevent the network from functioning for those users, a problem that would limit the available P2P senders and receivers, leaving the server to do all the work.
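A defensive check along the following lines (illustrative only; the images list is hypothetical and getImageFromServer is the fallback helper from Code 4.7) would at least catch browsers where the API is absent or disabled, falling back to plain HTTP downloads; extensions that keep the API but break connections would still be covered by the retry fallback in Code 4.7:

// Skip the P2P layer entirely when no WebRTC peer connection API is available.
function p2pAvailable() {
    var RTCPeer = window.RTCPeerConnection ||
                  window.webkitRTCPeerConnection ||
                  window.mozRTCPeerConnection;
    return typeof RTCPeer === 'function';
}

if (!p2pAvailable()) {
    images.forEach(function (image) {
        getImageFromServer(image);   // plain HTTP download from the origin server
    });
}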

6.3.2 Evolution

This prototype can evolve by adding some extra features.

The first feature would be to add security and reliability to the transfers. With this, it would become possible for certain users to share more sensitive content. These data transmissions would not pass through any intermediary server and would be encrypted, and thus more secure.

Another feature that can improve the system is allowing the coordinator to maintain a record of which contents are up to date and to stop peers from seeding content that is not the latest. This is very important so that old content does not linger in the network, and it ensures that peers accessing the website never get outdated resources.

Another option is to extend the system and make it even smarter, so that it understands the limits of every peer, and to put in place some fairness mechanism so that all peers contribute. The Coordinator could also store the statistics of all transfers in order to make better, informed decisions when selecting the best peers to connect to a peer that requests the list.

The prototype also lacks a security feature that checks whether a file is the correct one and makes sure it was not tampered with. This connects directly with the feature above. Using checksums on the content would allow a peer to verify whether the seeder that provided the content handed it erroneous, altered or old content.
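A sketch of that integrity check, using the Web Crypto API and assuming the content is available as an ArrayBuffer and that the server publishes a SHA-256 checksum for each resource (the function name and hex comparison are illustrative):

// Hash the received bytes and compare the digest with the checksum published
// by the server; resolves to true when the content is intact.
function verifyContent(arrayBuffer, expectedHexDigest) {
    return crypto.subtle.digest('SHA-256', arrayBuffer).then(function (digest) {
        var hex = Array.prototype.map.call(new Uint8Array(digest), function (b) {
            return ('0' + b.toString(16)).slice(-2);
        }).join('');
        return hex === expectedHexDigest;   // if false, discard and re-fetch the resource
    });
}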

Another idea to implement in the system is a reward mechanism for users who help. Peers that serve content on the website could be part of a payback system that compensates them for sharing the content. This would discourage blocking of the WebRTC features and motivate users to collaborate. The compensation could include faster access and priority in the use of the servers.

¹https://chrome.google.com/webstore/detail/webrtc-block/nphkkbaidamjmhfanlpblblcadhfbkdm?hl=en, https://addons.mozilla.org/en-us/firefox/addon/happy-bonobo-disable-webrtc/


Bibliography

[1] W3C: World wide web history. http://www.w3.org/History.html

[2] Cisco: Cisco visual networking index: Forecast and methodology. Technical report, Cisco

(2013–2018)

[3] Barford, P., Plonka, D.: Characteristics of network traffic flow anomalies. In: Proc. Internet Mea-

surement Workshop (IMW). (2001)

[4] Peng, G.: Cdn: Content distribution network. arXiv preprint cs/0411069 (2004)

[5] Scellato, S., Mascolo, C., Musolesi, M., Crowcroft, J.: Track globally, deliver locally: Improving

content delivery networks by tracking geographic social cascades (March 2011)

[6] Shakkottai, S., Johari, R.: Demand-aware content distribution on the internet. Networking,

IEEE/ACM Transactions on (2009)

[7] Bronzino, F., Gaeta, R., Grangetto, M.: An adaptive hybrid cdn/p2p solution for content delivery

networks. In: Visual Communications and Image Processing (VCIP), 2012 IEEE. (2012)

[8] Bergkvist, A., Burnett, D.C., Jennings, C., Narayanan, A.: Real-time communication

between browsers. In: WebRTC 1.0. W3C WebRTC Working Group (2011-2015)

http://www.w3.org/TR/2015/WD-webrtc-20150210/.

[9] Offutt, J.: Quality attributes of web software applications. IEEE Software 19(2) (2002)

[10] Steinmetz, R., Wehrle, K.: Peer-to-peer systems and applications. Volume 3485. Springer Science

& Business Media (2005)

[11] Stumm, C.: Client-server system for delivery of online information (June 16 1998) US Patent

5,768,528.

[12] Fielding, R.T., Taylor, R.N.: Principled design of the modern web architecture. ACM Transactions

on Internet Technology (TOIT) 2(2) (2002)

[13] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee, T.: Hypertext

transfer protocol–http/1.1. Technical report (1999)

[14] Postel, J.: Transmission control protocol. (1981)


[15] Oram, A.: Peer-to-peer: harnessing the benefits of a disruptive technology. O'Reilly Media, Inc. (2001)

[16] Bit Torrent, Inc.: Bit torrent. http://www.bittorrent.com

[17] Steinmetz, R., Wehrle, K.: 2. What Is This “Peer-to-Peer” About? Springer (2005)

[18] Stokes, M.: Gnutella. http://rfc-gnutella.sourceforge.net/

[19] John Fanning, S.F., Parker, S.: Napster. http://mp3.about.com/od/history/a/The-History-Of-Napster.htm

[20] eMule-Team: emule. www.emule-project.net/

[21] Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer

lookup service for internet applications. ACM SIGCOMM Computer Communication Review 31(4)

(2001)

[22] Rowstron, A., Druschel, P.: Pastry: Scalable, decentralized object location, and routing for large-

scale peer-to-peer systems. In: Middleware 2001, Springer (2001)

[23] Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content-addressable

network. Volume 31. ACM (2001)

[24] Maymounkov, P., Mazieres, D.: Kademlia: A peer-to-peer information system based on the xor

metric. In: Peer-to-Peer Systems. Springer (2002)

[25] Qiu, D., Srikant, R.: Modeling and performance analysis of bittorrent-like peer-to-peer networks.

In: ACM SIGCOMM Computer Communication Review. Volume 34., ACM (2004)

[26] Pouwelse, J., Garbacki, P., Epema, D., Sips, H.: The bittorrent p2p file-sharing system: Measure-

ments and analysis. In: Peer-to-Peer Systems IV. Springer (2005)

[27] Neglia, G., Reina, G., Zhang, H., Towsley, D., Venkataramani, A., Danaher, J.: Availability in

bittorrent systems. In: INFOCOM 2007. 26th IEEE International Conference on Computer Commu-

nications. IEEE, IEEE (2007)

[28] Jung, J., Krishnamurthy, B., Rabinovich, M.: Flash crowds and denial of service attacks: Char-

acterization and implications for cdns and web sites. In: Proceedings of the 11th international

conference on World Wide Web, ACM (2002)

[29] Stading, T., Maniatis, P., Baker, M.: Peer-to-peer caching schemes to address flash crowds. In:

Peer-to-Peer Systems. Springer (2002)

[30] Gadde, S., Chase, J., Rabinovich, M.: Web caching and content distribution: A view from the

interior. Computer Communications 24(2) (2001)

[31] Hofmann, M., Beaumont, L.: Content Networking: Architecture, Protocols, and Practice. The

Morgan Kaufmann series in networking. Morgan Kaufmann (2005)


[32] Mulerikkal, J.P., Khalil, I.: An architecture for distributed content delivery network. In: Networks,

2007. ICON 2007. 15th IEEE International Conference on, IEEE (2007) 359–364

[33] Androutsellis-Theotokis, S., Spinellis, D.: A survey of peer-to-peer content distribution technologies.

ACM Computing Surveys (CSUR) 36(4) (2004)

[34] Nygren, E., Sitaraman, R.K., Sun, J.: The akamai network: a platform for high-performance internet

applications. ACM SIGOPS Operating Systems Review 44(3) (2010)

[35] Akamai Technologies: Akamai netsession. http://www.akamai.com/client

[36] Zhao, M., Aditya, P., Chen, A., Lin, Y., Haeberlen, A., Druschel, P., Maggs, B., Wishon, B., Ponec,

M.: Peer-assisted content distribution in akamai netsession. In: Proceedings of the 2013 confer-

ence on Internet measurement conference, ACM (2013)

[37] Princeton University: Codeen. http://codeen.cs.princeton.edu/

[38] Wang, L., Park, K., Pang, R., Pai, V.S., Peterson, L.L.: Reliability and security in the codeen content

distribution network. In: USENIX Annual Technical Conference, General Track. (2004)

[39] W3C: Webrtc. http://www.webrtc.org/

[40] Johnston, A.B., Burnett, D.C.: WebRTC: APIs and RTCWEB protocols of the HTML5 real-time web.

Digital Codex LLC (2012)

[41] Vogt, C., Werner, M.J., Schmidt, T.C.: Leveraging webrtc for p2p content distribution in web

browsers. In: Network Protocols (ICNP), 2013 21st IEEE International Conference on, IEEE (2013)

[42] Ng, K.F., Ching, M.Y., Liu, Y., Cai, T., Li, L., Chou, W.: A p2p-mcu approach to multi-party video

conference with webrtc. International Journal of Future Computer and Communication 3(5) (2014)

[43] Mozilla Foundation: Firefox hello. https://www.mozilla.org/en-US/firefox/hello/

[44] Chiang, C.Y., Chen, Y.L., Tsai, P.S., Yuan, S.M.: A video conferencing system based on webrtc for

seniors. In: Trustworthy Systems and their Applications (TSA), 2014 International Conference on,

IEEE (2014)

[45] Werner, M.J., Vogt, C., Schmidt, T.C.: Let our browsers socialize: Building user-centric content

communities on webrtc. In: Distributed Computing Systems Workshops (ICDCSW), 2014 IEEE

34th International Conference on, IEEE (2014)

[46] Grigorik, I.: High Performance Browser Networking: What every web developer should know about networking and web performance. O'Reilly Media, Inc. (2013)

[47] Cisco: How NAT works. http://www.cisco.com/c/en/us/support/docs/ip/network-address-translation-nat/26704-nat-faq-00.html

[48] Tuexen, M., Loreto, S., Jesup, R.: Webrtc data channel establishment protocol. (2015)


[49] F. Aboukhadijeh: Webtorrent. https://webtorrent.io/

[50] Zhang, L., Zhou, F., Mislove, A., Sundaram, R.: Maygh: Building a cdn from client web browsers.

In: Eurosys’13. (2013)

[51] J. Hiesey, F. Aboukhadijeh, A. Raja: Peercdn. https://peercdn.com/ (2013)

[52] Swarm Labs, LLC: Swarmify. http://swarmify.com/ (2014)

[53] Viblast Ltd.: Viblast. http://viblast.com/ (2015)

[54] M. Bu, E. Zhang: Peerjs. http://peerjs.com/ (2014)

[55] Peer5, Inc.: Peer5. https://peer5.com/ (2015)

[56] Peer5, Inc.: Sharefest. https://www.sharefest.me/ (2015)

[57] Ihm, S., Pai, V.S.: Towards understanding modern web traffic. In: Proceedings of the 2011 ACM

SIGCOMM conference on Internet measurement conference, ACM (2011) 295–312

[58] Pries, R., Magyari, Z., Tran-Gia, P.: An http web traffic model based on the top one million visited

web pages. In: Next Generation Internet (NGI), 2012 8th EURO-NGI Conference on, IEEE (2012)

133–139

[59] Fielding, R.T.: Architectural styles and the design of network-based software architectures. PhD

thesis, University of California, Irvine (2000)
