

Computer Networks

Lecture 9: HTTP

Content Delivery Infrastructure

Peer-to-peer (p2p):
•  hybrid p2p with a centralized server
•  pure p2p
•  hierarchical p2p
•  end-host (p2p) multicast

Content-Distribution Network (CDN)
•  HTTP Overview
•  HTTP Performance
•  HTTP Caching
•  Content Distribution Network

A Web Page
A web page consists of a base HTML file, which may include references to one or more objects
•  an object can be another HTML file, a JPEG image, a Java applet, an audio file, a Flash video, etc.
•  each object is addressable by a URL
•  example URL: http://www.mgoblue.com/images/pic.gif
   •  protocol: http
   •  host name: www.mgoblue.com
   •  path name: /images/pic.gif
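As a small illustration of the three URL parts named above, the example URL can be split with Python's standard `urllib.parse` module. The variable names are only for illustration.

```python
from urllib.parse import urlsplit

# Example URL from the slide
url = "http://www.mgoblue.com/images/pic.gif"
parts = urlsplit(url)

protocol = parts.scheme      # protocol part of the URL
host_name = parts.hostname   # host name part
path_name = parts.path       # path name part

print(protocol, host_name, path_name)
```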

HTTP Overview
HTTP: HyperText Transfer Protocol
•  Web’s application-layer protocol
•  client/server model
   •  client: browser that requests, receives, and “displays” Web objects
   •  server: sends objects in response to requests
•  HTTP 1.0: RFC 1945
•  HTTP 1.1: RFC 2068 (later revised as RFC 2616)
•  HTTP/2: RFC 7540 (May 2015)

[Figure: a PC running Firefox and a Mac running Safari as HTTP clients of a server running the Apache Web server]

HTTP Overview
Uses TCP:
•  client initiates TCP connection (creates socket) to server, port 80
•  server accepts TCP connection from client
•  HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
•  TCP connection closed

HTTP is “stateless”
•  server maintains no information about past client requests

Aside: protocols that maintain “state” are complex!
•  past history (state) must be maintained
•  if server/client crashes, their views of “state” may be inconsistent, and must be reconciled

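HTTP 1.x Request Message
Two types of HTTP messages: request, response

HTTP request message:
•  in ASCII (human-readable format)
•  general format: a request line, then header lines, then a blank line
•  example:

   GET /somedir/page.html HTTP/1.1
   Host: www.someschool.edu
   User-agent: Mozilla/4.0
   Connection: close
   Accept-language: fr
   (extra carriage return, line feed)

A carriage return and line feed on a line by themselves indicate the end of the header lines (and of the message, when there is no entity body)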
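To make the wire format concrete, here is a minimal sketch that serializes the example request. `build_request` is a hypothetical helper written for this illustration, not part of any library.

```python
def build_request(method, path, headers):
    """Serialize an HTTP 1.x request: request line, header lines, blank line."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends in CRLF; one extra CRLF (an empty line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("GET", "/somedir/page.html", {
    "Host": "www.someschool.edu",
    "User-agent": "Mozilla/4.0",
    "Connection": "close",
    "Accept-language": "fr",
})
print(request.decode("ascii"))
```

A GET request carries no entity body, so the empty line is also the end of the whole message.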

Method Types (HTTP 1.1)
•  GET, POST, HEAD
•  PUT
   •  uploads file in entity body to path specified in URL field
•  DELETE
   •  deletes file specified in the URL field

Uploading form input, alternatives:
1. POST method:
   •  web pages often include form input
   •  input is uploaded to server in entity body
2. as parameters in the URL of a GET request:
   •  input is uploaded in URL field of request line: www.somesite.com/animalsearch?monkeys&banana
   •  the input parameters are the part after the “?”

HTTP 1.x Response Message Example

status line (protocol, status code, status phrase):
   HTTP/1.1 200 OK
header lines:
   Connection: close
   Date: Thu, 06 Aug 1998 12:00:15 GMT
   Server: Apache/1.3.0 (Unix)
   Last-Modified: Mon, 22 Jun 1998 …...
   Content-Length: 6821
   Content-Type: text/html
blank line
data, e.g., requested HTML file:
   data data data data data ...
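The four parts of the response (status line, header lines, blank line, data) can be pulled apart as in the sketch below. `parse_response` is a hypothetical helper, and the sample message is shortened (its `Content-Length` of 4 is made up to match the 4-byte body).

```python
def parse_response(raw):
    """Split a raw HTTP 1.x response into status line fields, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the data.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Connection: close\r\n"
       b"Content-Length: 4\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"data")
version, code, phrase, headers, body = parse_response(raw)
print(version, code, phrase, headers, body)
```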

HTTP 1.x Response: Status Line
HTTP-version 3-digit-response-code Reason-phrase
•  1XX – informational
•  2XX – success
   •  200 OK: request succeeded, requested object later in this message
•  3XX – redirection
   •  301 Moved Permanently: requested object moved, new location specified later in this message (“Location:” in header)
   •  302 Found (“Moved Temporarily” in HTTP 1.0)
   •  304 Not Modified
•  4XX – client error
   •  400 Bad Request: request message not understood by server
   •  404 Not Found: requested document not found on this server
•  5XX – server error
   •  505 HTTP Version Not Supported
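Since the class of a response is determined by the first digit of its code, a client can classify any status code with integer division. A minimal sketch, where the table and function names are illustrative:

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def status_class(code):
    """Map a 3-digit status code to its class using the leading digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")

for code in (200, 301, 304, 404, 505):
    print(code, status_class(code))
```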

Client-side States: Cookies

HTTP is “stateless”
•  server maintains no information about past client requests
•  but sometimes it may be useful to keep per-client states, for example for:
   •  authorization
   •  shopping carts
   •  wish list
   •  recommendations
   •  user session state (Web e-mail)

States or user ID (to look up server-side states) kept at client side using cookies

Client-side States: Cookies
Four components:
1.  cookie header line in the HTTP response message
2.  cookie header line in HTTP request message
3.  cookie file kept on client host and managed by client browser
4.  back-end database at Web server

[Figure: cookie exchange between a client and the Amazon server]
•  client (cookie file: ebay: 8734) sends usual http request msg
•  server creates ID 1678 for user
•  server returns usual http response + Set-cookie: 1678
•  client cookie file now holds amazon: 1678, ebay: 8734
•  client sends usual http request msg with cookie: 1678
•  server takes cookie-specific action, e.g., wish list, and returns usual http response msg
•  one week later: client sends usual http request msg with cookie: 1678; server takes cookie-specific action and returns usual http response msg
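The exchange above can be mimicked with two toy classes. This is a sketch under simplifying assumptions: `Server` and `Browser` are hypothetical names, headers are plain dictionaries, and the cookie value is the bare ID as on the slide (real `Set-Cookie` headers carry `name=value` pairs plus attributes).

```python
import itertools

class Server:
    """Toy Web server keeping per-user state in a back-end 'database'."""
    def __init__(self, first_id=1678):
        self._next_id = itertools.count(first_id)
        self.database = {}  # cookie ID -> per-user state

    def handle(self, request_headers):
        response_headers = {}
        cookie = request_headers.get("Cookie")
        if cookie is None or cookie not in self.database:
            # Unknown client: create an ID and ask the browser to store it.
            cookie = str(next(self._next_id))
            self.database[cookie] = {"visits": 0}
            response_headers["Set-Cookie"] = cookie
        self.database[cookie]["visits"] += 1  # cookie-specific action
        return response_headers

class Browser:
    """Toy browser managing a cookie file keyed by site."""
    def __init__(self):
        self.cookie_file = {"ebay": "8734"}

    def visit(self, site, server):
        headers = {}
        if site in self.cookie_file:
            headers["Cookie"] = self.cookie_file[site]
        response = server.handle(headers)
        if "Set-Cookie" in response:
            self.cookie_file[site] = response["Set-Cookie"]
        return response

amazon = Server()
browser = Browser()
first = browser.visit("amazon", amazon)   # server creates ID 1678
second = browser.visit("amazon", amazon)  # e.g., one week later
print(first, second, browser.cookie_file)
```

HTTP itself stays stateless here: all state lives in the browser's cookie file and the server's database, linked only by the ID.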

“Abuse” of Cookies
Excellent marketing opportunities and concerns for privacy:
•  cookies permit sites to learn a lot about you
•  you may unknowingly supply personal info to sites
•  advertising companies track your preferences and viewing history across sites, example scenario:
   •  ad company contracted with (1) a social networking site, (2) a bookstore, and (3) a clothing store
   •  you view your friend’s travel photos of Hawaii at the social networking site
   •  when you visit the bookstore, a travel book about Hawaii is pushed to you
   •  when you visit the clothing store, a pair of swimming goggles is pushed to you
   •  at all three places a travel agency’s extra-low price Hawaii vacation package, expiring in 30 seconds, is pushed to you

Object Request Response Time
RTT (round-trip time): time for a small packet to travel from client to server and back

Response time:
•  1 RTT to initiate TCP connection
•  1 RTT for HTTP request and the first few bytes of HTTP response to return
•  file transmission time
•  total = 2 RTT + transmit time

[Figure: client/server timeline: initiate TCP connection (1 RTT), request file (1 RTT), time to transmit file, file received]
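A quick numeric check of total = 2 RTT + transmit time. The values below are assumptions chosen for illustration (a 100 msec RTT, a 40,000-bit object, and a 1 Mbps path), and `object_response_time` is a hypothetical helper.

```python
def object_response_time(rtt, size_bits, rate_bps):
    """Time to fetch one object: TCP setup RTT + request RTT + transmit time."""
    transmit_time = size_bits / rate_bps
    return 2 * rtt + transmit_time

# Assumed values: RTT = 0.1 s, object = 40,000 bits, path = 1 Mbps
t = object_response_time(rtt=0.1, size_bits=40_000, rate_bps=1_000_000)
print(f"{t:.2f} seconds")
```

With these numbers the two round trips (0.2 s) outweigh the transmit time (0.04 s), which is why connection setup matters for small objects.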

HTTP 1.0
HTTP 1.0 uses non-persistent connections:
•  at most one object is sent over a TCP connection
•  object transmission completion detected by recv() returning 0 (connection closed)
•  why is this not a good design?

[Figure: client/server timeline from 0 RTT to 4 RTT, with SYN, ACK, DAT, and FIN segments]
•  client opens TCP connection
•  client sends HTTP request for HTML
•  server reads from disk
•  client parses HTML, client opens TCP connection
•  client sends HTTP request for image
•  server reads from disk
•  image begins to arrive

HTTP 1.1
HTTP 1.1 uses persistent connections:
•  server leaves connection open after sending responses
•  subsequent HTTP messages between the same client/server to fetch multiple objects are sent over the same connection

[Figure: client/server timeline from 0 RTT to 2 RTT, with ACK and DAT segments]
•  client sends HTTP request for HTML
•  server reads from disk
•  client parses HTML, client sends HTTP request for image
•  server reads from disk
•  image begins to arrive

How to Mark End of Message?
Three options:

1. Content-Length in header:

   HTTP/1.1 200 OK
   Connection: close
   Date: Thu, 06 Aug 1998 12:00:15 GMT
   Server: Apache/1.3.0 (Unix)
   Last-Modified: Mon, 22 Jun 1998 …...
   Content-Length: 6821
   Content-Type: text/html

2. Implied length, e.g., 304 (cache fresh) never has content

3. Transfer-Encoding: chunked (HTTP 1.1)
•  after headers, each chunk comprises content length in hex, CRLF, then body, then CRLF; a chunk of length 0 indicates end of message

   HTTP/1.1 200 OK<CRLF>
   Transfer-Encoding: chunked<CRLF>
   <CRLF>
   23<CRLF>
   This is the data in the first chunk<CRLF>
   1A<CRLF>
   and this is the second one<CRLF>
   0<CRLF>
   <CRLF>
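The chunked option can be decoded with a short loop. This is a sketch that ignores trailers and chunk extensions. `decode_chunked` is a hypothetical helper, and the sample body uses hex sizes that count exactly the bytes of each chunk's text (0x23 = 35 and 0x1A = 26).

```python
def decode_chunked(body):
    """Decode a Transfer-Encoding: chunked body (no trailers or extensions)."""
    data = b""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)  # chunk length is written in hex
        if size == 0:                       # zero-length chunk ends the body
            return data
        start = line_end + 2
        data += body[start:start + size]
        pos = start + size + 2              # skip the CRLF after the chunk data

body = (b"23\r\nThis is the data in the first chunk\r\n"
        b"1A\r\nand this is the second one\r\n"
        b"0\r\n\r\n")
decoded = decode_chunked(body)
print(decoded)
```

Unlike `Content-Length`, this lets a server start sending before it knows the total size of the response.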

Pipelined and Parallel Connections
Persistent without pipelining:
•  client issues new request only when previous response has been received
•  one RTT for each referenced object

Persistent with pipelining:
•  client sends requests as soon as it encounters a referenced object
•  as little as one RTT for all referenced objects
•  supported in HTTP 1.1 (persistent connections are the default; pipelining is optional and often left disabled in browsers)

Browsers can open parallel TCP connections to fetch referenced objects (even in HTTP 1.0)

HTTP Modeling
Assume Web page consists of:
•  1 base HTML page (of size L bits)
•  M images (each also of size L bits)

Non-persistent HTTP:
•  M+1 TCP connections in series
•  response time = (M+1)*2*RTT + (M+1)*L/µ, where µ is the path speed (bits/sec)

Persistent HTTP (with pipelining):
•  2 RTTs to request and receive base HTML file
•  1 RTT to request and receive M images
•  response time = 3*RTT + (M+1)*L/µ

Non-persistent HTTP with n parallel connections:
•  suppose n divides M evenly
•  1 TCP connection for base file
•  M/n rounds of n parallel connections for images
•  n-parallel response time = (M/n + 1)*2*RTT + (M/n + 1)*L/µ

Compare:
•  non-persistent response time = (M+1)*2*RTT + (M+1)*L/µ
•  persistent response time = 3*RTT + (M+1)*L/µ
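The three response-time formulas can be compared numerically. The sketch below uses RTT = 100 msec, L = 5 Kbytes, M = 10, and n = 5. The path speed of 1 Mbps and the reading of 1 Kbyte as 1000 bytes are assumptions, and the function names are illustrative.

```python
def non_persistent(M, L, rtt, mu):
    """M+1 connections in series, each costing 2 RTTs plus transmit time."""
    return (M + 1) * 2 * rtt + (M + 1) * L / mu

def persistent_pipelined(M, L, rtt, mu):
    """2 RTTs for the base file, 1 RTT for all M pipelined images."""
    return 3 * rtt + (M + 1) * L / mu

def parallel_non_persistent(M, L, rtt, mu, n):
    """Base file first, then M/n rounds of n parallel connections."""
    return (M / n + 1) * 2 * rtt + (M / n + 1) * L / mu

M, n = 10, 5
L = 5 * 1000 * 8   # 5 Kbytes in bits (assuming 1 Kbyte = 1000 bytes)
rtt = 0.1          # seconds
mu = 1_000_000     # assumed path speed: 1 Mbps

t_np = non_persistent(M, L, rtt, mu)
t_pp = persistent_pipelined(M, L, rtt, mu)
t_par = parallel_non_persistent(M, L, rtt, mu, n)
print(f"non-persistent {t_np:.2f}s, persistent {t_pp:.2f}s, parallel {t_par:.2f}s")
```

Under these assumptions the serial non-persistent case is the clear outlier, while persistent and parallel connections land close together.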

HTTP Response Time (in seconds)
[Figure: response time for RTT = 100 msec, L = 5 Kbytes, M = 10, and n = 5]

For low bandwidth, transmission time dominates over connection and response time ⇒ performance of persistent connections is comparable to that of parallel connections

HTTP Response Time (in seconds)
[Figure: response time for RTT = 1 sec, L = 5 Kbytes, M = 10, and n = 5]

For larger RTT, TCP establishment and slow start delays dominate over response time ⇒ persistent connections now give significant improvement, particularly in high bandwidth×delay networks

HTTP/2

Based on Google’s SPDY (2009)
RFC 7540 came out in May 2015 (written by the two authors of SPDY)
Chrome browser already has SPDY built-in

Problems with HTTP 1.x:
•  pipelining still suffers from head-of-line (HoL) blocking (if the first item is large, the rest have to wait)
•  parallel streams solve HoL blocking, but on a bandwidth-limited channel, too many streams clog up the channel

HTTP/2
Some changes from 1.1:
•  headers no longer in text format
•  separate control and data headers
•  stream multiplexing over a single TCP connection:
   •  each stream has an ID, data is tagged with stream ID
   •  each stream can also have different priority
•  server push: don’t have to wait for client to parse page before initiating download
•  header compression

Performance improvement: up to 64% reduction in page load time [Grigorik]

Web Caches (Proxy Server)
Goal: satisfy client request without involving origin server
•  user sets browser to direct all web accesses via cache
•  browser sends all HTTP requests to cache
   •  if object is not cached, cache requests object from origin server, then returns object to client
   •  else cache returns object
•  cache acts as both client and server
•  typically cache is installed by ISP (university, company, residential ISP)
•  must be transparent, allow for plug-n-play

[Figure: clients send requests to the proxy server, which forwards misses to the origin server]

Web Caching Example: No Caching

Parameters:
•  average object size = 100,000 bits
•  avg. # of requests to servers = 15/sec
•  Internet latency between a router on the public Internet and any server = 2 secs

Resulting performance:
•  utilization on LAN = 15%
•  utilization on access link = 100% ⇒ over-utilized link causes long queue (delay of minutes)
•  total delay = Internet delay + access delay + LAN delay = 2 secs + minutes + milliseconds

[Figure: institutional network with a 10 Mbps LAN, connected by a 1.5 Mbps access link to the public Internet and the origin servers]

Possible solution
•  increase access link bandwidth to, say, 10 Mbps (often a costly upgrade)

Performance:
•  utilization on LAN = 15%
•  utilization on access link = 15%
•  total delay = Internet delay + access delay + LAN delay = 2 secs + msecs + msecs

[Figure: institutional network with a 10 Mbps LAN, connected by a 10 Mbps access link to the public Internet and the origin servers]

Web Caching Example: With Caching
Another solution: install cache
•  assume hit rate of 0.4

Performance:
•  40% of requests will be satisfied almost immediately
•  60% of requests satisfied by origin server
•  utilization of access link reduced to 60%, resulting in negligible delays (say 10 msecs)
•  avg. total delay = Internet delay + access delay + LAN delay = .6*(2.01) secs + msecs < 1.4 secs

[Figure: institutional network with a 10 Mbps LAN and a cache, connected by a 1.5 Mbps access link to the public Internet and the origin servers]
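The utilization and delay figures in the caching example follow from a few lines of arithmetic. The sketch below reuses the example's parameters. The constant and function names are illustrative, and the msec-scale LAN delay is ignored in the average.

```python
OBJECT_BITS = 100_000        # average object size
REQUESTS_PER_SEC = 15        # average request rate to origin servers
LAN_BPS = 10_000_000         # 10 Mbps LAN
ACCESS_BPS = 1_500_000       # 1.5 Mbps access link

def utilization(load_bps, capacity_bps):
    """Fraction of a link's capacity consumed by the offered load."""
    return load_bps / capacity_bps

traffic = OBJECT_BITS * REQUESTS_PER_SEC            # offered load in bits/sec

lan_util = utilization(traffic, LAN_BPS)            # no cache
access_util = utilization(traffic, ACCESS_BPS)      # no cache

hit_rate = 0.4
access_util_cached = utilization((1 - hit_rate) * traffic, ACCESS_BPS)
# Misses pay 2 secs Internet delay + about 10 msecs access delay
avg_delay = (1 - hit_rate) * 2.01

print(lan_util, access_util, access_util_cached, avg_delay)
```

The cache gets the access link down to 60% utilization without the costly link upgrade, and it also removes the 2-second Internet delay for the 40% of requests that hit.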

Conditional GET
Goal: don’t send object if cache has up-to-date version
•  cache: specifies date of cached copy in HTTP request
   If-modified-since: <date>
•  server: response contains no object if cached copy is up-to-date:
   HTTP/1.0 304 Not Modified

May be used with or without TTL; TTL is hard to set, as it depends on site content

[Figure: two exchanges between cache and server]
•  object not modified: HTTP request msg with If-modified-since: <date>; HTTP response HTTP/1.0 304 Not Modified
•  object modified: HTTP request msg with If-modified-since: <date>; HTTP response HTTP/1.0 200 OK <data>
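The server's side of the two exchanges can be sketched as one comparison of dates. `handle_conditional_get` is a hypothetical helper, and the object's modification time and the cache's dates are made-up values for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def handle_conditional_get(last_modified, if_modified_since=None):
    """Return (status line, body) for a GET, honoring If-modified-since."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_copy_date:
            # Cached copy is up to date: send no object.
            return "HTTP/1.0 304 Not Modified", b""
    return "HTTP/1.0 200 OK", b"<data>"

# Hypothetical modification time of the object on the origin server
last_modified = datetime(1998, 7, 1, 9, 0, 0, tzinfo=timezone.utc)

not_modified = handle_conditional_get(last_modified, "Thu, 06 Aug 1998 12:00:15 GMT")
modified = handle_conditional_get(last_modified, "Mon, 01 Jun 1998 00:00:00 GMT")
print(not_modified, modified)
```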

Cooperative Caching
Multiple caches may form a distributed cache

Instead of going directly to origin server, a cache may query one or more other caches for object first, e.g., cse cache queries ece cache first

To eliminate frequent inter-cache query-reply, each cache may push an index of its contents to other caches, i.e., ece cache tells cse cache all the objects it is holding

Frequently, this “index” is in the form of a Bloom Filter

[Figure: cse and ece networks, each with its own cache (cse cache, ece cache), connected through the public Internet to the origin servers]

Bloom Filter
An efficient, lossy way of describing a set, comprising:
•  a bit vector of length w
•  a family of independent hash functions
   •  each maps an element of the set to an integer in [0, w)

To insert an element:
•  for each hash function, set the bit the element hashes to

To search for an element:
•  for each hash function, examine the bit the element hashes to
•  if any bit is not set, the element is definitely not in the set
•  if all the bits are set, the element may be in the set (potential for false positive)

[Figure: one insert example and two search examples]
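The insert and search rules translate almost directly into code. A minimal sketch follows, assuming the hash family is built by salting SHA-256 with the function's index. That construction is an illustrative choice, not one stated in the lecture. The class and method names are also hypothetical.

```python
import hashlib

class BloomFilter:
    def __init__(self, width, num_hashes):
        self.width = width                 # w: length of the bit vector
        self.num_hashes = num_hashes       # number of hash functions
        self.bits = [False] * width

    def _positions(self, element):
        # Assumed hash family: SHA-256 salted with the function index i,
        # reduced to an integer in [0, w).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{element}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.width

    def insert(self, element):
        for pos in self._positions(element):
            self.bits[pos] = True

    def might_contain(self, element):
        # False: definitely not in the set. True: possibly in the set.
        return all(self.bits[pos] for pos in self._positions(element))

cache_index = BloomFilter(width=64, num_hashes=3)
for url in ("/index.html", "/logo.gif"):
    cache_index.insert(url)
print(cache_index.might_contain("/index.html"))
```

A neighboring cache holding this index would query the owner only when `might_contain` returns True, accepting an occasional wasted query on a false positive.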

Bloom Filter
The false positive rate is a well-defined (non-linear) function of:
1.  width (w),
2.  the number of hash functions, and
3.  the number of elements in the set

•  for a fixed number of hash functions and elements, wider filters are more accurate
•  optimal tradeoff between filter storage and accuracy is when about half of the bits are set

Bloom Filters are also useful in maintaining p2p supernode backbone and distributed storage in datacenter networks
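The dependence on the three parameters can be made explicit with the standard approximation p ≈ (1 − e^(−kn/w))^k, where k is the number of hash functions and n the number of elements. This formula is the usual textbook estimate rather than something stated on the slide, and the function names are illustrative.

```python
import math

def fraction_of_bits_set(w, k, n):
    """Expected fraction of bits set after inserting n elements."""
    return 1 - math.exp(-k * n / w)

def false_positive_rate(w, k, n):
    """Approximate probability that all k probed bits are set by chance."""
    return fraction_of_bits_set(w, k, n) ** k

def optimal_num_hashes(w, n):
    """Number of hash functions minimizing the approximate rate."""
    return (w / n) * math.log(2)

w, n = 1000, 100
k_opt = optimal_num_hashes(w, n)
print(k_opt, fraction_of_bits_set(w, k_opt, n), false_positive_rate(w, k_opt, n))
```

At the optimal k the expected fraction of set bits is one half, which matches the rule of thumb above.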

Variable Delay

[Figure: stages of a request: browser cache, DNS resolution, TCP open, 1st byte response, last byte response]

Sources of variability of delay
•  browser cache hit/miss, need for cache revalidation
•  DNS cache hit/miss, multiple DNS servers, errors
•  TCP handshake, packet loss, high RTT, server accept queue
•  RTT, busy server, CPU overhead (e.g., CGI script)
•  response size, receive buffer size, congestion

Limitations of Web Caching
Significant fraction (> 50%) of HTTP objects are not cacheable

Why not?
•  dynamic data: stock prices, scores, web cams
•  scripts: results based on passed parameters
•  use of cookies: results may be based on passed data
•  advertising/analytics: owner wants to measure # hits
   •  random strings in content ensure unique counting
•  HTTPS: encrypted data is not cacheable
•  multimedia: object larger than cache or not allowed to be cached due to intellectual property rights

How to ensure scalability of web server when content is not cacheable?

Content Distribution Networks (CDNs)

Streaming large files (e.g., video) from a single origin server in real time requires a large amount of bandwidth

Solution: replicate content to hundreds of servers throughout the Internet
•  place servers in edge/access network
•  content pre-downloaded to servers
•  when user downloads content, direct user to the server closest to the user
•  placing content “close to” user avoids network delay and loss of long paths

[Figure: origin server in N. America feeding a CDN distribution node, which feeds CDN servers in S. America, Europe, and Asia]

CDNs vs. Content Owners
Maintaining your own network of such servers is expensive (both CAPEX and OPEX)

CDN providers maintain a network of servers and sell content replication service to multiple content owners
•  example of content owners: ABC, HBO, Netflix
•  example of CDN providers: Akamai, Limelight
   •  Akamai has ~25K servers spread over ~1K clusters world-wide

CDN replicates owners’ content in CDN servers

When owner updates content, CDN updates servers

Some large content owners operate their own CDNs: Amazon, Google/YouTube, Netflix (virtual)

Sample Delivery (Example Only)

[Figure, from Frank13: delivery of index.html, logo.gif, shirtad.gif, stadium.mp4, and tvlogo.mp4 involving the servers www1.cdi.ex, www2.cdi.ex, and www3.cdi.ex]

Why don’t we store index.html and shirtad.gif at the CDN also?

[Frank13]

Content Distribution Network

CDN nodes create application-layer overlay network

Larger CDNs may have their own WANs, e.g., Google’s B4, that interconnect with the rest of the Internet like any other ISP’s network

CDN directs a request to the server closest to the client (how?)

[Figure, after Walrand: Tier-1 backbones, ISPs, IXPs, access aggregators, and CDNs (e.g., Akamai, Amazon, Google)]

Client Redirection

How to direct clients to a particular server? Pros and cons of each?

As part of application: HTTP redirect
•  pros: application-level, fine-grained control
•  cons: additional load and RTTs, hard to cache

As part of naming: DNS
•  pros: well-suited to caching, reduces RTTs
•  cons: relies on proxies and estimations, not accurate

DNS-based Redirection

Clients are directed to the closest server as part of the DNS name resolution process:
1.  client asks its local DNS resolver to resolve CDN’s server’s name
2.  the local DNS resolver is directed to CDN’s authoritative name server by DNS
3.  CDN’s name server returns either the address of the server closest to the DNS resolver or an ordered list of addresses, ranked by distance to the local DNS resolver

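CDN Example

[Figure: numbered steps 1 to 5 among the client, the origin server, the client’s local name server, CDN’s authoritative DNS server, and a nearby CDN server]
•  HTTP request for home.ex/index.html, which contains cdi.ex/stadium.mp4
•  DNS query for cdi.ex (to client’s local name server)
•  DNS query for cdi.ex (to CDN’s authoritative DNS server)
•  HTTP request for cdi.ex/stadium.mp4 (to nearby CDN server)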

Server Selection
How to choose which server to direct a client to?
•  server load
•  client-server distance
   •  CDN maintains a “map”, estimating distances between access ISPs and CDN nodes
   •  CDN’s name server uses “map” to determine server closest to the local DNS resolver
   •  DNS resolver used as proxy for client ⇒ inaccurate location
      •  CDN doesn’t know client’s address at name resolution time
   •  distance can be measured using different metrics, e.g., latency, loss rate ⇒ only estimated
•  delivery cost (ISP pricing)
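One way a CDN name server might combine the “map” with server load is sketched below. Everything here is hypothetical: the function name, the ISP and server names, the distance values, and the 0.9 load threshold. Delivery cost is left out for brevity.

```python
def select_servers(resolver_isp, distance_map, load, max_load=0.9):
    """Rank CDN servers for a DNS resolver: nearest first, overloaded excluded.

    distance_map: (access ISP, server) -> estimated distance (e.g., latency in ms)
    load: server -> current load as a fraction of capacity
    """
    candidates = [
        (distance, server)
        for (isp, server), distance in distance_map.items()
        if isp == resolver_isp and load.get(server, 0.0) < max_load
    ]
    # The resolver's ISP stands in for the client, whose address is unknown.
    return [server for _, server in sorted(candidates)]

# Hypothetical map and load figures
distance_map = {
    ("isp-a", "cdn-europe"): 20,
    ("isp-a", "cdn-s-america"): 120,
    ("isp-a", "cdn-asia"): 180,
}
load = {"cdn-europe": 0.95, "cdn-s-america": 0.5, "cdn-asia": 0.3}

ranked = select_servers("isp-a", distance_map, load)
print(ranked)
```

The returned list corresponds to the ordered list of addresses a CDN name server may hand back. The closest server is skipped here only because it is above the assumed load threshold.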