Transcript
Page 1: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Client-side Reconstruction of Composite Mementos Using ServiceWorker

Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson

Web Science and Digital Libraries Research Group

Old Dominion University, Norfolk, VA, 23529

@ibnesayeed @WebSciDL

Supported in part by NSF III 1526700

JCDL 2017, June 19-23, 2017, Toronto, Ontario, Canada

Page 2: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Sawood Alam <@ibnesayeed>

2008 Memento Seen in 2017

● https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html

?

Page 3: Client-side Reconstruction of Composite Mementos Using ServiceWorker

2008 Memento Seen in 2012

● http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html

Page 4: Client-side Reconstruction of Composite Mementos Using ServiceWorker

XenLand @ Alpha Centauri

Page 5: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Zombies in Archive

?

Page 6: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Zombies in Archive

<img src="http://xenland.alpha/images/map.png">
// Is rewritten on replay to become:
<img src="http://archive.example.org/1998/http://xenland.alpha/images/map.png">

// URLs constructed by JavaScript are harder to rewrite on replay, e.g.:
var base = 'http://xenland.alpha';
var imgdir = '/images/';
var img = document.createElement('img');
img.src = base + imgdir + 'ruler.png';
document.getElementById('ruler').appendChild(img);
//=>> http://xenland.alpha/images/ruler.png

Page 7: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Replay URL Resolution & Rewriting

Reference type   Example                                      Resolution after relocation
Relative path    images/logo.png                              Potentially correct
Absolute path    /public/images/logo.png                      Potentially incorrect
Absolute URL     http://example.com/public/images/logo.png    Potentially live leakage

http://example.com/public/index.html

...<img src="/public/images/logo.png">...

http://archive.example.org/<datetime>/http://example.com/public/index.html

...<img src="/<datetime>/http://example.com/public/images/logo.png">...
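
The three cases in the table can be reproduced with the standard URL constructor, which applies the same resolution rules a browser uses. This is only a minimal sketch: the 14-digit datetime and the helper name resolveOnReplay are assumptions made for illustration, and the hosts are the placeholder ones from the slide.

```javascript
// Hypothetical values: the 14-digit datetime stands in for <datetime> above.
const ORIGINAL_PAGE = 'http://example.com/public/index.html';
const ARCHIVED_PAGE = 'http://archive.example.org/20170619000000/' + ORIGINAL_PAGE;

// Resolve a reference the way a browser would when the page is served
// from the archive without any rewriting (illustrative helper name).
function resolveOnReplay(reference) {
  return new URL(reference, ARCHIVED_PAGE).href;
}

const relativePath = resolveOnReplay('images/logo.png');
const absolutePath = resolveOnReplay('/public/images/logo.png');
const absoluteUrl = resolveOnReplay('http://example.com/public/images/logo.png');

console.log(relativePath); // stays inside the archive, under the same datetime
console.log(absolutePath); // points at the archive's own /public/ path (incorrect)
console.log(absoluteUrl);  // points at the live site (live leakage)
```

Only the relative path survives relocation unchanged, which is why replay systems either rewrite the other two forms or, as proposed later in this talk, reroute them at request time.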

Page 8: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Avoiding Zombies

● Ahead-of-time rendering and JS execution
  ○ http://archive.is/
● Archival replay proxy
  ○ https://github.com/ikreymer/pywb/wiki/Pywb-Proxy-Mode-Usage
● Browser extension
  ○ MementoFox (deprecated)
● JS override
  ○ wombat.js in PyWB
● ServiceWorker

Page 9: Client-side Reconstruction of Composite Mementos Using ServiceWorker

ServiceWorker

● New web API (still a working draft)
● A standalone JavaScript file
● Persists in the browser independent of the window
● Acts as a proxy
● Installed by a web page under its domain at a specific path (called scope)
● Intercepts all requests in scope
  ○ Resources under the scope path (at any depth)
  ○ Secondary resource requests originating from any resource under scope
● Allows modification of requests and responses
● Primarily used in web applications for offline access and notification support
● Requires HTTPS
● Growing browser support (73.61% as of June 8, 2017)
  ○ http://caniuse.com/#feat=serviceworkers
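
To make the installation and interception steps concrete, here is a minimal sketch of both sides of a ServiceWorker. The script path '/sw.js', the '/' scope, and the helper names registerWorker and handleFetch are assumptions made for illustration. In practice the page-side and worker-side parts live in separate files; they share one block here only to keep the example self-contained.

```javascript
// Page side: install the worker under a scope.
// '/sw.js' and '/' are hypothetical defaults for this sketch.
function registerWorker(container, scriptUrl = '/sw.js', scope = '/') {
  if (!container) return null; // API unavailable (old browser, non-HTTPS page, Node)
  return container.register(scriptUrl, { scope }); // resolves to a registration
}

// Worker side: act as a proxy for every request in scope.
// This pass-through handler is the place where a request or response
// could be inspected or modified before being returned to the page.
function handleFetch(event, fetchFn) {
  event.respondWith(fetchFn(event.request));
}

// Wire the helpers up only where the corresponding globals exist.
if (typeof navigator !== 'undefined' && navigator.serviceWorker) {
  registerWorker(navigator.serviceWorker);
}
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => handleFetch(event, fetch));
}
```

Passing the container and the fetch function in as arguments keeps the two helpers usable outside a browser, for example in a unit test with stand-in objects.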

Page 10: Client-side Reconstruction of Composite Mementos Using ServiceWorker

reconstructive.js

● A ServiceWorker script written for archival replay
● Plug-in for web archives or Memento aggregators
● Intercepts all network requests originating from a memento
● Reroutes requests to an archive (prevents live leakage & incorrect references)
● Optionally rewrites the content to add a banner & to fix hyperlinks
● https://github.com/oduwsdl/reconstructive
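
The rerouting idea can be sketched as a small function that maps an intercepted request back into the archive, using the referring memento's URL to recover the datetime and the original page. This is not the actual reconstructive.js code or API: the archive host, the 14-digit datetime pattern, and the name reroute are assumptions made for illustration.

```javascript
// Hypothetical replay endpoint and memento URL pattern:
//   <ARCHIVE>/<14-digit datetime>/<original URL>
const ARCHIVE = 'http://archive.example.org';
const URIM = /^\/(\d{14})\/(https?:\/\/.+)$/;

// Map an intercepted request URL to the archive, based on the memento
// (referrer) that issued it. Requests that need no change are returned as-is.
function reroute(requestUrl, referrerUrl) {
  let req, ref;
  try {
    req = new URL(requestUrl);
    ref = new URL(referrerUrl);
  } catch (e) {
    return requestUrl; // missing or unparsable referrer: leave untouched
  }
  const fromMemento = ref.origin === ARCHIVE &&
    (ref.pathname + ref.search).match(URIM);
  if (!fromMemento) return requestUrl; // not issued by a memento
  const [, datetime, original] = fromMemento;

  if (req.origin === ARCHIVE) {
    if (URIM.test(req.pathname + req.search)) return requestUrl; // already archived
    // An absolute path was resolved against the archive's origin:
    // re-resolve it against the original page instead.
    const fixed = new URL(req.pathname + req.search, original).href;
    return `${ARCHIVE}/${datetime}/${fixed}`;
  }
  // An absolute URL pointing at the live web: send it to the archive.
  return `${ARCHIVE}/${datetime}/${req.href}`;
}

// Inside a ServiceWorker, the function would be applied to each request.
// Fetching by URL string drops the original method and headers; a fuller
// version would carry them over.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    event.respondWith(fetch(reroute(event.request.url, event.request.referrer)));
  });
}
```

Because the mapping happens at request time, URLs assembled by JavaScript are covered in the same way as URLs written in the markup, without touching the archived content itself.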

Page 11: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Zombies, No More!

● https://github.com/oduwsdl/ipwb

Page 12: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Rewriting Mementos is Expensive

Original capture (without any rewriting)

In our experiment over 500 home pages we observed:

● One-fifth mean data overhead
● One-third mean time overhead

15% more data in twice the time

Page 13: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Archival Capture Replay Test Suite (ACRTS)

reconstructive.js

● https://ibnesayeed.github.io/acrts/

Page 14: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Reconstruction Winners: PyWB & reconstructive.js

A. OpenWayback
B. PyWB
C. Memento Reconstruct
D. Memento for Chrome
E. reconstructive.js

Page 15: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Future Work

● Use “Prefer” header for original content (when archives support it)
● Add a customizable archival banner
● Add click handler for lazy rewriting of hyperlinks
● Handle archived ServiceWorkers
● Write a 404-combat ServiceWorker script for webmasters

● http://ws-dl.blogspot.co.uk/2016/08/2016-08-15-mementos-in-raw-take-two.html

Page 16: Client-side Reconstruction of Composite Mementos Using ServiceWorker

Conclusions

● reconstructive.js => no zombies!
● Rerouting instead of rewriting (lazy rewriting)
● Mean overhead reduction
  ○ one-fifth data
  ○ one-third time
● 73.61% (and growing) browser support for ServiceWorker
  ○ http://caniuse.com/#feat=serviceworkers
● reconstructive.js
  ○ https://github.com/oduwsdl/reconstructive
● Archival Capture Replay Test Suite
  ○ https://ibnesayeed.github.io/acrts/
● In-depth recap: WADL 2017 Thursday, June 22, 3:45pm (https://fox.cs.vt.edu/wadl2017.html)

