![Page 1: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/1.jpg)
From Seed to Harvest: Web Archiving Program Considerations for SUL
Nicholas Taylor
@nullhandle
Stanford University Libraries
April 17, 2013
“Digital” by Flickr user clickclaker under CC BY-NC-ND 2.0
![Page 2: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/2.jpg)
hello, my name is Nicholas…
![Page 3: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/3.jpg)
Library of Congress Web Archiving
Library of Congress: “MINERVA”
![Page 4: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/4.jpg)
Web Archiving Life Cycle Model
“Web Archiving Life Cycle Model” by M. Bragg, K. Hanna, et al. (2013). Reproduced with permission.
![Page 5: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/5.jpg)
Web Archiving Life Cycle Model
Program Elements
• Vision and Objectives
• Resources and Workflow
• Access / Use / Reuse
• Preservation
• Risk Management
Workflow Elements
• Appraisal and Selection
• Scoping
• Data Capture
• Storage and Organization
• Quality Assurance and Analysis
![Page 6: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/6.jpg)
PROGRAM ELEMENTS
Web Archiving
“Element Blocks” by Flickr user Asian Art Museum under CC BY-NC-ND 2.0
![Page 7: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/7.jpg)
Vision and Objectives
![Page 8: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/8.jpg)
web archiving program vision
ePADD Discovery Module
PASIG
![Page 9: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/9.jpg)
SUL mission
“The Stanford University Libraries (SUL) is more than a cluster of libraries; it connects people with information by providing diverse resources and services to the academic community.”
“Stanford University Libraries…develops and implements resources and services…that support research and instruction.”
SUL: “Stanford University Libraries on Vimeo”
SUL: “About The Stanford University Libraries”
SUL: “SULAIR Brief Guide”
![Page 10: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/10.jpg)
DLSS mission
“DLSS is the information technology production arm of the Stanford Libraries; it serves as the digitization, digital preservation and access systems provider for SUL; and it is the research and development unit for new technologies, standards and methodologies related to library systems.”
SUL: “New Images of Rare Books and Digitization Devices”
SUL: “SULAIR Digital Library Systems and Services (DLSS)”
![Page 11: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/11.jpg)
proposed program mission
“The web archiving program will provide capabilities for the acquisition, preservation, and dissemination of resources that are increasingly and, often, exclusively accessible via the web that are necessary to support University research, instruction, and other purposes.”
![Page 12: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/12.jpg)
objectives
• build infrastructure
• develop expertise
• create research collections
• archive records and deprecated content
• mirror government documents
“Objective” by Flickr user Pedro J. Ferreira under CC BY-NC-ND 2.0
![Page 13: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/13.jpg)
Resources and Workflow
![Page 14: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/14.jpg)
cost modeling
“dollar butterfly (2)” by Flickr user eikosi under CC BY-SA 2.0
![Page 15: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/15.jpg)
staffing
• service manager
• crawl engineer
• curators
• system administrators
• software engineers
• technical services
• legal counsel
“Digitizing Mark Adams cartoons” by Flickr user suldpg under CC BY-NC-SA 2.0
![Page 16: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/16.jpg)
infrastructure
“Google Storage Server” by Flickr user Kazuya (Kaz) Yokohama under CC BY-NC-ND 2.0
![Page 17: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/17.jpg)
readily workflow-able
• collection management
• site nomination
• permissions tracking
• crawl scheduling
• data capture
• quality assurance
“Web Curator Tool User Manual Version 1.5.2”
![Page 18: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/18.jpg)
workflow challenges
• test crawling
• automated QA
• AIP/DIP generation
• SDR ingest
• indexing
• enabling access
• tools testing
“Salmon Ladder at Bonneville Dam” by Flickr user Serolynne under CC BY-NC-ND 2.0
![Page 19: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/19.jpg)
Access / Use / Reuse
![Page 20: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/20.jpg)
access policy
• dark archive
• data redistribution
• embargo
• onsite/offsite replay
• takedown requests
“DO NOT DUPLICATE” by Flickr user Sam UL under CC BY-NC-SA 2.0
![Page 21: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/21.jpg)
browse and API: Wayback
Internet Archive: “Wayback Machine”
UK Web Archive: “Wayback Machine”
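Wayback-style replay addresses a capture by a 14-digit timestamp followed by the original URL. A minimal sketch of that URL pattern, assuming the Internet Archive's public prefix; the function name `wayback_url` is illustrative and not part of any Wayback software:

```python
from datetime import datetime

# Public Internet Archive replay prefix; a local Wayback deployment
# would use its own host here (assumption for illustration).
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a replay URL of the form <prefix>/<YYYYMMDDhhmmss>/<original URL>."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Ask for the capture closest to the date of this talk.
    print(wayback_url("http://library.stanford.edu/", datetime(2013, 4, 17)))
```

Replay typically resolves the request to the capture nearest the requested timestamp, so the timestamp does not need to match a capture exactly.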
![Page 22: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/22.jpg)
many Wayback Machines
Wikipedia: “List of Web archiving initiatives”
![Page 24: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/24.jpg)
discovery: SearchWorks
SUL: “SearchWorks”
![Page 25: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/25.jpg)
full-text search: Solr
Archive-It: “Explore All Archives”
![Page 26: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/26.jpg)
Preservation
![Page 27: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/27.jpg)
bit preservation
“Binary” by Flickr user mikecogh under CC BY-SA 2.0
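Bit preservation usually rests on fixity checking: record a checksum at ingest, then recompute and compare it later. A minimal sketch, assuming SHA-256 as the digest; the function names are hypothetical and do not describe how SDR performs its checks:

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large WARC files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(stream, expected_hex: str) -> bool:
    """Return True when the recomputed digest matches the one recorded at ingest."""
    return sha256_of_stream(stream) == expected_hex.lower()


if __name__ == "__main__":
    payload = b"WARC/1.0\r\n"  # stand-in bytes, not a complete WARC record
    recorded = sha256_of_stream(io.BytesIO(payload))
    print(fixity_ok(io.BytesIO(payload), recorded))
```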
![Page 28: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/28.jpg)
preservation engineering
“Máquina de Rube Goldberg en la base del Alinghi” by Flickr user freshwater2006 under CC BY-NC 2.0
![Page 29: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/29.jpg)
Risk Management
![Page 30: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/30.jpg)
Risk Management
• “appified” web
• copyright
• ephemeral web
• financial sustainability
• fostering use
“Zombie Awareness - Extinguisher” by Flickr user Spiffy0777 under CC BY-NC-SA 2.0
![Page 31: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/31.jpg)
Policy
![Page 32: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/32.jpg)
copyright
• § 108 (library exceptions)
• fair use
• notification vs. permission
• opt-out / takedown
• robots.txt
• third-party sites
• exceptions?
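Whether to honor robots.txt is a policy decision, but testing a URL against it is mechanical. A minimal sketch using Python's standard `urllib.robotparser`; the rules and the user-agent string `example-archiver` are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_robots_checker(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    checker = build_robots_checker(EXAMPLE_ROBOTS)
    for url in ("http://example.org/index.html",
                "http://example.org/private/report.html"):
        print(url, checker.can_fetch("example-archiver", url))
```

A program that chooses to ignore robots.txt for some sites (the "exceptions?" bullet) could still log this result to support later takedown handling.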
“Noria con Copyrights” by Flickr user Alex Novoa under CC BY-NC-ND 2.0
![Page 33: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/33.jpg)
collection development
“leaf-cutter ants” by Flickr user Vilseskogen under CC BY-NC-SA 2.0
![Page 34: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/34.jpg)
WORKFLOW ELEMENTS
Web Archiving
“Workflow” by Flickr user luismi_cavalle under CC BY 2.0
![Page 35: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/35.jpg)
Appraisal and Selection
![Page 36: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/36.jpg)
informing selection
• value
• risk
• size
• extent to which archived
“Fruit market-Barcelona” by Flickr user Marcel Theisen under CC BY-NC-SA 2.0
![Page 38: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/38.jpg)
Wikipedia Live Monitor
Thomas Steiner: “Wikipedia Live Monitor”
![Page 39: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/39.jpg)
Wikipedia articles
Wikipedia: “List of think tanks in the United States”
![Page 40: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/40.jpg)
UNT Nomination Tool
University of North Texas Libraries: “Nomination Tool”
![Page 41: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/41.jpg)
Scoping
![Page 42: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/42.jpg)
the purpose of scoping
“More god?” by Flickr user one two one three under CC BY-NC-SA 2.0
![Page 43: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/43.jpg)
Data Capture
![Page 44: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/44.jpg)
Heritrix
Internet Archive: “A Quick Guide to Running Your First Crawl Job”
![Page 45: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/45.jpg)
other data capture tools
Dan Chudnov and Laura Wrubel: “social feed manager”
Mat Kelly: “WAIL”
Archive Team: “Wget with WARC output”
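For small captures, Wget can write its traffic to a WARC file via the `--warc-file` option. A minimal sketch that only assembles the command line; the helper name, depth, and one-second wait are illustrative choices, not recommendations from the talk:

```python
def wget_warc_command(seed_url: str, warc_basename: str, depth: int = 1) -> list[str]:
    """Assemble a Wget invocation that records the crawl as <warc_basename>.warc.gz."""
    return [
        "wget",
        f"--warc-file={warc_basename}",  # write requests and responses to a WARC
        "--recursive",                   # follow links starting from the seed
        f"--level={depth}",              # limit recursion depth
        "--page-requisites",             # also fetch images, CSS, and scripts
        "--wait=1",                      # pause between requests to be polite
        seed_url,
    ]


if __name__ == "__main__":
    # Printed rather than executed; pass the list to subprocess.run to crawl.
    print(" ".join(wget_warc_command("http://example.org/", "example-org")))
```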
![Page 46: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/46.jpg)
the elusive web
“Light Writing - Spider Web” by Flickr user forcefeed:swede under CC BY-ND 2.0
![Page 47: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/47.jpg)
scale
“chutes and ladders” by Flickr user reallyboring under CC BY-NC-SA 2.0
![Page 48: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/48.jpg)
Storage and Organization
![Page 49: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/49.jpg)
packages and their contents
“lots and lots and lots of boxes” by Flickr user Toastwife under CC BY-NC-SA 2.0
![Page 50: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/50.jpg)
Quality Assurance and Analysis
![Page 51: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/51.jpg)
QA before, after, during
“Check” by Flickr user ex.libris under CC BY-NC-ND 2.0
![Page 52: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/52.jpg)
Metadata / Description
![Page 53: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/53.jpg)
Metadata / Description
“Hello! My URL Is...” by Flickr user vasta under CC BY-NC-ND 2.0
![Page 54: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/54.jpg)
BEYOND THE MODEL
Considerations
“My donut” by Flickr user Molemaster under CC BY-NC-SA 2.0
![Page 55: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/55.jpg)
other program requirements
• marketing/outreach
• performance metrics
• service level definitions
• service roadmap
• training
• user documentation
“Sticky notes” by Flickr user Kris Krug under CC BY-SA 2.0
![Page 56: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/56.jpg)
incorporating existing projects
• plan capacity
• normalize data
• ingest into SDR
• seek permissions
• process
• catalog
• enable access
“Geckos” by Flickr user smashz under CC BY-NC-ND 2.0
![Page 57: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/57.jpg)
community engagement
![Page 58: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/58.jpg)
the web changes
Internet Archive: “Wayback Machine”
![Page 59: From Seed to Harvest: Web Archiving Program Considerations for SUL](https://reader037.vdocuments.net/reader037/viewer/2022110306/554be5f9b4c90556328b4ae0/html5/thumbnails/59.jpg)
Nicholas Taylor
@nullhandle
“Thank You” by Flickr user muffintinmom under CC BY 2.0