The FDLP Web Archive Dory Bower Archive-It Partner Meeting November 18, 2014


The FDLP Web Archive

Dory Bower

Archive-It Partner Meeting

November 18, 2014

FDLP History and Dissemination of Government Publications

• Act of 1813: Congress first authorized legislation to ensure the provision of one copy of the House and Senate Journals and other Congressional documents to certain universities, historical societies, state libraries, etc.

• The Printing Act of 1895: Formed the basis for Title 44. It centralized the printing, binding, and distribution of US Government documents, established the role of the FDLP, and transferred the Office of the Superintendent of Documents to the GPO. The first Monthly Catalog of US Government Publications was printed at the GPO that year.

• Title 44, US Code: Mandate for Public Printing and Documents. Chapter 19 deals with the Depository Library Program. Title 44 has seen many changes over the last century.

FDLP History and Dissemination of Government Publications

• GPO Electronic Information Access Enhancement Act of 1993: Establishes a means of enhancing electronic public access to a wide range of Federal electronic information.

• 1996: Launch of Catalog of US Government Publications (CGP), the online counterpart for the Monthly Catalog of US Government Publications. Publications dating from July 1976 – Present.

• 1998: LSCM begins use of PURLs for persistent access to electronic copies of government publications.

• 2011: Use of Archive-It begins for automated harvest of government websites.

Government “Publications”

Web Archiving Options

Decision process for FDLP Web archiving
• Standard PURL: Individual publications and less complex web sites, using Teleport software
• Archive-It: Content rich websites
• Partnership: Hard to harvest sites, database sites or real time information
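The three options above can be read as a simple routing rule. The following is a minimal sketch of that reading, not the team's actual procedure: the `Site` fields, the function name, and the order in which the checks run are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of a nominated site (all fields are illustrative)."""
    content_rich: bool = False
    hard_to_harvest: bool = False
    database_driven: bool = False
    real_time: bool = False


def choose_archiving_option(site: Site) -> str:
    """Map a site's traits to one of the three options named on the slide."""
    # Assumption: sites a crawler cannot capture well are checked first.
    if site.hard_to_harvest or site.database_driven or site.real_time:
        return "Partnership"
    # Content rich websites go to Archive-It.
    if site.content_rich:
        return "Archive-It"
    # Individual publications and less complex sites get a standard PURL.
    return "Standard PURL"


print(choose_archiving_option(Site(content_rich=True)))
print(choose_archiving_option(Site(content_rich=True, database_driven=True)))
print(choose_archiving_option(Site()))
```

In practice the choice is made through team discussion, so a rule like this would at most serve as a first-pass triage.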

Collection Development

Develop and build website level collection
• Must be within scope of FDLP
• Not distributing through print
• Government information disseminated through web and not cataloged
• Avoid duplication of effort with other institutions or already in FDsys
• Work with the collection development staff with their many years of experience to help determine needs

Collection Development

• Pilot sites: 3 sites to begin testing workflow
• SuDoc Y3 sites: commissions, committees, independent agencies
• Special Collections
– Native American Resources
• Nominated sites

Collection Development

Nominations
• Document Discovery: http://usgpo.wufoo.com/forms/document-discovery/
• AskGPO: http://www.gpo.gov/askgpo/
• Team email: [email protected]

Collection Development

The decision-making process
• Nominations are sent out to the team by email or discussed in the weekly meeting
• Much discussion within the FDLP web archiving team, which represents many areas of LSCM
• Is it within scope of FDLP and other collection development parameters?
• Decide by which means to archive

Collection Development

Moving forward
• Y3s almost complete
• Working with Collection Development staff with their extensive experience to determine needs
• Move from smaller to larger sites
• Non-standard sites (fatherhood.gov, read.gov)
• Special Collections
• Regular frequency of crawls
• Working with other Federal collecting Institutions

Archive-It Workflow

• Notification to Agency
– Webmaster: 48 hours' notice of intent to crawl
– Full disclosure of what we are doing
– Chosen for inclusion into FDLP
– Will ignore the robots.txt [however only do so when necessary]
• Begin seed list, test crawls, QA, modifications
• Concentrate a lot of time on test crawl
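Since robots.txt is ignored only when necessary, it helps to know before a test crawl which seed URLs a site's robots.txt would block. The sketch below shows one way to check this with Python's standard `urllib.robotparser`. The robots.txt text, the seed URLs, and the user-agent string are made-up examples, and this is not a description of how Archive-It itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # Parse the rules from text so that no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and seed list, for illustration only.
sample_robots = """User-agent: *
Disallow: /private/
"""
seeds = [
    "https://example.gov/",
    "https://example.gov/private/report.pdf",
]

print(blocked_urls(sample_robots, "example-crawler", seeds))
```

Any URL in the returned list is a candidate for discussion with the agency webmaster before deciding whether an override is needed.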

Archive-It Workflow

• Run and QA production crawl
• Run patch crawls
• Submit lots of questions
• Best playback possible
• Maximize user experience and account
• Make live on Archive-It and submit for metadata

FDLP Web Archive

Collection size:
• 3.5 TB, over 24 million documents crawled
• 56 agencies represented on AIT
• 65 records on CGP (analytical cataloging)
• FDLP Project page: http://www.fdlp.gov/377-projects-active/2020-web-archiving

Resources:
• 10 contributors

Access

Two locations for Access
– Archive-It
• Search for “GPO” or “FDLP”
– Catalog of Government Publications (CGP)
• Identifiable through “INTERNET” in SuDoc number
• Expert search of wcat=web archiving retrieves all
• Would like to find better access to whole collection and eliminate this search

FDLP Web Archive

https://archive-it.org/home/FDLPwebarchive

Catalog of Government Publications

Questions?

[email protected]