NetarchiveSuite Meeting, BnF, 24./25.11.2011 1
Curator Track
Web@rchive Austria
Michaela Mayr
Austrian National Library
[email protected]
Selecting Websites with External Partners (1)
Selection @ ANL
• No team of curators
• Selection usually by WA team
• Experience: very difficult to involve subject librarians
• Austrian Literature collection
  – List of URLs submitted by Literature Archive
  – No tool used
  – Monthly crawl
  – Exchange with external literature WA project
• (IIPC Projects: Nomination tool UNT)
Selecting Websites with External Partners (2)
Ideas for the Future…
• Universities, students, researchers
• Use of bookmarking tools, e.g. Diigo, Delicious etc.
• Public invitation for nomination via social media
Crowdsourcing of Selection?
Selecting Websites with External Partners (3)
Questions & Discussion
• Why?
• What partners are you working with?
• What selection tools are you using? Social Media?
• Special topics?
• Suggestions or binding?
• Selectors' involvement in QA?
Metrics (1)
• Dynamically generated from data warehouse
• Reports:
  – Storage distribution
  – Daily use of storage
  – Storage per harvest definitions (total)
  – Storage per harvest definitions (daily)
  – Storage and Objects (per year)
  – Storage and Objects (monthly)
  – Storage and Objects (daily)
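The reports listed above are aggregations over storage records. The slides do not describe the data warehouse schema, so the following is only a minimal sketch of how the two "Storage per harvest definitions" reports (total and daily) might be derived. The record layout (harvest definition name, date, stored bytes), the function names, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (harvest_definition, harvest_date, stored_bytes).
# Names and values are illustrative only, not taken from the real warehouse.
RECORDS = [
    ("domain-crawl", date(2011, 11, 1), 500),
    ("domain-crawl", date(2011, 11, 2), 300),
    ("news-media", date(2011, 11, 1), 200),
    ("news-media", date(2011, 11, 1), 100),
]


def storage_per_definition_total(records):
    """Sum stored bytes per harvest definition over all days."""
    totals = defaultdict(int)
    for definition, _day, size in records:
        totals[definition] += size
    return dict(totals)


def storage_per_definition_daily(records):
    """Sum stored bytes per (harvest definition, day) pair."""
    daily = defaultdict(int)
    for definition, day, size in records:
        daily[(definition, day)] += size
    return dict(daily)
```

The yearly and monthly "Storage and Objects" reports would follow the same pattern, grouping by `day.year` or by `(day.year, day.month)` instead of by the full date.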
Metrics (2)
Domain Crawl 2009/2010
• Approx. 900,000 domains
• Physical storage: approx. 6 TB (originally approx. 8.5 TB; compressed and deduplicated)
• Approx. 386 million objects
• Findings on .at websites:
  – 14% (115,000) are > 10 MB
  – 71% (580,000) are < 1 MB
  – 10% (90,000) contain 0 objects
  – 53% (470,000) contain < 10 objects
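A few derived ratios follow from the figures above. This is a small sketch that uses only the rounded values from the slide, so the results are approximate; the constant and function names are illustrative.

```python
# Rounded figures from the slide (all are approximate values).
DOMAINS = 900_000
OBJECTS = 386_000_000
ORIGINAL_TB = 8.5
STORED_TB = 6.0


def space_saving(original, stored):
    """Fraction of storage saved by compression and deduplication."""
    return 1 - stored / original


def mean_objects_per_domain(objects, domains):
    """Average number of harvested objects per domain."""
    return objects / domains


print(f"space saving: {space_saving(ORIGINAL_TB, STORED_TB):.0%}")  # 29%
print(f"objects per domain: {mean_objects_per_domain(OBJECTS, DOMAINS):.0f}")  # 429
```

The mean of roughly 429 objects per domain sits far above the typical case, since about half of the domains contain fewer than 10 objects; a small share of large sites accounts for most of the archive.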
Rich and Social Media (1)
• No special harvest definitions
Rich and Social Media (2)
News and Media Harvesting (1)
News and Media Harvesting (2)
• Started April 2011
• 23 websites
• Weekly, daily, hourly
• QA Tool
• 310 GB
Further Information:
http://webarchiv.onb.ac.at
Social Media:
http://twitter.com/AT_Webarchive
http://www.facebook.com/ATWebarchive
http://www.slideshare.net/ATWebarchive
http://screenr.com/user/AT_Webarchive
Questions?