Putting the World's Cultural Heritage Online with Crowdsourcing [RootsTech, Salt Lake City, 2013-03-21]

Upload: frederick-zarndt

Post on 08-May-2015

DESCRIPTION

Brief history of crowdsourcing; crowdsourcing at libraries around the world; benefits of crowdsourcing; demographics of library crowdsourcers; how to use various crowdsourcing web apps.

TRANSCRIPT

1. Putting the world's cultural heritage online with crowdsourcing. Frederick Zarndt (@cowboyMontana). Slides @ http://bit.ly/crowdsrootstech2013. CCS / Digital Divide Data / DL Consulting. Photo held by John Oxley Library, State Library of Queensland. Original from Courier-mail, Brisbane, Queensland, Australia.

2. Crowds
3. In 2004 James Surowiecki published ... The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. In it he says ...
4. ... a crowd of persons that are diverse ...
5. ... independent ...
6. ... and decentralized ...
7. usually make better judgements or decisions than single persons
8. Country Fair by Grandma Moses. Original painting 1950.
9. "crowdsourcing" was coined by Jeff Howe in "The rise of crowdsourcing", published in Wired magazine, June 2006.
10. web trends for "crowdsourcing", Jan-2006 to Jan-2013
11. Wikipedia's list of crowdsourcing projects
  • On the date of publication of Jeff Howe's Wired magazine article, 1-Jun-2006, Wikipedia did not have an entry (list) of crowdsourcing projects*.
  • On 25-Jan-2010 Wikipedia's list of crowdsourcing projects had 35 entries*.
  • On 17-Mar-2013 Wikipedia's list of crowdsourcing projects had 158 entries+.
  * From Internet Archive's Wayback Machine.
  + Wikipedia contributors, "List of crowdsourcing projects," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/List_of_crowdsourcing_projects (accessed March 17, 2013).
12. "Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers. ... [It] is different from ordinary outsourcing since it is a task or problem that is outsourced to an undefined public rather than a specific, named group." Wikipedia contributors, "Crowdsourcing," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Crowdsourcing (accessed March 17, 2013)
13. crowd*: crowdcollaboration, crowdsourcing, crowdfunding, citizen science, crowdcasting, crowdvoting
14. what is Alexa?
  • Alexa collects and analyzes Internet data for purposes of web analytics. Web analytics is the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing web usage.
  • Alexa is now a subsidiary of Amazon.
  • Alexa was founded in 1996 by Brewster Kahle (Internet Archive) and Bruce Gilliat.
  • Alexa's operations include archiving of webpages as they are crawled. This database served as the basis for the creation of the Internet Archive accessible through the Wayback Machine.
  • Alexa continually crawls all publicly-available websites to create a series of snapshots of the web.
  • Alexa gathers information from a variety of sources to provide key statistics about each site on the web, for example, Traffic Rank, the number of PageViews, site Speed, Bounce Rate, etc. This information is derived from Alexa toolbar users (~6,000,000 worldwide).
15. definitions
  • A PageView is a request for a file whose type is defined as a page.
  • A Unique Visitor is a uniquely identified client generating requests on the web server or viewing pages within a defined time period (i.e. day, week or month). A Unique Visitor counts once within the timescale.
  • A Visit is a series of page requests from the same uniquely identified client with a time of no more than 30 minutes between each page request.
  • Bounce Rate is the percentage of visits where the visitor enters and exits at the same page without visiting any other pages on the site in between.
  • World | Country Rank is a function of the average daily unique visits and the number of unique pages requested.
  Definitions adapted from Wikipedia http://en.wikipedia.org/wiki/Web_analytics
16. crowdfunding: Kickstarter (http://www.kickstarter.com/) was first launched in Apr 2009. As of 17-Mar-2013 its Alexa Internet traffic rank is 751 (global) / 294 (USA). 35,000+ projects successfully funded with $500,000,000+ by 3,000,000+ people.
17. crowdvoting: reddit (http://www.reddit.com/) was first launched in June 2005. As of 17-Mar-2013 its Alexa Internet traffic rank is 124 (global) / 54 (USA). reddit had more than 55,000,000 unique visitors from 175 countries who cast more than 17,000,000 votes about which stories are important.
18. Amazon Mechanical Turk (https://www.mturk.com) was launched Nov 2005. As of 17-Mar-2013 its Alexa Internet traffic rank is 8,219 (global) / 3,036 (USA).
19. crowdsourcing: Each day 200,000,000 reCAPTCHAs are solved by humans around the world.
20. Zooniverse (https://www.zooniverse.org) was first launched as Galaxy Zoo in July 2007. As of 17-Mar-2013 it has 801,682 participants worldwide. Its Alexa traffic rank is 271,574 (global) / 127,695 (USA).
21. crowdcollaboration
22. Wikipedia
  • Wikipedia began 2001
  • Now in 285 languages, 24,640,000 articles
  • 4,210,000 articles in English
  • More than 1,000,000 articles each in German, French, Italian, and Dutch
  • 40 Wikipedia languages with more than 100,000 articles
  • 112 Wikipedia languages with more than 10,000 articles
  • 488,470,000 unique visitors (Jan 2013)
  • 84,848,000 active (5+ edits) contributors
  • Alexa global traffic rank: #6 in worldwide web traffic
  Statistics from Wikimedia Report Card http://reportcard.wmflabs.org
23. Family Search Indexing was first launched (beta) in 2004. As of 17-Mar-2013 Family Search's (https://familysearch.org/) Alexa Internet traffic rank is 4,480 (global) / 1,208 (USA).
24. Family Search Indexing
  • Started (beta) 2004
  • More than 780,000 worldwide registered volunteers from ~25 countries index records relevant to family history
  • Approximately 100,000 active volunteers each month
  • UI in Chinese, English, German, French, Italian, Japanese, Korean, Portuguese, and Russian
  • Blind double-key entry with arbitration / reconciliation
  • More than 1,500,088,741 records indexed (July 2012)
  • Accuracy typically > 99.95%
  Statistics from private communication with Family Search 5-Jul-2013
25. Project Gutenberg was first launched Dec 1971. As of 17-Mar-2013 Project Gutenberg's Alexa Internet traffic rank is 5,192 (global) / 2,851 (USA).
26. Project Gutenberg
  • Started Dec 1971
  • Worldwide volunteers transcribe or proofread OCR'd public domain books through Distributed Proofreaders
  • 42,000 free ebooks completed (March 2013)
  • More than 100,000 free ebooks offered by its partners and affiliates
  • Partner / affiliated projects for Australia, Canada, Europe, Germany, Runeberg (Nordic literature), self-published contemporary authors, Consortia Center in collaboration with the World eBook Library, ...
27. As of 17-Mar-2013 the National Library of Australia's (http://trove.nla.gov.au/) Alexa Internet traffic rank is 14,490 (global) / 330 (Australia). Trove gets ~75% of all National Library web traffic.
28. National Library of Australia
  • Online since 2008
  • 7,200,000+ pages
  • Top text corrector: 1,250,000 lines (June 2012)
  • 2,450,000+ lines corrected each month (average for first 6 months of 2012)
  • 68,908,757 lines corrected as of July 2012, up from 42,411,468 lines corrected as of July 2011
  • 63,613 total registered users (July 2012)
  • 4,146 active users (June 2012)
  Statistics from private communication with the National Library of Australia Oct 2012
29. Courtesy of Tim Sherratt, Tinkerer-in-Chief at WraggeLabs Emporium (http://wraggelabs.com/)
30. As of 17-Mar-2013 the National Library of Finland's (http://www.nationallibrary.fi/) Alexa Internet global traffic rank is 4,303,901. Its Internet traffic rank for Finland was 199 as of 2-Apr-2012.
31. National Library of Finland
  • Digitalkoot is a project to improve OCR text in digitized newspapers -- by playing games!
  • Digitalkoot is a collaboration between the National Library and Microtask
  • Players correct OCR text by playing Myyräsillassa (Mole Bridge) or Myyräjahdissa (Mole Hunt)
  • National Library has 4,000,000+ digitized pages
  • 109,321 registered players (October 2012)
  • Since February 2011 8,024,530 micro-tasks have been completed
32. As of 17-Mar-2013 UC Riverside's Alexa Internet traffic rank is 11,782 (global) / 4,120 (USA). CDNC gets ~3.30% of all UC Riverside web traffic.
33. California Digital Newspaper Collection
  • CDNC began digitizing newspapers in 2005 as part of the Library of Congress National Digital Newspaper Program (NDNP)
  • Newspapers digitized to article-level in addition to page-level as required by NDNP (same as Utah Digital Newspapers)
  • Since 2009 hosted on Veridian at http://cdnc.ucr.edu
  • Collection size: 55,970 issues, 495,175 pages, 5,658,224 articles, 498,000,000+ lines (Mar-2013)
34. OCR text correction
  • OCR text correction added August 2011
  • Corrections are done line by line
  • ~578,000+ lines of text corrected as of Oct 2012
  • ~935,398+ lines of text corrected as of Mar 2013
  • ~2% of the collection corrected, 98% to go!
  • Top corrector: 327,244 lines, more than 2x the 2nd corrector
35. Cambridge Public Library Historic Newspaper Collection
  • Cambridge Historic Newspapers online since Jan 2012
  • Cambridge, Massachusetts Public Library digitized local newspapers (http://cambridge.dlconsulting.com/)
  • Newspapers digitized to article-level
  • Collection size: 6,346 issues, 59,070 pages, 669,406 articles (Mar-2013)
  • Collection includes 13,099 obituary cards
36. Why correct text? Here's why ...
37. Raw OCR text. Newspaper image. Deaths. llnrieff, Esq. of