dh101 2013/2014 course 9 - crowdsourcing, crowdfunding, wikipedia, open street map, mechanical turk

Post on 06-May-2015

Digital Humanities 101 - 2013/2014 - Course 9

Digital Humanities Laboratory

Frederic Kaplan

frederic.kaplan@epfl.ch

Semester 1 : Content of each course

• (1) 19.09 Introduction to the course / Live Tweeting and Collective note taking

• (2) 25.09 Introduction to Digital Humanities / Wordpress / First assignment

• (3) 2.10 Introduction to the Venice Time Machine project / Zotero

•9.10 No course

• (4) 16.10 Digitization techniques / Deadline first assignment

• (5) 23.10 Datafication / Presentation of projects

• (6) 30.10 Semantic modelling / RDF / Deadline peer-reviewing of first assignment

• (7) 6.11 Pattern recognition / OCR / Semantic disambiguation

• (8) 13.11 Historical Geographic Information Systems, Procedural modeling / City Engine / Deadline Project selection

• (9) 20.11 Crowdsourcing / Gamification / Wikipedia

• (10) 27.11 Cultural heritage interfaces and visualisation / Museographic experiences

•4.12 Group work on the projects

•11.12 Oral exam / Presentation of projects / Deadline Project blog

•18.12 Oral exam / Presentation of projects

Today’s course

•Objective of the course : Answering two questions : Why do projects rely on crowdsourcing ? Why do people participate in crowdsourced projects ?

•Why do projects rely on crowdsourcing ?

•Case study : Transcribing handwritten texts using Mechanical Turk

•Case study : Crowdfunding a scientific project

•Why do people participate in crowdsourced projects ?

•Case study : Climbing the Wikipedia pyramid

Crowdsourcing

From Wikipedia

• “Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers”

From Wikipedia

• The term was coined in 2006 by Jeff Howe in a Wired article, “The Rise of Crowdsourcing”. http://www.wired.com/wired/archive/14.06/crowds.html?pg=1&topic=crowds

Why do projects rely on crowdsourcing ? Why do people participate in crowdsourced projects ?

Why do projects rely on crowdsourcing ?

•Because it's free or cheap (cf. Amazon’s Mechanical Turk)

•Because it allows better engagement of users (or learners, in the case of peer-grading)

•Because it makes it possible to harness the wisdom of the crowds

• cf. Claire Ross, Social media for digital humanities and community engagement, in Warwick, Terras, Nyhan, Digital Humanities in Practice, Facet Publishing, 2012.

The wisdom of the crowds

•Surowiecki’s four criteria (2004)

•Diversity : Each participant has a different background and perspective

• Independence : Each participant makes their own decision

•Decentralization : Decisions are local, there is no central planner

•Aggregation : A way to turn individual judgements into collective decisions.
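
The aggregation criterion can be illustrated with a minimal sketch. It assumes numeric estimates and uses the median as the aggregation rule; both the function name `aggregate_estimates` and the sample guesses are hypothetical, and other rules (mean, majority vote) would fit the same slot.

```python
from statistics import median

def aggregate_estimates(estimates):
    """Turn independent individual judgements into one collective estimate."""
    if not estimates:
        raise ValueError("need at least one estimate")
    # The median is robust to a few wildly wrong individual guesses.
    return median(estimates)

# Hypothetical guesses made independently by five participants.
guesses = [950, 1100, 1210, 1300, 2500]
print(aggregate_estimates(guesses))
```

The median is chosen here only because a single outlier (2500) does not drag the collective answer away, which is one simple way to benefit from diversity and independence.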

A case study : crowdsourced transcription

UCL Transcribe Bentham

•60 000 manuscripts of Jeremy Bentham (1748-1832)

•20 000 already transcribed using the traditional approach, 40 000 to go

•TEI encoding, using MediaWiki

•5 000 manuscripts transcribed (06-2013)

•33 000 volunteers, but only a very limited number of very productive and dedicated users

•Crowdsifting instead of crowdsourcing

ReCaptcha : A free anti-bot service

•From http://www.google.com/recaptcha/learnmore

•200+ million CAPTCHAs are solved by humans around the world every day.

•10 s / CAPTCHA

•150 000 hours of work each day
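
As a back-of-the-envelope check, the daily total can be recomputed from the two figures above. The sketch assumes exactly 200 million CAPTCHAs and exactly 10 seconds each; under these assumptions the result is roughly 555 000 hours, so the 150 000 hours quoted above reads as a conservative lower bound.

```python
CAPTCHAS_PER_DAY = 200_000_000   # "200+ million" from the slide, taken as exact
SECONDS_PER_CAPTCHA = 10         # "10 s / CAPTCHA" from the slide

# Total human time per day, converted from seconds to hours.
hours_per_day = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA / 3600
print(f"{hours_per_day:,.0f} hours per day")
```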

ReCaptcha : A free anti-bot service

• reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher.

•But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle ?

ReCaptcha : A free anti-bot service

•Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known.

• If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
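
The control-word mechanism described above can be sketched as follows. This is an illustration only, not reCAPTCHA's actual implementation: the function names, the minimum of 3 votes and the 75 % agreement threshold are all assumptions made for the example.

```python
from collections import Counter

def record_answer(votes, control_truth, control_answer, unknown_answer):
    """Count the answer for the unknown word only if the control word was solved."""
    if control_answer.strip().lower() == control_truth.lower():
        votes[unknown_answer.strip().lower()] += 1

def accepted_transcription(votes, min_votes=3, min_share=0.75):
    """Return the agreed transcription, or None while confidence is too low."""
    total = sum(votes.values())
    if total < min_votes:          # hypothetical threshold: not enough people yet
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None

votes = Counter()
record_answer(votes, "morning", "morning", "upon")
record_answer(votes, "morning", "mornin", "apon")   # control failed: ignored
record_answer(votes, "morning", "Morning", "upon")
record_answer(votes, "morning", "morning", "Upon ")
print(accepted_transcription(votes))
```

Answers from users who fail the known word are discarded entirely, so the unknown word is only ever judged by people who have just shown they can read distorted text.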

Can we use Mechanical Turk to do this ?

•Who knows where the name Mechanical Turk comes from ?

•Mechanical Turk lets requesters post Human Intelligence Tasks (HITs) for workers to perform

•When designing a HIT, a requester can choose among many templates, including writing, survey, translation and categorization templates.

•500 000 workers from over 190 countries in January 2011.

•Payments are made with Amazon Payments. Requesters pay 10 % of the price of successfully completed HITs to Amazon

•The average wage is about one dollar an hour (each task averaging a few cents). Some have criticized Mechanical Turk as a digital sweatshop. We will discuss this more at the end of this lecture.

•Problem for us : You need an American address to use Mechanical Turk.
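
To get a feeling for the budget of a transcription job, here is a minimal cost sketch. It assumes that the 10 % commission is added on top of the rewards paid to workers; the job size (1000 words), the reward (2 cents per HIT) and the redundancy (3 workers per word) are hypothetical numbers chosen only for illustration.

```python
from decimal import Decimal

AMAZON_FEE_RATE = Decimal("0.10")  # 10 % commission, as stated on the slide

def requester_cost(n_items, reward_per_hit, judgments_per_item=1):
    """Total cost in dollars: worker rewards plus the commission (assumed on top)."""
    rewards = Decimal(n_items) * Decimal(judgments_per_item) * Decimal(reward_per_hit)
    return rewards * (1 + AMAZON_FEE_RATE)

# Hypothetical job: 1000 words, each transcribed by 3 workers at $0.02 per HIT.
total = requester_cost(1000, "0.02", judgments_per_item=3)
print(total)
```

Asking several workers for the same word multiplies the cost, but it is what makes it possible to cross-check answers afterwards.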

CrowdFlower : a meta-engine for crowdsourcing

•CrowdFlower plays the role of meta-engine or interface to several crowdsourcing services.

•CrowdFlower has over 50 labor channel partners, among them Amazon Mechanical Turk

•It has processed 1 billion tasks (small units of work) since it began operation, and currently handles 5 man-years of work daily (Source : Wikipedia, 19/11/2013)

•So let’s try it.

Combining crowdsourcing and grammatical rules

•Raw crowdsourced word transcriptions are likely to contain many errors

•But we also have a good grammatical model of this Venetian dialect (thanks to the work of Lorenzo Tomasin) and a lot of Venetian transcriptions.

•Many errors could be automatically corrected using these bits of information.
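
One simple way to exploit existing transcriptions is sketched below. Note that it uses plain string similarity against a word list, not a grammatical model; the toy lexicon, the function name `correct_word` and the 0.8 similarity cutoff are assumptions made for the example.

```python
import difflib

def correct_word(word, lexicon, cutoff=0.8):
    """Replace a transcribed word by its closest lexicon entry, if close enough."""
    if word in lexicon:
        return word
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Toy lexicon standing in for words attested in existing transcriptions.
lexicon = ["ducati", "bottega", "mercante"]

print(correct_word("botega", lexicon))   # close to a known word: corrected
print(correct_word("xyzzy", lexicon))    # nothing similar: left unchanged
```

Words with no sufficiently close match are left untouched, so rare names are not silently rewritten into common words.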

Survey : Do you want to use crowdsourcing in your next semester’s project ? Should the DHLAB sponsor this ? Answer on Framapad

What about crowdfunding your research project ?

Crowdfunding in general

•Kickstarter : 5.2 million people have pledged 882 million, funding 52 000 projects.

•Kiva : 600 000+ lenders have channelled almost 275 million to entrepreneurs in the developing world.

•Obama’s 2008 election campaign : 780 million, much of it from small online donations.

Example of a scientific Kickstarter project : http://www.kickstarter.com/projects/1616707907/virtual-prehistoric-worlds

Crowdfunding sites

• Indiegogo : http://www.indiegogo.com/

•France : http://www.ulule.com/

•Switzerland : http://wemakeit.ch/

After the pause, we will talk about Wikipedia and Gamification. In the meantime you can try Wikirace : http://wikirace.christopherdebeer.com/

Why do people participate in crowdsourcing projects ?

OpenStreetMap

Haiti’s OSM before and after the earthquake (800+ changes)

Much more precise than Google Maps

Because the data is open, new layers can be added

OSM mappers seem intrinsically motivated to build content together

Is it the same for Wikipedia users ?

Wikipedia demonstration

Is Wikipedia a good resource ?

•Some academics argue that the use of Wikipedia is not appropriate for scholarly settings, because it is collectively built by amateurs.

•Achterman, D. (2005) Surviving Wikipedia : improving student search habits through information literacy and teacher collaboration, Knowledge Quest, 33 (5), 38-40

•Black, E. (2007) Wikipedia and Academic Peer Review : Wikipedia as a recognized medium for scholarly publications ? Online Information Review, 32 (1), 73-88

Wikipedia is in perpetual beta, constantly getting better

•Wikipedia is updated and improved at a much faster rhythm than other, scholarly edited encyclopedias.

• It improves all the time.

•Several recent studies have shown that Wikipedia can equal or outperform other traditionally edited encyclopedias in terms of accuracy.

•Giles, J. (2005), Internet encyclopaedias go head to head, Nature, 438, 900-1

Wikipedia creates a diplomatic zone

•Wikipedia manages to create a diplomatic zone, where conflicts between different perspectives can be solved in search of a common neutral consensus. This is a definite advantage compared to other static (online or printed) encyclopedias.

•For diplomacy in general, see Bruno Latour, Enquête sur les modes d’existence : Une anthropologie des modernes, La Découverte, 2012.

•Bryant, S. et al (2005) Becoming Wikipedian, in GROUP ’05, 1-10, ACM Press

Wikipedia is perceived as a common good

• It is backed by many users all over the world

•Therefore, it is one of the rare digital resources that is bound to last.

Why does Wikipedia work ?

Hypothesis : Wikipedia is a game

Foursquare is a game and a mapping service

• In recent years, we have seen several examples of successful creation of collective knowledge bases using addictive games.

•This is a particular case of Gamification

Twitter is a game

•One could argue that services for sharing/constructing collective knowledge online are also games (even if they are not presented as such).

•The success of Twitter is linked with its smooth Onboarding process

•We discussed this case in course 1.

Quora is a game

Quora’s strategy

•Quora must attract qualified contributors to write high quality answers to questions.

•Can you imagine a strategy to reach this goal ?

To reach this goal, Quora chose a very clear strategy : personalize the answers, anonymize the questions

Quora’s strategy

•Questions are not owned by the person who asks them.

•They are immediately treated as a common good that can be updated and modified by anyone.

•By contrast, the interface strongly associates the user with his answers.

•The systematic juxtaposition of the user's identity (incl. picture, name and short bio) and his answers introduces an equivalence between the value of a user and the value of his answers.

Quora’s strategy

• In addition, Quora introduces an explicit ranking system : the best rated answers are shown first.

•Each question is thus a competition between Quora’s users.

•The one who provides the best answer wins the game.

•As in Twitter, the user understands Quora’s implicit rules as he plays and learns what he must do to play well in this particular kind of game.

What kind of game is Wikipedia ?

Wikipedia is an MMORPG (Massively Multiplayer Online Role-Playing Game)

Wikipedia

•Onboarding : No need to be identified to start contributing. But registration is necessary to climb the tiers.

•Registering is like reaching level 1

•By registering, the user gets a few new powers. He can have his own web page. He can vote.

•These are first steps to motivate him to progressively discover and climb the levels of the big pyramid associated with each version of Wikipedia.

Wikipedia

•How can one climb the tiers ? What kind of privileges do the more powerful users have ? The new contributor does not know yet.

• If he persists, he will discover that he can hold different jobs in the Wikipedia world.

•Administrators, Bureaucrats, Stewards, Mediators, Judges, Bot creators, Importers, Oversighters, IP Checkers.

Administrators

•Administrators are responsible for cleaning particular pages, checking copyright issues and repairing acts of vandalism.

•All these tasks can be done by a normal user, but an administrator has access to special powers :

• erase non-relevant pages

• protect some pages against change

• block certain users

• rename pages

• mask the history of particular pages.

Administrators

•How does one become an administrator ? He needs to be elected.

•The following criteria are recommended :

• a very good understanding of the wiki syntax, rules and global functioning of the local version of Wikipedia.

• participation in maintenance work

• around 3000 contributions

• at least one year of significant activity

•The election is set on a given day and the candidate must obtain a clear majority (this notion is not precisely defined in the French version of Wikipedia)

IP Checker

•An IP Checker has access to the check-user function, which reveals the connection between a user's IP address and his account. To become an IP Checker, one must be approved by the arbitration committee.

•Only 5 people have this privilege on the French version of Wikipedia.

Oversighter

•An Oversighter can :

• mask a username from all the public records

• mask a comment

• mask a version of a page

• suppress a page and mask it even from administrators

• see the oversighters’ special records

Bot creators

•Among the 30 most active editors on Wikipedia, 2/3 are bots

•Bots perform repetitive tasks and can interact with Wikipedia pages like a real Wikipedia user (generate an article, edit or delete an article, translate part of an article, solve homonymy issues, correct acts of vandalism)

•Only a bureaucrat or a steward can allow someone to be a bot creator.

Bureaucrats

•Bureaucrats manage the status of other users (administrators, bots, bureaucrats).

•Only 8 people have this privilege on the French version of Wikipedia.

Stewards are super bureaucrats

•Stewards are appointed by the international committee. They can manage the status of all the other contributors.

•There are only 3 stewards on the French version of Wikipedia.

Mediators

•They can intervene in disputes but cannot vote or recommend a punitive action.

Judges

•They can impose a punitive action

•The ArbCom (Arbitration Committee) of the English version of Wikipedia has only 15 members.

Wikipedia also has its foundational stories

•The Essjay controversy : Essjay was an eminent member of the Wikicratia, combining the functions of administrator, bureaucrat, judge and mediator. He was caught lying in the bio on his personal Wikipedia page and was banned.

World of Warcraft is so boring compared to Wikipedia

•Ordinary clerks during the day, Wikipedians at night.

•On Wikipedia, with time and perseverance each player can have a double life, masked behind his pseudonym. He can earn new powers as hard-won as those of a great magician in heroic-fantasy role-playing games.

•When I wrote a first blog post on this issue, a French Wikipedia Bureaucrat pointed me to a relatively well-hidden page describing Wikipedia as an MMORPG. http://fr.wikipedia.org/wiki/Wikipedia:MMORPG

Open Debate : are crowdsourcing and gamification ethical ?
