co:op-read-convention marburg - milena dobreva

How to Index Biographical Data from Archival Documents Using the Methods of Citizen Science? Milena Dobreva University of Malta

Upload: icarus-international-centre-for-archival-research

Post on 09-Feb-2017



Grain elevators. Caldwell, Idaho, 1941. Photo by Russell Lee.

Prints and Photographs Division, Library of Congress

Main topics

• What is citizen science? (And what is the connection to crowdsourcing? And archives?)

• Why is citizen science still not very popular in memory institutions?

• How can specific tasks (e.g. indexing biographical data from archival documents) benefit from citizen science?

• What hybrid models combining automated methods and human contributions are currently emerging?

Citizen science

• Involvement of members of the general public in scholarly projects designed by academics

– Non-professional researchers

– Voluntary participation (the reward for the volunteers is intrinsic)

• Tasks may vary but currently those most popular involve data collection or data entry.

• Crowdsourcing is one possible method (but it is not necessarily aimed at research tasks!)

Indexing

• Structured data (databases, linked open data; Encoded Archival Context - Corporate bodies, Persons, and Families (EAC-CPF))

• Extracts (rather than full texts)

• Quality control by professionals

• Extensive work on:

– Prosopography (SNAC, PROSOP, etc.)

– Merging lists of persons from different sources/researchers
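Merging lists of persons from different sources can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `PersonRecord` fields, the name normalisation rule, and the matching rule (same normalised name, no conflicting birth year) are hypothetical and far simpler than what EAC-CPF records or projects such as SNAC actually use. The sample names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set
import unicodedata


@dataclass
class PersonRecord:
    """Hypothetical minimal person entry (not an EAC-CPF record)."""
    name: str
    birth_year: Optional[int] = None
    sources: Set[str] = field(default_factory=set)


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def compatible(a: PersonRecord, b: PersonRecord) -> bool:
    """Two records may describe one person unless their birth years conflict."""
    return a.birth_year is None or b.birth_year is None or a.birth_year == b.birth_year


def merge_person_lists(*lists: List[PersonRecord]) -> List[PersonRecord]:
    """Merge records that share a normalised name and have no conflicting birth year."""
    by_name: Dict[str, List[PersonRecord]] = {}
    for records in lists:
        for rec in records:
            candidates = by_name.setdefault(normalise(rec.name), [])
            match = next((c for c in candidates if compatible(c, rec)), None)
            if match is None:
                # Copy the record so the input lists are left untouched.
                candidates.append(PersonRecord(rec.name, rec.birth_year, set(rec.sources)))
            else:
                match.sources |= rec.sources
                if match.birth_year is None:
                    match.birth_year = rec.birth_year
    return [person for group in by_name.values() for person in group]


# Invented sample data from two hypothetical contributors.
list_a = [PersonRecord("Müller, Anna", 1850, {"list A"})]
list_b = [PersonRecord("Anna Muller", None, {"list B"}),
          PersonRecord("Anna Müller", 1902, {"list B"})]
merged = merge_person_lists(list_a, list_b)
```

A rule this crude will both over-merge and under-merge on real data, which is one reason the quality control by professionals mentioned above remains part of the workflow.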

In a nutshell…

• Change in research – open science

• Big data and vast digital resources

• From ‘standing on the shoulders of giants’ to ‘picking the brains’ of these giants (and not only theirs!)

One historical example

In July 1857 the ‘Unregistered Words Committee’ of the Philological Society of London issued a circular asking for volunteers to read particular books and copy out quotations illustrating ‘unregistered’ words. The volume of responses was such that in January 1858 the Society decided that “efforts should be directed toward the compilation of a complete dictionary, and one of unprecedented comprehensiveness.”

In April 1879, the newly-appointed editor James Murray issued a new appeal to the public, asking for volunteers to read specific books in search of quotations to be included in the future dictionary. Within a year there were close to 800 volunteers and over the next three years, 3,500,000 quotation slips were received and processed by the OED team.

Sir James Murray before 1910 in the Scriptorium, Banbury Road

Use of a specific platform

Examples

… beyond Transcribing Bentham

• Types of contribution

• Material offered to the volunteers (primary sources, automated data, auxiliary tools)

• Platforms

• Tasks

• Relationship with the contributors

Letters of 1916


The British Library: Georeferencing historical maps


Year of the Bay

Children of the Lodz Ghetto

Tasks aligned with participatory models of citizen science

• Source: Bonney et al. (2009)

Some findings from previous research

• Including citizens in research studies has contributed to a rise in interest in the area. When research data are made public, citizens are encouraged to interpret and study the data and to come to their own conclusions. This is one of the most educational features of citizen science.

• Citizen science is a good way to get cheap or free labour, skills and computation power – but not in the memory institutions!

• Citizen science is also a good way for citizens to understand and appreciate research (they also get to see how their tax money is being used).

Project outcomes: intended vs actual

• Data sets: 56 intended, 48 actual

• Data analysis: 46 intended, 40 actual

• Academic publications: 43 intended, 33 actual

• Technical reports: 25 intended, 21 actual

• New discoveries: 31 intended, 22 actual

• New research methods: 17 intended, 11 actual

• New inquiry: 21 intended, 14 actual

• Policy changes: 21 intended, 11 actual

• Community action: 38 intended, 26 actual

• Environmental restoration: 23 intended, 17 actual

• Individual learning: 47 intended, 42 actual

Based on Wiggins, A., K. Crowston. (2012).
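To compare the intended and actual outcomes, one simple measure is the share of intended outcomes that were actually reported. The sketch below assumes that, for each outcome, the first figure on the slide is the intended count and the second the actual count, following the order in the slide title. The name `realisation_rates` is introduced here for illustration only.

```python
# (intended, actual) counts as read from the slide; the pairing order is an assumption.
OUTCOMES = {
    "Data sets": (56, 48),
    "Data analysis": (46, 40),
    "Academic publications": (43, 33),
    "Technical reports": (25, 21),
    "New discoveries": (31, 22),
    "New research methods": (17, 11),
    "New inquiry": (21, 14),
    "Policy changes": (21, 11),
    "Community action": (38, 26),
    "Environmental restoration": (23, 17),
    "Individual learning": (47, 42),
}


def realisation_rates(outcomes):
    """Map each outcome to actual / intended."""
    return {name: actual / intended for name, (intended, actual) in outcomes.items()}


rates = realisation_rates(OUTCOMES)

# Print outcomes from the most to the least often realised.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    intended, actual = OUTCOMES[name]
    print(f"{name:<26} {actual:>2}/{intended:<2} = {rate:.0%}")
```

On these figures, individual learning comes closest to its intended level (42 of 47) and policy changes fall furthest short (11 of 21).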

Data ownership

• No policy: 11

• Currently developing policy: 4

• Researchers own the data: 15

• Project contributors own the data: 13

• Third party owns the data: 1

• Public owns the data: 23

• Not sure/don’t know: 6

Based on Wiggins, A., K. Crowston. (2012).

Use of technological tools

Lacking tools


Citizen science: where? (applied vs pure/basic research)

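• Quest for fundamental understanding: yes; considerations of use: no. Pure basic research (Bohr)

• Quest for fundamental understanding: yes; considerations of use: yes. Use-inspired basic research (Pasteur)

• Quest for fundamental understanding: no; considerations of use: no. –

• Quest for fundamental understanding: no; considerations of use: yes. Pure applied research (Edison)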

Some challenges

• Matching projects and people

• Division of labour and integration of contributions (contribution, collaboration, co-creation)

• Platforms and their interoperability with other tools used within the institution

• Trust in citizens’ contributions

• Motivation and its fluctuations

• What citizens gain (in terms of “domain literacy”) – longer engagement is beneficial!

• Practical issues: how/what domains are addressed in citizen science projects; issues of quality and quantity of research output; data ownership and data interoperability still not sufficiently addressed

Civic Epistemologies

• Roadmap: www.civic-epistemologies.eu/roadmap/

• Registry of tools: www.civic-epistemologies.eu/registry-of-resources/

• During the CIVIC EPISTEMOLOGIES Final Conference in Berlin (12-13 November 2015), organised in cooperation with the RICHES project, the partners of both projects proposed a set of principles aiming to encourage and support the participation of citizens in digital cultural heritage and humanities research. The Berlin Charter, available online at www.civic-epistemologies.eu/berlincharter, is open to be adopted by cultural and academic institutions, private organisations, artists, professionals, researchers and interested citizens.

Summary

• Tools

• Academics involved (research project)

• Tasks

– Indexing: tasks like “where is Waldo”

• Quality expectations (users of the outcome)

• Relationships with the volunteers

Archives and citizen science: possible scenarios

• Competition (data created/analysed by machines vs by people)

• Facilitation (citizen science seen as method to generate big data)

• Interpretation (using humans to contextualise data applications)

• Complementarity (combining both in various combinations)

• Strategic partnerships

Citizen science and archives… friends or foes?

• Capitalising on the tradition of voluntary work

• Identifying projects (and new types of user involvement)

• Building hybrid infrastructures (tools, citizens, professionals)

[Diagram labels: prosopographic resources; citizen contribution; automated text recognition]
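One way such a hybrid infrastructure could combine automated text recognition, citizen contributions and professionals is sketched below. This is not a description of any existing platform: the function names, the confidence score attached to the machine reading, and both thresholds (0.9 and 0.6) are assumptions chosen purely for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple


def consensus(readings: List[str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the most common volunteer reading if enough volunteers agree on it."""
    if not readings:
        return None
    value, votes = Counter(r.strip() for r in readings).most_common(1)[0]
    return value if votes / len(readings) >= min_agreement else None


def resolve_name(
    machine_reading: str,
    machine_confidence: float,
    volunteer_readings: List[str],
    accept_threshold: float = 0.9,
) -> Tuple[Optional[str], str]:
    """Decide which reading of a personal name to index and who supplied it.

    1. Accept the automated reading if its (assumed) confidence is high enough.
    2. Otherwise use the volunteers' reading if they agree sufficiently.
    3. Otherwise pass the item to a professional for review.
    """
    if machine_confidence >= accept_threshold:
        return machine_reading, "automated"
    agreed = consensus(volunteer_readings)
    if agreed is not None:
        return agreed, "citizen consensus"
    return None, "professional review"


# Invented example: a low-confidence machine reading corrected by volunteers.
result = resolve_name("Dobrcva", 0.4, ["Dobreva", "Dobreva", "Dobrova"])
```

The thresholds set the division of labour: raising `accept_threshold` sends more items to volunteers, and raising `min_agreement` sends more items on to professionals.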

References

1. Bonney, R., H. Ballard, R. Jordan, E. McCallie, T. Phillips, J. Shirk, and C. Wilderman (2009). Public participation in scientific research: defining the field and assessing its potential for informal science education. A CAISE Inquiry Group Report. Center for Advancement of Informal Science Education (CAISE), Washington, D.C., USA.

2. Bonney, R., C. B. Cooper, J. Dickinson, S. Kelling, T. Phillips, K. V. Rosenberg, and J. Shirk (2009). Citizen science: a developing tool for expanding science knowledge and scientific literacy. BioScience 59(11): 977–984.

3. European Commission (2013). Green Paper on Citizen Science. Available: http://www.socientize.eu/sites/default/files/Green%20Paper%20on%20Citizen%20Science%202013.pdf

4. Franzoni, C., and H. Sauermann (2014). Crowd Science: The Organization of Scientific Research in Open Collaborative Projects. Research Policy 43(1): 1–20.

5. Wiggins, A., and K. Crowston (2012). Describing Public Participation in Scientific Research. iConference 2012, Toronto, Ontario, Canada. Available: http://crowston.syr.edu/system/files/iConference2012.pdf