big data in the arts and humanities


Post on 15-Jul-2015






  • Big Data in the Arts and Humanities

    Andrew Prescott, University of Glasgow

    AHRC Theme Leader for Digital Transformations

    Big Data in a Transdisciplinary Perspective, 7th Herrenhausen Conference of the Volkswagen Foundation

    25 March 2015

  • Neurone activity in the brain of a zebra fish embryo. Each video sequence is one terabyte in size.

    Ahrens, M. B. & Keller, P. J. Nature Meth. (2013)

  • The high frequency telescopes of the Square Kilometre Array will produce 1 exabyte per day (more than current global internet traffic) in its first phase. This will eventually rise to many petabits (10^15 bits) per second, more than 10 times the current global internet traffic.
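The two Square Kilometre Array figures use different units (bytes per day and bits per second), which makes them hard to compare at a glance. The sketch below puts them on the same scale. It assumes decimal (SI) prefixes and a steady data rate averaged over a full day; the slide states neither, and the function and constant names are illustrative only.

```python
# Assumption: SI (decimal) prefixes, 1 exabyte = 10**18 bytes, 1 petabit = 10**15 bits.
# Assumption: the daily volume is spread evenly over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60
EXABYTE_IN_BYTES = 10**18
PETABIT_IN_BITS = 10**15


def bytes_per_day_to_bits_per_second(bytes_per_day: float) -> float:
    """Convert a daily data volume in bytes to an average rate in bits per second."""
    return bytes_per_day * 8 / SECONDS_PER_DAY


def bits_per_second_to_bytes_per_day(bits_per_second: float) -> float:
    """Convert an average rate in bits per second to a daily volume in bytes."""
    return bits_per_second / 8 * SECONDS_PER_DAY


# First phase: 1 exabyte per day, expressed in petabits per second.
phase_one_bps = bytes_per_day_to_bits_per_second(1 * EXABYTE_IN_BYTES)
phase_one_pbps = phase_one_bps / PETABIT_IN_BITS

# Later phase: each single petabit per second, expressed in exabytes per day.
one_pbps_in_eb_per_day = bits_per_second_to_bytes_per_day(PETABIT_IN_BITS) / EXABYTE_IN_BYTES

print(f"1 EB/day is about {phase_one_pbps:.4f} Pbit/s")
print(f"1 Pbit/s is about {one_pbps_in_eb_per_day:.1f} EB/day")
```

Under these assumptions, one exabyte per day is a little under a tenth of a petabit per second, so "many petabits per second" implies a daily volume tens of times larger than the first-phase figure.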


    Sound and video: the Shoah Holocaust survivors' testimonials collection is 20 terabytes (cf. Sloan Digital Sky Survey: 10 terabytes). The BBC's digital assets are estimated at about 52 petabytes of data.

    Structured data: US National Archives and Records Administration: 142 TB of data; estimated 347 PB by 2022. Ancestry holds 14 billion records and is adding 2 million records daily. Brightsolid's (Findmypast) new data centre in Aberdeen will have 400 petabytes of storage.

    Web archives: multi-petabyte

    Linguistic corpora: Corpus of Contemporary American English: 450 million words; Wikipedia Corpus: 1.9 billion words; Google American books n-grams: 155 billion words


    The papers of the British prime minister William Ewart Gladstone (1809-1898): approx. 160,000 documents in 762 volumes.

    Margaret Thatcher archive: 1 million documents in 3,000 boxes occupying 300 metres of shelving

    Enron Corporation Corpus, acquired by the Federal Energy Regulatory Commission during its enquiry into the corporation's collapse. Approx. 600,000 e-mails generated by 158 employees; about 423 MB (zipped).

  • Electronic records from the Executive Office of the President during the second presidency of George W. Bush: 82 TB of data; 200+ million e-mail messages; 3+ million digital photographs; 30+ million other electronic records

  • ###### Begin Original ARMS Header ######

    RECORD TYPE: PRESIDENTIAL (NOTES MAIL)

    CREATOR:Sandy Kress ( CN=Sandy Kress/OU=OPD/O=EOP [ OPD ] )

    CREATION DATE/TIME:14-JUN-2001 17:13:17.00

    SUBJECT:: Education statement

    TO:Claire E. Buchan ( CN=Claire E. Buchan/OU=WHO/O=EOP@EOP [ WHO ] )

    READ:UNKNOWN

    ###### End Original ARMS Header ######

    ---------------------- Forwarded by Sandy Kress/OPD/EOP on 06/14/2001 05:13 PM ---------------------------

    Sarah Pfeifer 06/14/2001 04:59:34 PM Record Type: Record

    To: Sarah E. Youssef/OPD/EOP@EOP, Brian R. Besanceney/OPD/EOP@EOP, Sandy Kress/OPD/EOP@EOP cc:Subject: Education statement

    ---------------------- Forwarded by Sarah Pfeifer/OPD/EOP on 06/14/2001 04:59 PM ---------------------------

    Sarah Pfeifer 06/14/2001 04:59:00 PM Record Type: Record

    To: See the distribution list at the bottom of this message cc: Subject: Education statement

    This statement has been approved by the President. Harriet called me several minutes ago with one last change, which I have incorporated.

    Message Sent To:_____________________________________________________________ Harriet Miers/WHO/EOP@EOP

    John Gardner/WHO/EOP@EOP Barbara A. Barclay/WHO/EOP@EOP Debra D. Bird/WHO/EOP@EOP Carolyn E. Cleveland/WHO/EOP@EOP

    E-mail by B. Alexander (Sandy) Kress, Senior Adviser to President George W. Bush on Education, concerning the drafting of the No Child

    Left Behind Act in 2001

  • Visualisation of the relationship between terms in Wikileaks Significant Action Reports relating to Iraq

    Big data: "data whose size forces us to look beyond the tried-and-true methods that are prevalent at that time" (Jacobs, 2009)

    Illustrates how big data is already a current issue for humanities researchers

    Suggests the humanities are becoming not only more quantitative, but also more visual, haptic and exploratory

  • collateral exposure..? POSSIBLE INFORMATION

    media diversion..? POSSIBLE INFORMATION

    Extract from project publication for Insurance.AES256 by Michael Takeo Magruder (2011), using Wikileaks material to reflect on issues of

    information freedom and secrecy in today's ever-shifting media landscape.

  • Portfolio of Big Data projects funded by UK Arts and Humanities Research Council, 2014-15

    Dealing with large textual corpora: UK statute law; mining the history of medicine

    Linking existing databases: Snapdrgn; Big Data History of Music

    Annotation of unstructured data: DEEP film access; optical music recognition; Lost Visions

    Visualisation: International crime fiction; Seeing Data

    Critical study of data: Our Data Ourselves; Secret Life of a Weather Datum

  • Portfolio of Big Data projects funded by UK Arts and Humanities Research Council, 2014-15

    Mapping: Literary History of Edinburgh;

    Internet of Things: archaeological 3D imaging; Tangible Memories

    Reflects range of activities currently used in Big Humanities.

    Does anything link these together methodologically? Do they represent anything different from what we have previously done?

    Is there a Big Data moment, or is it simply that data and expertise are now available on a larger scale?

    What distinctive contributions can the arts and humanities make to the Big Data debates?

  • HAVE WE BEEN HERE FOR A LONG TIME?

    If Big Data is defined as data whose size requires us to look beyond tried methods, it has been with us since antiquity

    Invention of writing linked to government need to manage information

    1086: Detailed register of property in Domesday Book

    12th century: development of pipe rolls and use of counters in government accounting

    13th century: alphabetisation of the Bible by a team of Dominican friars


    Historical examples like Domesday Book or the census were inventories: descriptive and backward-looking

    The aim of Big Data techniques is predictive: "We know what you are going to do tomorrow" (credit score agency)

    Results derive from quantity of data rather than quality; methods are inherently inexact but "the vast amount of data compensates for the imperfections" (Mayer-Schönberger, p. 187)

    Ignores causal relationships and looks for correlations, e.g. how lifestyle factors predict the likelihood of adhering to a medical prescription

  • EXAMPLES OF PREDICTIVE ANALYTICS

    Driven largely by finance and retail, but rapidly spreading into other


    Chicago: Automated Preventive Rodent Baiting Program analyses 31 indicators to predict where rodent infestations will occur

    New York: predicting where unlicensed building conversions have occurred to target inspections and issue vacate orders

    Chicago: Predictive Policing System

    AHRC programme includes projects on online betting on election results, and on legislation

    AHRC-Nesta project to use predictive analytics to improve museum attendance

  • Use of big data techniques in choosing film directors, cast, crew, etc.:

  • Use of predictive analytics to optimise scripts in film and TV:

    John Wiley is considering using IBM Pure Data analytics in a similar way for scientific and academic publishing


    Not simply about role of quantification or scientific method in arts and humanities

    Challenges assumptions about role of information in research: if data is big enough, messy or poorly curated data need not be an issue

    Questions existing research methods: data-driven research

    Undermines assumptions about causality and human agency

    Role of retail and financial agencies in developing these methods - the enclosure of data

    Challenges existing critical and theoretical frameworks: not end of theory but big data needs big theory


    Developing new theoretical frameworks and responses: critical data studies

    Providing models in areas such as causality and messiness of data

    Exploring the spaces and flow of big data

    Promoting moral values of humanities research in a big data world

    Role of design

    Radical contextualisation of big data

    Humanisation of big data

  • THE NEED FOR BIG THEORY

    Chris Anderson in Wired, 2008: "Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves."

    New York Times, 2010: The next big idea in language, history and the arts? Data. Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical ism and start exploring how technology is changing our understanding of the liberal arts. This latest frontier is about method, they say, using powerful technologies and vast stores of digitised materials that previous humanities scholars did not have.

    Charles Darwin (cited by Callebaut): "all observation must be for or against some view if it is to be of any service"


    Bowker (2006)