8/4/2019 Growing Data, Changing Journalism
Growing Data, Changing Journalism
An Explorative Inquiry Into The Rise Of Data Journalism
8 July 2011
Eric R. Alberts (3485595)
Coding Culture (200600075)
MA New Media & Digital Culture
Mirko T. Schäfer & Nikos Overheul
2010-2011
1 Introduction
Hal Varian, chief economist at Google, stated in an interview with McKinsey
Quarterly in 2009 that in the next ten years statisticians will have the sexiest job around. He motivated this statement by arguing that "the ability to take data, to be
able to understand it, to process it, to extract value from it, to visualize it, to
communicate it [...] is going to be a hugely important skill in the next decades"
(Manika 2009). Looking at Varian's employer, it is obvious why he came to
this conclusion. Google is a company that benefits greatly from vast amounts of
data. Every day enormous amounts of data are collected as a by-product of user
interactions with Google's services. From this data new economic value is created.
Google, however, is but one example of how important large data sets have become during the last decade. In an attempt to map the current explosion of data and the
challenges that derive from it, The Economist published a special report in
February of 2010. In this special issue, Joe Hellerstein, a computer scientist at the
University of California, is quoted, naming the current age "the industrial
revolution of data" (Cukier 2010, 3). The Economist continues by stating that "[t]he
effect is being felt everywhere, from business to science, from government to the
arts" (ibid.), giving this phenomenon the label of big data.
In this paper an exploratory inquiry is conducted into one specific area in which big
data is also becoming ever more important: journalism. Several interesting news
reports and cases have already emerged that ensued directly from delving
through large quantities of data. A prime example is the large-scale investigation
into British politicians' expenses, which invigorated the debate on government
expenditure. It is likely that similar data-driven news reports will see the light of
day in the near future, as large data sets and the sophisticated tools that allow
journalists to make sense of this data are becoming more and more widely available
on the Internet. Access to and the use of databases are no longer the preserve of IT
specialists or investigative journalists conducting expensive research.
According to the European Journalism Centre (EJC), which organized a roundtable
conference on data journalism in Amsterdam last year, "[d]eveloping the know-how
to use the available data more efficiently, to understand it, communicate and
generate stories based on it, could be a huge opportunity to breathe new life into
journalism" (Data-driven Journalism 2010, 6). The role of the reporter can even
be expanded or changed to that of sense-maker by digging deep into data, making
journalism more socially relevant (ibid.). This paper critically analyses these
opportunities but also discusses the challenges that derive from data journalism.
By critically examining matters such as algorithms, data visualisation and
participatory journalism this paper makes an effort to further contextualise this
new phenomenon.
The inquiry into the properties of data journalism will be conducted on two levels.
First, this paper discusses data journalism on a material level, as databases are
becoming the new valuable sources for journalists. The issue of materiality can be
placed within an already active debate on how the materiality of books is
transforming in the digital age. Companies like Google and Amazon are digitizing
books into data objects, bringing on changes to our relationship with books and
databases. Materiality in this paper also relates to the use of software tools and the
choice of specific algorithms to structure these large amounts of data. Visualisation
of data, too, is becoming a widely accepted way of comprehending vast and abstract
data, but it requires decisions that have consequences for the data displayed.
Second, data journalism will be discussed on a social level as news organisations
are actively involving their readers in analysing large data sets. The practice of
crowdsourcing, for example, has the potential to change traditional producer-
consumer relationships within journalism. The inquiry into the properties of data
journalism on a material and social level will be related to various recently
published examples of data journalism. An inquiry into the opportunities and
challenges that derive from data journalism, however, cannot take place before
giving a theoretical framework. This paper will therefore first embed data
journalism in a larger context, describing how the rise of this new phenomenon can
be seen within an already changing world of journalism under the processes of
media convergence and participatory culture.
This paper is intended to offer an insight into a new journalistic practice, which is
drawing more and more attention as digital information is rapidly expanding,
becoming widely available and publicly accessible. This paper contextualizes data
journalism and critically engages assumptions and consequences in an attempt to
offer a nuanced overview of this emerging branch in the already changing field of
journalism in the digital age, which, according to Adam Westbrook, author of The
Next Generation Journalist, is "one of the big potential growth areas in the future
of journalism" (qtd. in Data-driven Journalism 2010, 3).
2 Theoretical framework of data journalism
As stated above, before discussing the opportunities and challenges it is
important to first get a grip on what data journalism is and how this practice relates to journalism in general and to the trends in today's digital culture. Because
data journalism has only recently taken shape, it has not been fully theorised yet.
There is, however, extensive theory to be found on participatory journalism, the
blurring of traditional relationships between news media and consumers of media.
In this chapter, data journalism is understood in terms of participatory journalism
but with a much more prominent technological aspect: its reliance on databases.
Data journalism as a practice blends technology and culture in such a way that it is
not feasible to separate the technological aspects from its social context. Before this
argument is further elaborated, it is necessary, for the sake of contextualizing, to first give an overview of the technological and cultural processes from which
participatory journalism and, thus, data journalism have emerged.
We live in a world that is witnessing a revolution in information technology, a
converging set of technologies, which is penetrating all domains of human activity
(Castells 2000). This revolution, however, is not just a technological process
amplified through digitization, or what Nicholas Negroponte referred to as the
transformation of atoms into bytes (Negroponte 1995). The digital age is also
becoming a unified environment in which computer hardware and software "define
possibilities for action and conditions of expression" (Rieder & Schäfer 2008, 2).
The new human condition is characterized by what Henry Jenkins refers to as
convergence culture, enabling new forms of participation and collaboration
(Jenkins 2006, 245). According to Jenkins, "[c]onvergence is both a top-down
corporate driven process and a bottom-up consumer-driven process" (Jenkins
2004, 37) and is taking place on a global scale in acts of media production and
consumption. Previously set borders between making media and using media, but
also between media industries, continue to blur.
This idea of convergence, the blurring of boundaries set by the conditions of
digitization (Jenkins 2006, 11), was initially enthusiastically welcomed in the field
of journalism by authors like Dan Gillmor (2004), who named it grassroots or
participatory journalism. The proliferation of networked communication
technologies enables people to launch independent news organisations as a direct
response to what were perceived as shortcomings in mainstream news coverage
(Deuze 2008). A few of these alternative websites produced by amateurs/citizens
are Indymedia, Wikinews and Ohmynews, the latter being an alternative to the
highly conservative mainstream press in South Korea (Kahney 2003). Studies on
these cases show how citizen media offer interesting bottom-up alternatives to
conventional top-down practices of news making (Paulussen & Ugille 2008, 26).
At first glance these success stories back the more utopian beliefs of Dan Gillmor. If
this trend were to continue, independent participatory journalism might be able to
replace top-down news media and their traditional relationships with users.
Closer analysis of participatory journalism, which is still a rather ill-defined term
(Hermida 2008), shows, however, that it is fair to say that the impact of weblogs
and citizen media on traditional, professional journalism has thus far been rather
limited (Paulussen & Ugille 2008, 26). This is partially due to the prevailing
tendency among journalists to see themselves as the defining actors in the process
of making news (Heinonen 2011). The main conclusion of a study on the development of participatory journalism on a global scale, conducted by David
Domingo et al., is that professional newsrooms appear to be rather reluctant to
open up the news production process to the active involvement of citizens
(Domingo et al. 2008). The primary question posed by researchers such as Wilson
Lowrey (2005) of whether participatory journalism is in a way substituting
professional journalism is thus losing relevance. Instead, the focal point of research
on participatory journalism has shifted towards how mainstream news media are
adopting citizen contributions in the process of news production (Paulussen &
Ugille 2008, 26).
In 2003 J. D. Lasica, senior editor of the Online Journalism Review, also stated that
"readers want to be part of the news process" (Lasica 2003, 74). But Lasica
supplemented this statement by noting that "instead of looking at participatory
journalism and traditional journalism as rivals for readers' eyeballs, we should
recognize that we're entering an era in which they complement each other,
intersect with each other, play off one another" (73). He continued by stating that
"we are starting to see a mixture of commentary and analysis from grassroots as
ordinary people find their voices and contribute to the media mix. Blogs won't
replace traditional news media, but they will supplement them in important ways"
(74). Although Lasica wrote this essay almost ten years ago, we can see now that
traditional news media indeed continue to dominate the news media landscape and
are becoming ever more capable of harvesting the potential of an active audience. For
instance, in the Netherlands the news organisation NRC integrates a successful
weblog with the physical newspaper NRC Next, and the public news
organisation NOS invites young people to contribute to news on its website NOS op
3 (formerly known as NOS Headlines).
Mark Deuze, a Dutch professor in communication sciences who has shed light on
the professional identity of journalists in the context of convergence culture, makes
an observation similar to Lasica's. According to Deuze, convergence culture-based
participatory journalism is best understood as some kind of co-creative,
commons-based news platform that is produced when a professional media
organisation (top-down) partners with or deliberately taps into the emerging
participatory media culture online (bottom-up) (Deuze 2008, 109). Furthermore,
participatory journalism is very much under construction (ibid.). The
convergence of top-down and bottom-up journalism is a work in progress, with
more or less traditional makers and users of news cautiously embracing its
potential, an embrace that is not without problems for both producers and
consumers (ibid.). Mark Deuze and J. D. Lasica thus offer a far more nuanced
consideration of participatory journalism in the context of convergence culture. This contrasts with early utopian considerations, which were initially all too easily
taken for granted (Domingo 2008, 680).
Early studies on participatory journalism have also been criticized because of
underlying technological determinism (Paulussen & Ugille 2008, 28). Changes in
journalism were explained as caused by technological developments influencing
the work of journalists from the outside (Deuze 2008, 110). Pablo Boczkowski
underscores the limitations of a sole focus on the effects of new technologies by
showing that although technologies do produce effects, they can only be
understood in the dynamics of technology adoption processes (Boczkowski 2004,
208). Technology must be seen in terms of its implementation, and therefore how
it extends and amplifies previous ways of doing things (Deuze 2008, 110). Changes
occurring in the field of journalism are therefore better understood as a mutual
shaping of technological and social developments rather than as the effects of
technological processes (Paulussen & Ugille 2008, 28).
At the beginning of this chapter I stated that data journalism as a practice blends
technology and culture in a way that it is not feasible to separate the technological
aspects from its social context. With the use of the theory on participatory
journalism I would like to argue that data journalism is the gradual outcome of a
converging culture, which introduces a constantly changing mix of features,
contexts, processes and ideas into the work of individual news workers (Deuze
2008, 112). This means that convergence culture in this particular context is
neither merely technologically driven (Negroponte 1995) nor solely socially driven.
Data journalism should rather be seen, in line with Paulussen and Ugille, as an
outcome of the mutual shaping of technological and social developments.
Critical analysis, however, shows that convergence is a slow and problematic
process and that its true effects are rather limited. Independent weblogs have not
replaced news corporations, and professional journalists retain control
over a news story. Extending Paulussen and Ugille's line of thought, I would
therefore argue that convergence regarding participatory and data journalism is
taking place horizontally (between technological and social aspects) rather than
vertically (between top and bottom, between professionals and amateurs). In other
words, if convergence culture is generally seen as the process of blurring borders,
then the borders regarding data journalism are blurring between the technological
and social contexts. The process of convergence in the case of data journalism
should be captured by a lens that emphasizes actors' agency as much as
technology's capabilities (Boczkowski 2004, 210).
At the end of this chapter I would like to extend Boczkowski's lens metaphor
somewhat further. I believe that this lens can be viewed in terms of
Actor-Network-Theory (Latour 1999), in which human and non-human actants combine
to form hybrid actors. When applying this view to data journalism we do in fact see that in
general the technologies, the data and the software tools, are responsible for larger
parts of the action chains, rendering actions intrinsically hybrid (Rieder & Schäfer
2008, 161). The digital environment of the database, together with the software
tools that enable access to and structure this environment, "define possibilities for
action and conditions of expression" (160). According to Rieder and Schäfer,
software is responsible for "extending [...] the role that technology plays in the
everyday practices that make up modern life" (161). In other words, through the
lens of Actor-Network-Theory data journalism can be seen as a network that
consists of linkages between technological, social and cultural actors, making data
journalism a hybrid practice.
3 Exploring the core properties of data journalism
In the context of data journalism as a hybrid practice, a network of human and
non-human actors, we can now explore some of the core elements of data
journalism through the use of different case studies. This chapter tries to flesh out
the elements of data journalism by largely following the chain of value creation (see
illustration 1 below). This chain consists of four properties: raw data, structuring or
filtering data, visualising data and storytelling. To close this chapter, and
supplementary to these four elements, the aspect of participation, to which the
theoretical background was given in the prior chapter, will be discussed. It will become
clear that data journalism seems to open up to participatory possibilities in specific
ways.
3.1 Properties of data
It is a truism that the amount of digital data is currently growing faster than
anything else. A 2008 study by marketing research firm International Data
Corporation (IDC) revealed that around 1200 exabytes (1 exabyte is 1 million
terabytes) of digital data was produced that year (Cukier 2010, 5). The majority of
this data consists of photos, logs, phone calls and other database-to-database
information; "only 5% of the information [...] is structured, meaning it
comes in a standard format of words and numbers that can be read by computers"
(ibid.). Data and information are epistemologically different, as information is made up of a collection of data, but data and information are increasingly difficult
to tell apart. Raw data is interwoven with today's algorithms and powerful
computers, which can reveal new insights that would previously have remained
hidden (Cukier 2010, 3-4).
Gannett, the holding company behind newspapers USA Today and The Indianapolis
Star, has been a leader in the area of database applications. Gannett realised early on
that data should be a driving force in online journalism, for a number of reasons. First,
data is evergreen content, so its value to users does not end after twenty-four
hours. Second, because of its sheer size, data is best delivered in a medium
without space constraints. The data is much more valuable if it is accessible and
searchable at the user's convenience. Third, Gannett realised that data is much
better suited to interactive media than to, say, print. Data lends itself to
research and interaction, not so much to passive activities like reading or viewing
(Gordon 2007). Supplementary to Gannett's list of data properties that are
relevant to journalism, data in general can be transmitted and shared in the form of
text, sound or images without tangible loss. Because of its freedom from physical
constraints, however, data is easy to manipulate. With a simple click, large amounts
of (personal) information can be copied or permanently deleted (Rieder & Schäfer
2008, 163).
3.2 Structuring data
In 2002 Wiebke Loosen, assistant lecturer at the Institute of Journalism and
Communications at the University of Hamburg, concluded that the abundance of
information on the Internet, in terms of its storage, management, multiple use and
unlimited possibilities, is challenging journalism regarding its own processes of
rationalizing information (Loosen 2002, 5). Structuring the vast amounts of
data is the biggest challenge for businesses, governments and journalists alike.
When structured, data is a potential goldmine. Google is probably one of the most
obvious examples of a company that knows how to generate economic value from
large amounts of data. This is largely the reason why companies like Google and
Amazon choose to also transform physical objects into data objects. Books, for
instance, are being scanned so that various statistical properties can be analysed for
other purposes. Bernhard Rieder calls this the computational potential, or the value,
of the data of millions of scanned books. According to Rieder, the book in the age
of the database adds a contemporary wave of new embedded practices and logistics
of what we read and how we read it (Yudin 2011).
In Rieder's view three new practices emerge when books are translated into data objects. First, the whole text can be statistically projected, which allows various
explorations of the catalogue's content. Second, books can be connected with other
books through data, and books can also be connected to other data, such as the
Internet or Google Scholar. Third, user gestures and practices, such as tagging, clicking,
number of reads, sales and reviews, can be captured through the use of digital
books. In the latter case, user data can be used to create navigational experiences
and opportunities, leading to the personalization of reading. In other words, Google
and Amazon, with their systems to digitize books, transform books into
information, and then unbind and rebind it again as an interactive, social and
semantic interface (Yudin 2011).
The transformation of physical books into data objects by Google and Amazon
paves the way to structure information and generate new value from it. In the field
of journalism a similar underlying motive can be found when we look at the recent
investigation by many news organisations into the emails of the former governor of
Alaska, Sarah Palin i. The state of Alaska released the emails following a
two-and-a-half-year freedom of information process. The emails date from her inauguration as
governor in 2006 through to 2008 and were released in printed form to the news
organisations. The emails had to be digitized before they could be structured.
This is also the case with the documents of British politicians' expenses ii. According
to The Guardian's specially made homepage, the newspaper has 458,832 pages of
documents in its possession, of which 234,877 pages are yet to be analysed. All of the
nearly 460 thousand pages of receipts and claim forms were uploaded onto The
Guardian's servers as images, which could then be structured by means of
tagging. Yet tagging alone is inadequate to distil a news story out of data.
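As a rough illustration of what structuring by tagging amounts to, the following Python sketch treats each scanned page as a record carrying reader-assigned tags and derives the review progress from the two page counts quoted above. The record layout, the tag names and the helper function are hypothetical; they are not taken from The Guardian's actual system.

```python
from dataclasses import dataclass, field

# Page counts as quoted from The Guardian's expenses homepage.
TOTAL_PAGES = 458_832
PAGES_NOT_YET_ANALYSED = 234_877


@dataclass
class Page:
    """One scanned page; the fields are illustrative assumptions."""
    page_id: int
    tags: set = field(default_factory=set)


def review_progress(total: int, remaining: int):
    """Return the number of reviewed pages and the reviewed share in percent."""
    reviewed = total - remaining
    return reviewed, 100 * reviewed / total


# Three made-up pages with made-up tags.
pages = [
    Page(1, {"receipt", "interesting"}),
    Page(2, {"claim form"}),
    Page(3),  # not yet tagged by any reader
]

# Tags make the pile of images filterable, but they do not write the story.
flagged = [p.page_id for p in pages if "interesting" in p.tags]
reviewed, share = review_progress(TOTAL_PAGES, PAGES_NOT_YET_ANALYSED)

print(flagged)          # [1]
print(reviewed)         # 223955
print(f"{share:.1f}%")  # 48.8%
```

Under these assumptions a little under half of the pages had been looked at, and filtering by tag only narrows down where a journalist should start reading.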
3.3 Visualising data
Mirko Lorenz, EJC member and project leader of the Data Driven Journalism
initiative (DDJ), states that raw data needs to be transformed into something
meaningful. As a result the value to the public grows, especially when complex facts
are boiled down into a clear story that people can easily understand and
remember (Data-driven Journalism 2010, 12). Illustration 1 shows that, besides
structuring or filtering the raw data, visualisations play an important role in
generating value as well.
Illustration 1: Data-driven journalism as a process.
According to mash-up artist Tony Hirst, an important thing to remember about
data is that it can be used to tell stories, and that it may hide a great many patterns.
Some of these patterns are self-evident if the data is visualised appropriately
(Townend 2009). For data journalism, visualisation is an important link in the chain of value creation.
An example of how raw data can be visualised and contribute to journalism is The
New York Times' visualisation of President Obama's 2011 budget proposal and how
it is spent iii. The interactive squares on its website immediately show how the
Obama administration has planned to spend its budget and how each part of the
budget relates to other parts. Another example is a visualisation by David
McCandless for The Guardian, depicting the emergency budget proposal by British
Chancellor George Osborne iv. McCandless also did a visualisation of the data
gathered from opinion polls during the general elections of 2010 in Britain v.
Another visualisation that has received a lot of attention is the so-called Homicide
Map by The LA Times, showing Los Angeles County homicide victims vi. The Google
Maps mash-up groups homicides based on the number of homicides in an
area. When a specific homicide is clicked, the reader is automatically referred to
the article in The LA Times reporting on the murder. The British MP expenses, of
course, are also being visualised by The Guardian, offering a clear-cut overview of
what the newspaper has found so far. It is likely that The Guardian will do the same
when more data about the Sarah Palin emails trickles in.
The importance of visualisation within data journalism raises the question of what
visualisation exactly is and what risks it brings along. Lev Manovich defines
information visualisation as "a mapping between discrete data and a visual representation" (Manovich 2010, 2). He does, however, state that this definition
does not cover all aspects of information visualisation, such as the distinctions
between "static, dynamic (i.e. animated) and interactive visualization [sic]" (ibid.).
While these differences are very important, I would like to follow Manovich in his
argument that the core idea of visualisation has not changed when we switched
from pencils to computers (Manovich 2010, 5). So whether the visualisation is
static or interactive, the core idea still revolves around mapping some properties of
the data into a visual representation (ibid.).
With the use of present-day software it is possible to generate visualisations of
much larger data sets than previously possible. As stated above, this does not mean
that visualisations have changed at their core over the last three hundred years.
Manovich defines two key principles underlying commonplace information
visualisations: reduction and space. Reduction includes the use of graphical
primitives, such as points and lines, to reveal patterns and structures in the data.
The price being paid for this extreme schematization is the loss of 99% of what is
specific about each object to represent only 1%, in the hope of revealing patterns
across this 1% of objects' characteristics (Manovich 2010, 5-6). The use of spatial
variables, such as position, size and shape, is another core element typical of
information visualisation. These spatial variables have long been preferred over
other visual properties such as color, tone and transparency.
Edward R. Tufte's book Visual Explanations (1997) presents a case that exemplifies
how reduction and spatial preferences in visualisations can be problematic. The
case concerns the cholera epidemic in London in 1854 and shows how the choice of
different intervals to display the data gathered by Dr. John Snow gives very different
representations of this data. If Snow had chosen a different interval, or had
not been so aware of the data and so thorough in his logical thinking, he might
never have discovered the origin of the epidemic. This case also shows how popular
journalism's choice to aggregate or over-compress data can lead to misleading
graphical representations. In their article How Not To Lie With Visualisations,
Bernice Rogowitz and Lloyd Treinish demonstrate how different representations of
an MRI scan of a human head can influence the interpretation of the data. They
argue that "[i]n order to accurately represent the structure in the data, it is
important to understand the relationship between data structure and visual
representation" (Rogowitz & Treinish 1995, 4). They conclude by stating that
although non-experts can nowadays create meaningful representations of their data,
it is still not easy enough, because the visual effects are not well understood by the
user (Rogowitz & Treinish 1995, 14).
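The effect of the chosen interval can be made concrete with a small Python sketch. The daily counts below are invented for illustration and are not Snow's figures; the two helper functions are likewise hypothetical. The same series is summed into bins of one, three and six days and drawn as a crude text bar chart.

```python
# Hypothetical daily counts for 12 consecutive days (NOT Snow's figures).
daily = [1, 1, 2, 40, 60, 30, 10, 4, 2, 1, 1, 1]


def aggregate(counts, width):
    """Sum consecutive counts into bins of `width` days."""
    return [sum(counts[i:i + width]) for i in range(0, len(counts), width)]


def bar_chart(values, scale=10):
    """Render values as rows of '#' characters, one '#' per `scale` units."""
    return "\n".join(f"{v:4d} " + "#" * (v // scale) for v in values)


for width in (1, 3, 6):
    print(f"interval = {width} day(s)")
    print(bar_chart(aggregate(daily, width)))
```

With one-day bins the sudden onset on the fourth day is plainly visible. With six-day bins the series collapses into two bars, and the outbreak appears to have been at full strength from the very start, which is the kind of over-compression the paragraph above warns against.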
Lev Manovich, however, emphasises that new visualisation techniques and projects
developed since the middle of the 1990s seem to no longer strictly take data that is
not visual and map it into a visual domain (Manovich 2010, 11). According to
Manovich, the development of computers and the progress in their media capacities
have made it possible to visualise data without reduction: "While graphical
reduction will continue to be used, this no longer [sic] the only possible method"
(23). This new method of visualisation, or direct visualisation, can be exemplified
by the use of tag clouds. The tag cloud is an example of a reorganisation of data into
a new representation that preserves its original form: text remains text (12). A good
example of a tag cloud used in journalism is the word cloud by John Schwenkler, at
the time a graduate student in philosophy at the University of California, which was
published in The Boston Globe vii. The cloud revealed that the official weblog of
John McCain, the Republican candidate for the presidency, used the word Obama
more often than any other word, and even more often than Obama's own official blog did.
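What a tag cloud computes can be sketched in a few lines of Python. The sample text, the stopword list and the font-size mapping below are assumptions made for illustration; they do not reproduce Schwenkler's data or method. Words are counted, and each count is mapped onto a font size while the word itself remains text.

```python
import re
from collections import Counter

# Made-up sample text standing in for blog posts; not the real campaign blogs.
sample = (
    "Obama said taxes will rise. McCain disagrees with Obama. "
    "The Obama plan is costly, McCain argues."
)

# A deliberately tiny, illustrative stopword list.
STOPWORDS = {"the", "is", "with", "will", "a", "of", "and"}


def word_frequencies(text):
    """Count lower-cased words, skipping the illustrative stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def font_size(count, max_count, smallest=10, largest=40):
    """Map a count linearly onto a font size; the word itself stays text."""
    return smallest + (largest - smallest) * count // max_count


freq = word_frequencies(sample)
top = freq.most_common(1)[0][1]
for word, count in freq.most_common(2):
    print(word, count, font_size(count, top))
```

In this toy sample the most frequent word receives the largest size, which is all the cloud in the example needed to make its point.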
With the use of direct visualisation, patterns in the data can be highlighted without
having to reduce or spatially arrange the data with the use of abstract graphical
elements. However, in the case of information visualisation, direct visualisation is
still not as common as in scientific, medical and geovisualisation. During the
1990s and 2000s the speed and processing power of personal computers
progressively increased, but information visualisation still continued to depend on
static vector graphics. Only very recently have sophisticated tools appeared that
allow for interactive constructions of direct visualisation. Manovich concludes
that the ability to show artefacts in full detail is crucial to humanities, as it helps
the researcher to understand meaning and/or cause behind the pattern she may
observe, as well as discover additional patterns (23).
One can say that this ability is crucial to journalism as well. Visualisation is a key
element for revealing patterns in raw or structured data and making it
understandable for a large audience. For journalists and their employers it is, for
the sake of objectively informing their audience, crucial that these visualisations
display the actual facts. As Manovich has shown, however, information
visualisation is not the same as scientific visualisation, and it has a long history of
reducing data to graphical primitives and specific spatial preferences. Incorrect
visualisations, which give a distorted view of the actual data, could have large-scale
negative consequences. Direct visualisation, as introduced by Lev Manovich, seems
to offer a solution to this problem. It is now possible to visualise large quantities of data without reduction, and the software tools that make these direct visualisations
possible are rapidly spreading across the Internet. For instance, ManyEyes,
Tableau, Yahoo Pipes, the University of Amsterdam, Open Calais and, of course,
Google offer (free) tools for data visualisation, paving the way for objective data
journalism.
3.4 Storytelling with data
A large part of the EJC roundtable conference in Amsterdam focused on how to tell
stories with data. Surprisingly none of the speakers really questioned if data
necessarily needs to tell a story at all. Adrian Holovaty, a pioneer in data-driven
journalism with a background in both journalism and computer programming,
does question that, suggesting newspapers need to make an important shift and
stop the story-centric worldview (Holovaty 2006). Holovaty claims the daily
processes of journalists are, in practical terms, inefficient, wasting too much of the
powerful raw data at the root of the stories. Instead, news should be orientated
toward computers, in the hope that journalists and data will meet in the middle
(Kiss 2008). If so, structured data remains structured and no longer has to be
deconstructed for the purpose of writing a traditional news story. From his
experience as a journalist Holovaty knows that newspaper organisations
traditionally already collect lots of information, which is relentlessly structured. It
just takes somebody to start storing it in a structured format (ibid.).
Holovaty's argument is best understood through the use of examples. For instance,
Faces of the Fallen is a public and searchable database of all the U.S. service
members who died in Operation Iraqi Freedom and Operation Enduring
Freedom viii. Reporters at The Washington Post already were keeping a detailed
database of the deceased service members, but most of the time this data was sitting
around unused. In two weeks' time Holovaty and his co-workers built the data into
a powerful tool for the public; it became a catalyst for further reporting and was used by
activist groups to protest against the war (Kiss 2008). Holovaty also created a
public and searchable database named Everyblock which made it possible to find
crimes committed in the city of Chicago ix. The data comes from CLEARMap, the
crime mapping website of the Chicago Police Department and includes information
on where and when each crime occurred, thereby again using available but unused
data.
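Holovaty's point about keeping structured data structured can be illustrated with a minimal sketch. The example below assumes a hypothetical table of crime reports with invented field names (`category`, `block`, `occurred_on`) and placeholder rows. It is not the schema of CLEARMap or Everyblock, only an illustration of how records stored in a structured format become searchable without first being rewritten as a story.

```python
import sqlite3

# Hypothetical records: field names and values are placeholders for
# illustration, not data taken from CLEARMap or Everyblock.
REPORTS = [
    ("theft", "100 block of Example St", "2011-06-01"),
    ("burglary", "200 block of Sample Ave", "2011-06-02"),
    ("theft", "200 block of Sample Ave", "2011-06-03"),
]


def build_database(rows):
    """Store the reports in an in-memory SQLite table, one column per field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports (category TEXT, block TEXT, occurred_on TEXT)"
    )
    conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search(conn, category=None, block=None):
    """Return reports matching the given fields, ordered by date."""
    clauses, params = [], []
    if category is not None:
        clauses.append("category = ?")
        params.append(category)
    if block is not None:
        clauses.append("block = ?")
        params.append(block)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    query = (
        "SELECT category, block, occurred_on FROM reports"
        + where
        + " ORDER BY occurred_on"
    )
    return conn.execute(query, params).fetchall()


conn = build_database(REPORTS)
thefts = search(conn, category="theft")
on_sample_ave = search(conn, block="200 block of Sample Ave")
```

Because each field is stored separately, the same records can be queried by category, by location or by date, which is the kind of reuse of available but unused data that Holovaty argues for.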
Although Holovaty's manifesto for computer-orientated journalism has inspired
many, including the founder of the Pulitzer Prize-winning website Politifact x, which
compares political statements with actual facts (Waite 2007), there are examples
where news organisations use data more in the classical journalistic tradition.
Examples are the news stories based on the Afghanistan War Logs xi, which were
made available by independent organisation WikiLeaks to several news
organisations. Meanwhile the documents have all been structured and are available
through news organisation websites. The New York Times, however, primarily uses
this data to bring regular news stories. Reporters Cynthia O'Murchu and Carola
Hoyos of The Financial Times seem to have stayed somewhat closer to Holovaty's
view, as they produced several interactive graphics, including an interactive chart
on oil and gas chief executives and their salaries xii. In turn, the graphics serve as the
basis for traditional (follow-up) news stories. These examples show that there seem
to be different views among large news organisations when it comes to
implementing and using data. Holovaty-like data journalism is praised and
sometimes pursued, but also often questioned. Given that Holovaty-like
examples are quite scarce, it is fair to say that data mainly stands in service of the news
story.
3.5 Participatory aspects of data journalism
Chapter 2 elaborated on the potential change of traditional news media-user
relationships under the process of convergence, the blurring of boundaries set by
the conditions of digitization. Whether this potential is called citizen, grassroots or
participatory journalism, it all boils down to the emergence of bottom-up initiatives
as a counterweight to the large top-down news organisations. A comparison between
early writings on participatory journalism and its current status reveals that
participatory journalism should not be regarded as a replacement for top-down news
organisations but rather as a collaboration between these organisations and their
audiences. News organisations have learned and continue to learn to optimally
utilise new media affordances and to tap into the desire of readers to be part of the
news making process. Data journalism can be considered as an outcome of this
utilisation and as a testing ground for further collaboration between news
organisations and consumers. In the specific case of data journalism citizens are
not replacing journalists but they are adding to the chain of value creation as they
channel raw material, such as documents, videos or photos, and help journalists
tackle the problem of structuring the vast amounts of data.
Crowdsourcing, a term coined by Jeff Howe in an article for Wired, is something
with which news organisations are increasingly experimenting and can best be
described as tapping into the latent talent of the crowd (Howe 2006) or using
the crowds as an investigative ancillary force (Howe 2009, xxiv). For instance, in
April of 2009 The New York Times issued a press release in which it invited its
readers to comb through the full schedules of Timothy F. Geithner when he was
president of the Federal Reserve Bank of New York xiii. Also, in February of that year
The Huffington Post called upon its readers to help dig through the U.S. Senate
stimulus bill xiv. Other prime examples are, again, the British politicians' expenses
scandal and most recently the investigation of the Sarah Palin emails. In all of these
examples news organisations use the combined analytical strength of their
audience with the aim of generating stories out of large data sets. News organisations
use their audience for investigative work, to sift through piles of documentation.
The journalist's role in this process is to collate and analyse the findings, making
the journalist the central point of direction.
As Alfred Hermida points out, there are also examples of crowdsourcing without
central direction (Hermida 2010). One of these examples is the Kenyan open source
platform Ushahidi xv, which was founded in 2008 by a group of bloggers who
wanted to give a response to the wave of ethnic violence sweeping the country in
the wake of elections (Bunting 2011). Ushahidi's next project, Huduma (Swahili
for service), will use crowdsourcing in Kenya to monitor the effectiveness of
services such as health and education (ibid.). Hermida also refers to social network
Twitter allowing crowdsourcing to happen on a distributed, asynchronous
manner, with individuals acting independently yet collectively at the same time
(Hermida 2010). An example where the mass collaboration of total strangers on
the web (ibid.) worked was when multinational Trafigura legally banned The
Guardian from reporting on the alleged dumping of toxic waste off the shores of
Ivory Coast. Trafigura became a trending topic on Twitter as the topic was widely
discussed and in less than 24 hours Trafigura backed down (ibid.).
The latter example shows how crowdsourcing can be beneficial for journalism in
other ways than for investigative work but does not necessarily apply to data
journalism specifically. When looking at the given data-driven examples, audiences
are primarily used to contribute to information structuring. The Guardian does
however also outsource visualisation tasks. In a specially made group on photo
community Flickr, users can post graphical translations of large data sets, which
can be downloaded from The Guardian's Datastore xvi. The question remains,
though, whether crowdsourcing data and data visualisations means data journalism is
intrinsically participatory. News organisations are increasingly adding so-called
data desks to the work floor as an extension of the editorial office. Eric Ulken, a
former reporter of The LA Times, published an article in which he describes the
process of assembling the data desk. According to Ulken the data desk can be seen
as a cross-functional team of journalists responsible for collecting, analysing and
presenting data online and in print (Ulken 2008). Furthermore, the report of the
EJC roundtable conference shows data journalism can profit greatly from applying
the know-how of graphic designers and IT specialists (Data-driven Journalism
2010). Adding multiple disciplines to the data desk may imply that participation of
the public in the process of creating news stories is just as likely to stagnate.
4 Conclusion
Whether data journalism is intrinsically participatory is one of many
questions still open for debate concerning a new form of journalism, which is
slowly taking form under continuously changing conditions set in a world that is
increasingly relying on technology and digital information. This paper has tried to
give context to this new phenomenon and has explored its core properties using a
variety of examples. At this point, however, it is too difficult to tell what the
implications of data journalism will be. The assumption that a website such as
Politifact, which checks whether the statements of U.S. politicians are based on facts, will
increase the public's trust in, say, journalism, politics or democracy, is yet to be
proven. For instance, researchers at the University of Michigan found in a series of studies that
misinformed people who were exposed to corrected facts in news stories rarely
changed their minds. Political partisans in particular became even more strongly set
in their beliefs. Facts can make misinformation even stronger (Keohane 2010).
Instead of focussing on possible implications, this paper has tried to place data
journalism within a broader context and has tried to flesh out its core properties in
order to further comprehend this new phenomenon. The theoretical framework
tells us that data journalism can be placed against the background of journalism
wherein traditional borders have continued to blur over the last decade. Set by the
conditions of digitization, readers have also become users that are able to add value
in the process of news making. This is, however, a slow process and, contrary to
the ideas posed at the beginning of this century, participatory journalism has not
yet been able to crumble the power of large news organisations. This does not mean
the voice of the public is not being heard. The dispersion of increasingly
sophisticated and free-to-use software and data sets enables people to contribute to
journalism in a new way. Top-down journalism is in some aspects meeting bottom-up,
grassroots journalism, but it remains a work in progress that often offers
more questions than answers.
Against the backdrop of convergence culture I have argued data journalism can be
regarded as the outcome of the mutual shaping of technological and social
developments. Besides the vertical meeting of top-down and bottom-up, convergence is also
taking place on a horizontal axis, between the technological and social contexts of
journalism. Technological aspects are becoming inseparably intertwined with social
aspects, as reporters are coming to rely on databases as fertile soil for the creation
of news stories. In terms of Actor-Network-Theory these human and non-human
actants combine to form hybrid actors. In general the technologies, the data and
the software tools, are responsible for larger parts of the action chains, rendering
actions intrinsically hybrid. Data journalism can therefore be regarded as a hybrid
practice.
Exploration of the core properties of data journalism underlines its relation with
data and shows the path journalists have to take in order to distil a story out of
data. On the one hand, structuring and visualising data can be crucial links in
getting from raw data to story. Sophisticated software tools make it easier than ever
to structure large quantities of data and to visualise data without reducing crucial
data. On the other hand, these links are not part of a fixed chain or the only
road to deriving news stories from data. Moreover, journalist and computer
specialist Adrian Holovaty argues that nowadays making sense of complicated data
for an audience alone is just as important as telling a story.
Whatever path journalists will walk, whether it is through visualisations, telling
stories, crowdsourcing or building databases, it almost goes without saying that the
future of journalism lies in analysing big data. This is a standpoint shared by Sir
Tim Berners-Lee, inventor of the World Wide Web. According to Berners-Lee the
responsibility lies with journalists to hold governments, or anyone else,
accountable, as information increasingly is made available on the Internet (Arthur
2010). How long it will take before the interdisciplinary data desk, with computer
specialists, graphic designers and journalists working together, becomes a full-
grown and respected part of the editorial office remains to be seen. Whatever the
implications will be, as databases keep on growing, culture and technology keep
on converging and audiences keep on participating, there will be a role for data
journalism out there, somewhere.
References
Arthur, Charles. Analysing Data is the Future for Journalists, Says Tim Berners-
Lee. The Guardian 22 Nov. 2010. 3 Jul. 2011
.
Boczkowski, Pablo J. The Processes of Adopting Multimedia and Interactivity in
Three Online Newsrooms. Journal of Communication 54.2 (2004): 197-213.
Bunting, Madeleine. Crowdsourcing Put to Good Use in Africa. The Guardian 19
May 2011. 3 Jun. 2011 < http://www.guardian.co.uk/global-
development/poverty-matters/2011/may/19/crowdsourcing-good-use-in-
africa >.
Castells, Manuel. The Information Age: Economy, Society and Culture. Malden
MA: Blackwell, 3 volumes, first published in 1996.
Cukier, Kenneth N. Data, Data Everywhere. The Economist Special Report 27
Feb. 2010: 3-18.
Data-driven Journalism: What is there to Learn? Amsterdam: European
Journalism Centre, 2010.
Deuze, Mark. The Professional Identity of Journalists in the Context of
Convergence Culture. Observatorio Journal 7 (2008): 103-117.
Domingo, David. Interactivity in the Daily Routines of Online Newsrooms:
Dealing with an Uncomfortable Myth. Journal of Computer-Mediated
Communication 13.3 (2008): 680-704.
Domingo, David et al. Participatory Journalism Practices in the Media and
Beyond: An International Comparative Study of Initiatives in Online
Newspapers. Journalism Practice 2.3 (2008): 326-342.
Gillmor, Dan. We the Media: Grassroots Journalism by the People, for the People.
Sebastopol, CA: O'Reilly Media, 2004.
Gordon, Rich. Data as Journalism, Journalism as Data. Readership Institute 14
Nov. 2007. 3 Jul. 2011 < http://getsmart.readership.org/2007/11/data-as-
journalism-journalism-as-data.html >.
Heinonen, Ari. The Journalist's Relationship with Users: New Dimensions to
Conventional Roles. Participatory Journalism: Guarding Open Gates at Online
Newspapers. Eds. Jane B. Singer et al. Malden, MA: Wiley-Blackwell,
2011.
Hermida, Alfred. How the MSM is Tackling Participatory Journalism. Reportr.net
24 May 2008. 3 Jul. 2011 < http://www.reportr.net/2008/05/24/how-the-
msm-is-tackling-participatory-journalism/ >.
Hermida, Alfred. The Impact of Crowdsourcing on Journalism. Reportr.net 15
Oct. 2010. 3 Jun. 2011 < http://www.reportr.net/2010/10/15/impact-
crowdsourcing-journalism/ >.
Holovaty, Adrian. A Fundamental Way Newspaper Sites Need to Change.
Holovaty.com 6 Sep. 2006. 3 Jul. 2011
.
Howe, Jeff. The Rise of Crowdsourcing. Wired 14 Jun. 2006. 3 Jul. 2011
.
Howe, Jeff. Crowdsourcing: Why the Power of the Crowd is Driving the Future of
Business. New York: Three Rivers Press, 2008.
Jenkins, Henry. The Cultural Logic of Media Convergence. International Journal
of Cultural Studies 7.1 (2004): 33-43.
Jenkins, Henry. Convergence Culture: Where Old and New Media Collide. New
York: New York UP, 2006.
Kahney, Leander. Citizen Reporters Make the News. Wired 17 May 2003. 3 Jul.
2011 < http://www.wired.com/culture/lifestyle/news/2003/05/58856 >.
Keohane, Joe. How Facts Backfire. Boston.com 11 Jul. 2010. 3 Jul. 2011
.
Kiss, Jemima. Future of Journalism: Adrian Holovaty's Vision for Data-friendly
Journalists. The Guardian 6 Jun. 2008. 3 Jul. 2011
.
Lasica, J. D. Blogs and Journalism Need Each Other. Nieman Reports 57 (2003):
70-74.
Latour, Bruno. Pandora's Hope: Essays on the Reality of Science Studies.
Cambridge, MA: Harvard UP, 1999.
Loosen, Wiebke. The Second-Level Digital Divide of the Web and Its Impact on
Journalism. First Monday 7.8 5 Aug. 2002. 3 Jul. 2011
.
Lowrey, Wilson and William Anderson. The Journalist Behind the Curtain:
Participatory Functions on the Internet and their Impact on Perceptions of the
Work of Journalism. Journal of Computer-Mediated Communication 10.3
(2005).
Manika, James. Hal Varian on How the Web Challenges Managers. McKinsey
Quarterly January 2009. 3 Jul. 2011
.
Manovich, Lev. What is Visualization? Manovich.net 25 Oct. 2010. 3 Jul. 2011
.
Negroponte, Nicholas P. Being Digital. New York: Vintage Books, 1995.
Paulussen, Steve and Pieter Ugille. User Generated Content in the Newsroom:
Professional and Organisational Constraints on Participatory Journalism.
Westminster Papers in Communication and Culture 5.2 (2008): 24-41.
Rieder, Bernhard and Mirko Tobias Schäfer. Beyond Engineering: Software
Design as Bridge over the Culture/Technology Dichotomy. Philosophy and
Design. Eds. Pieter E. Vermaas et al. Springer, 2008.
Rogowitz, Bernice E. and Lloyd A. Treinish. How Not to Lie with Visualization.
IBM Research 1995. 3 Jul. 2011
.
Townsend, Judith. #DataJourn Part 2: Q&A with Data Juggler Tony Hirst.
Journalism.co.uk 8 Apr. 2009. 3 Jul. 2011
.
Tufte, Edward R. Visual Explanations: Images and Quantities, Evidence and
Narrative. Cheshire, Connecticut: Graphics Press, 1997.
Ulken, Eric. Building the Data Desk: Lessons from the L.A. Times. The Online
Journalism Review 21 Nov. 2008. 3 Jul. 2011.
.
Waite, Matt. Announcing Politifact. Matt Waite 22 Aug. 2007. 3 Jul. 2011
.
Yudin, Ekaterina. Bernhard Rieder: 81,498 Words: the Book as Data Object. The
Unbound Book 21 May. 2011. 3 Jul. 2011. < http://e-
boekenstad.nl/unbound/index.php/bernhard-rieder-81498-words-the-book-as-
data-object/ >.
Examples of data journalism used in this paper
i The Guardian - Crowdsourcing the Sarah Palin emails
.
ii The Guardian - British MP expenses
.
iii The New York Times - Obama's budget and how it is spent
.
iv The Guardian - Emergency budget proposal 2010
.
v The Guardian - General election opinion polls 2010
.
vi The LA Times - The Homicide Report
.
vii The Boston Globe - Portrait of the candidate as a pile of words
.
viii The Washington Post - Faces of the fallen
.
ix Everyblock - Make your block a better place
.
x Politifact - Sorting out the truth in politics
.
xi The New York Times - The Afghan war logs
.
xii The Financial Times - Oil and gas chief executives
.
xiii The New York Times - The schedules of Timothy F. Geithner
.
xiv The Huffington Post - The Senate stimulus bill
.
xv Ushahidi - information collection, visualization and interactive mapping .
xvi The Guardian - Data store on Flickr
.