1st Workshop on Human-Centered Big Data Research, April 1-3, 2014, Raleigh, NC, USA
Sensemaking in Big Data Environments
Chris Argenta¹, Applied Research Associates
Jordan Benson, SAS Institute
Nathan Bos, The Johns Hopkins University Applied Physics Laboratory
Susannah B. F. Paletz, Center for Advanced Study of Language, University of Maryland
William Pike, Pacific Northwest National Laboratory
Aaron Wilson, Palo Alto Research Center
ABSTRACT
We report on the sensemaking breakout group at the Human-Centered
Big Data Research (HCBDR-2014) workshop. The
authors are a multi-disciplinary team of invited researchers and
stakeholders who participated in this breakout session. This report
includes an overview of our discussions on the many research
challenges associated with sensemaking within a big data
environment. Specifically, we focused on key topics that fit
squarely in the intersection of sensemaking and big data
research, because separate communities already exist for decision
making and for big data technologies. As part of this effort, our
group developed and proposed a framework around which this
community can target and structure future research. This
framework is intended to allow the community to systematically
identify areas where innovative research might make large
contributions to sensemaking in a big data environment.
Categories and Subject Descriptors
H.3.3 Information Search and Retrieval
General Terms
Experimentation, Human Factors, Measurement.
Keywords
Big Data, sensemaking, workshop.
1. INTRODUCTION
This report outlines the proceedings of the sensemaking breakout
group over three days at the Human-Centered Big Data Research
2014 workshop. Our team was composed of invited speakers and
stakeholders with expertise relevant to the intersection of human
cognitive and data sciences. Perhaps the most notable
characteristic of our team was the diversity of its members and
the many domains of expertise represented. As such, our definition of
“sensemaking” was inclusive of the term’s usage across
multiple subject areas. This was both eye-opening and consistent
with the workshop’s theme of diversity. One clear takeaway for
our team was that this is a multidisciplinary challenge.
We started by discussing the existing definitions of sensemaking,
including those of Klein [1], Pirolli/Card [2], and Weick [3]. We
broadened these definitions to include elements surrounding and
supporting human cognition because the various challenges
inherent in a “big data environment” were, by definition, larger
than the space between the decision maker’s ears. To explore how
humans might make sense of big data, the team felt it necessary to
integrate the characteristics of the tools used, types of analysis
problems, aspects of data, and organizational communication.
Simply put, “sensemaking of big data” might be better understood
as “sensemaking within a big data environment” and that
environment was not owned by any single discipline.
From that point, we attempted to understand the structure of this
environment. Our process was:
1. Develop research questions – what are the known challenges we need to address?
2. Establish a hypothesis – can we measure the influence of Big Data on Sensemaking?
3. Create a common framework for the community – define how to navigate this space
4. Begin mapping our research questions to this framework – put our research on the same page
On the last day of the workshop, we presented our progress and
findings to the entire workshop and participated in a panel
discussion. It is our hope that a community of interest will spring
from this workshop to address these research challenges in a
systematic and collaborative manner.
2. RESEARCH QUESTIONS
Our first team exercise was to generate a list of potential research
topics. There were many, and they were as widely varied as our
team, but we have attempted to distill and categorize them. We
also tried to carefully focus on the intersection of human
sensemaking and big data, to avoid duplication of other research
communities’ efforts on these topics.
1 Corresponding author: [email protected]
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact
the Owner/Author.
Copyright is held by the owner/author(s). HCBDR '14, April 01 - 03 2014, Raleigh, NC, USA
ACM 978-1-4503-2938-5/14/04…$15.00.
http://dx.doi.org/10.1145/2609876.2609889
2.1 Analysis
Our first category was based on the tasks that analysts perform.
The research questions can be summarized as:
• What are the key differences between working with big and regular data?
• What’s different when we vary the data characteristics (the V’s)?
• How can analysts differentiate between noise and data?
• What triggers an expert to update his/her mental models?
• What about trust in data/analytics/process?
• How do we rank many plausible futures?
2.2 Veracity
While all the V’s (volume, velocity, variety, veracity, etc.) were
discussed, veracity was the one that resonated most strongly with
the team as having unique research challenges for sensemaking in
a big data environment. Research questions included:
• What is the value of data?
• How do you make sense of data from outside sources?
o Ontology differences, uncertainty/assumptions, trouble combining datasets that use different nuances or types of data (e.g., crime statistics across precincts, states, regions).
• How do we handle missing data?
o Absence of evidence ≠ evidence of absence
o How might tools help us understand veracity?
o Uncertainty, risk, and unknown-unknowns
o Is there a “Veracity taxonomy” or can we make one?
2.3 Tools
The question of which tools to use was very important in
discussions with stakeholders, as they act as the lens through
which the analyst can sense and also interact with the big data
environment. Feedback indicated that while tools could handle
scale and speed, these did not always translate to understanding.
Key research questions were:
• How do you deal with big data in small ways?
o Can you distribute a sensemaking task?
• What tools can we use to improve sensemaking?
• How do visual metaphors make sense?
o Can we assess communication of sense?
• What is the right representation of sense?
2.4 Human Aspects
While all of our questions included a human aspect, there were
some unique questions where additional research in this
environment is necessary:
• How do we classify populations in this environment? Populations in this context include end users of the data and those who generate the data.
• What about IC (intelligence community) vs external (production vs proxy) differences?
• What makes it difficult for individuals and groups to do sensemaking?
• How do we share experiences across domains and disciplines?
• How do you develop teams for sensemaking? How do you compose, train, and maintain teams so that they quickly develop accurate understandings of a situation and data?
• How do we deal with shift handovers, that is, how can one individual pass along a sensemaking story to another?
3. HYPOTHESIS
The team decided it would be helpful to start with a unifying
hypothesis. We agreed on:
“We can measure the influence of Big Data on
Sensemaking.”
This assumes that sensemaking is measurable, which raises
questions about what appropriate metrics for sensemaking should
be. We believed these needed to quantify sensemaking in such a
way that characteristics of the data, task, and tools could be
independent variables. The team discussed a number of ideas and
eventually agreed that measurements should assess both accuracy
and time required to perform a relevant task (e.g., a benchmark).
Our next challenge was to outline a framework for testing this
hypothesis within our community.
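The measurement idea in this section can be illustrated with a minimal sketch. It assumes a hypothetical `Trial` record whose fields stand in for the independent variables (human, task, tool, data volume) and the two outcome measures (correctness and time). The field names and sample values are invented for illustration and are not workshop results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One participant attempt at a benchmark task (all fields hypothetical)."""
    human: str      # e.g., expertise level of the participant
    task: str       # e.g., "find X"
    tool: str       # e.g., "spreadsheet"
    volume: int     # number of records presented
    correct: bool   # outcome measure 1: was the answer right?
    seconds: float  # outcome measure 2: time taken


def summarize(trials):
    """Return (accuracy, mean time of correct answers) for a list of trials."""
    if not trials:
        raise ValueError("need at least one trial")
    accuracy = sum(t.correct for t in trials) / len(trials)
    correct_times = [t.seconds for t in trials if t.correct]
    mean_time = mean(correct_times) if correct_times else None
    return accuracy, mean_time


# Made-up trials for a single experimental condition.
trials = [
    Trial("novice", "find X", "spreadsheet", 1000, True, 40.0),
    Trial("novice", "find X", "spreadsheet", 1000, False, 90.0),
    Trial("novice", "find X", "spreadsheet", 1000, True, 60.0),
    Trial("novice", "find X", "spreadsheet", 1000, True, 50.0),
]
accuracy, mean_time = summarize(trials)
```

Averaging time over correct answers only is one possible design choice; a study could equally report time for all attempts.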
4. COMMUNITY FRAMEWORK
Our attempt to build a unified experimental framework began
with a table crossing big data characteristics (e.g., volume,
velocity, veracity, and variety) with sensemaking activities (e.g.,
hypothesis generation, information foraging, classifying, discovery, etc.).
Unfortunately, the whiteboard quickly filled and this matrix
became unmanageable. However, in the process we identified four
key dimensions that captured our notion of a big data
environment. We also realized that much of the quantitative
assessment of sensemaking could be rolled into the accuracy and
timing of performance on a given task. We called the resulting model
“H+T+T+V=S” (Figure 1).
Figure 1. The four dimensions of the H+T+T+V=S model that
underlies a framework for assessing sensemaking within a big
data environment.
The four dimensions of our framework are:
• Human: Demographics, personality, expertise, teaming, etc.
• Tasks: Find X, relationship between X and Y, similarity to X, etc.
• Tools: Type of Analytics, Visualization, or other tools
• Big Data Characteristics (the V’s): Volume, Velocity, Veracity, and Variety
We propose this framework as a community tool for focusing and
communicating research. We believe that research is needed to
explore this space systematically to evaluate sensemaking in a big
data environment. We foresee identifying “hot-spots” where a
change in a single variable has a dramatic effect on sensemaking
accuracy and/or timing. These hot-spots are of interest because they
represent areas where innovative research might achieve big gains
in sensemaking in a big data environment.
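One way to picture this systematic exploration is as a grid of experimental conditions. The following is a minimal sketch only: the levels listed for each dimension, the accuracy scores, and the threshold are all hypothetical, and `hot_spots` is an illustrative helper rather than part of the framework itself. It enumerates every combination of levels and flags pairs of conditions that differ in exactly one dimension yet show a large difference in score.

```python
from itertools import product

# Hypothetical levels for each of the four dimensions (H, T, T, V).
DIMENSIONS = {
    "human": ["novice", "expert"],
    "task": ["find X", "relate X and Y"],
    "tool": ["spreadsheet", "visual analytics"],
    "volume": ["low", "high"],
}


def design_cells(dimensions):
    """Enumerate every combination of levels as a tuple, in dimension order."""
    return list(product(*dimensions.values()))


def hot_spots(scores, dimensions, threshold):
    """Find pairs of cells that differ only by one step along one dimension
    and whose scores differ by at least `threshold`."""
    names = list(dimensions)
    found = []
    for cell, score in scores.items():
        for i, name in enumerate(names):
            levels = dimensions[name]
            pos = levels.index(cell[i])
            if pos + 1 >= len(levels):
                continue  # no higher level to compare against
            neighbor = cell[:i] + (levels[pos + 1],) + cell[i + 1:]
            if neighbor in scores and abs(scores[neighbor] - score) >= threshold:
                found.append((name, cell, neighbor))
    return found


cells = design_cells(DIMENSIONS)

# Made-up accuracy scores: uniform, except for one condition that performs poorly.
scores = {cell: 0.9 for cell in cells}
scores[("novice", "find X", "spreadsheet", "high")] = 0.5

spots = hot_spots(scores, DIMENSIONS, threshold=0.3)
for name, low_cell, high_cell in spots:
    print(f"hot-spot along {name}: {low_cell} vs {high_cell}")
```

Even with two levels per dimension the grid has 16 cells, which suggests why the original whiteboard matrix became unmanageable.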
5. MAPPING RESEARCH QUESTIONS TO THE FRAMEWORK
An essential element of the community framework is that it
enables researchers to communicate how their work fits into this
community space. For example, in answering the question “how
does volume of data influence sensemaking?” we might vary the
volume of data (the independent variable, IV) for a selected class of
task (e.g., “find the X”) and tool set (e.g., Excel), with a controlled
or moderated group of subjects. By measuring how accurate solutions were and/or how
long it took participants to answer correctly, we can understand
how the sensemaking performance curve changes and at what
points we observe issues. These results could then be used to
target research and compare findings.
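The volume example above can be sketched as a small analysis routine. This is an illustration under stated assumptions: each observation is a hypothetical `(volume, correct, seconds)` tuple, the sample values are invented, and the `max_drop` threshold for deciding where issues appear is an arbitrary choice.

```python
from collections import defaultdict


def performance_curve(observations):
    """Group observations by data volume and return a list of
    (volume, accuracy, mean_seconds) tuples sorted by volume."""
    by_volume = defaultdict(list)
    for volume, correct, seconds in observations:
        by_volume[volume].append((correct, seconds))
    curve = []
    for volume in sorted(by_volume):
        rows = by_volume[volume]
        accuracy = sum(c for c, _ in rows) / len(rows)
        mean_seconds = sum(s for _, s in rows) / len(rows)
        curve.append((volume, accuracy, mean_seconds))
    return curve


def first_drop(curve, max_drop):
    """Return the first volume at which accuracy falls by more than
    `max_drop` relative to the previous volume level, or None."""
    for (_, prev_acc, _), (volume, acc, _) in zip(curve, curve[1:]):
        if prev_acc - acc > max_drop:
            return volume
    return None


# Made-up observations: (records shown, answered correctly, seconds taken).
observations = [
    (100, True, 20.0), (100, True, 30.0),
    (1000, True, 40.0), (1000, True, 60.0),
    (10000, False, 120.0), (10000, True, 100.0),
]
curve = performance_curve(observations)
breakpoint_volume = first_drop(curve, max_drop=0.25)
```

Holding task, tool, and subject group fixed while varying only volume is what lets the resulting curve be attributed to the data characteristic rather than to the other dimensions.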
6. ACKNOWLEDGEMENTS
We would like to thank the Laboratory for Analytic Sciences for
hosting this workshop and all of the stakeholders representing the
intelligence community who brought insight and real-world
applications to this discussion.
7. REFERENCES
[1] Klein, G., Phillips, J. K. and Peluso, D. A. (2007). A
data/frame theory of sensemaking. In R. Hoffman, ed.,
Expertise Out of Context: Proceedings of the Sixth
International Conference on Naturalistic Decision Making.
Mahwah, NJ: Lawrence Erlbaum Associates.
[2] Pirolli, P., & Card, S. (2005). The sensemaking process and
leverage points for analyst technology as identified through
cognitive task analysis. IA'05.
[3] Weick, K. E. (1995). Sensemaking in Organizations. Sage
Publications, Thousand Oaks, CA.