[acm press the 2014 workshop - raleigh, nc, usa (2014.04.01-2014.04.03)] proceedings of the 2014...

1st Workshop on Human-Centered Big Data Research, April 1-3, 2014, Raleigh, NC, USA


Sensemaking in Big Data Environments

Chris Argenta1 Applied Research Associates

Jordan Benson SAS Institute

Nathan Bos The Johns Hopkins University Applied Physics Laboratory

Susannah B. F. Paletz Center for Advanced Study of Language, University of Maryland

William Pike Pacific Northwest National Laboratory

Aaron Wilson Palo Alto Research Center

ABSTRACT

We report on the sensemaking breakout group at the Human

Centered Big Data Research (HCBDR-2014) workshop. The

authors are a multi-disciplinary team of invited researchers and

stakeholders who participated in this breakout session. This report

includes an overview of our discussions on the many research

challenges associated with sensemaking within a big data

environment. Specifically, we focused on key topics that fit

squarely in the intersection of the sensemaking and big data

research, as other communities already exist for decision making

and big data technologies independently. As part of this effort, our

group developed and proposed a framework around which this

community can target and structure future research. This

framework is intended to allow the community to systematically

identify areas where innovative research might make large

contributions to sensemaking in a big data environment.

Categories and Subject Descriptors

H.3.3 Information Search and Retrieval

General Terms

Experimentation, Human Factors, Measurement.

Keywords

Big Data, sensemaking, workshop.

1. INTRODUCTION

This report outlines the proceedings of the sensemaking breakout

group over three days at the Human-Centered Big Data Research

2014 workshop. Our team was composed of invited speakers and

stakeholders with expertise relevant to the intersection of human

cognitive and data sciences. Perhaps the most notable

characteristic of our team was the diversity of its members and

the many domains of expertise represented. As such, our definition of

“sensemaking” was inclusive of the term’s usage across

multiple subject areas. This was both eye-opening and consistent

with the workshop’s theme of diversity. One clear takeaway for

our team was that this was a multidisciplinary challenge.

We started by discussing the existing definitions of sensemaking,

including those of Klein [1], Pirolli/Card [2], and Weick [3]. We

broadened these definitions to include elements surrounding and

supporting human cognition because the various challenges

inherent in a “big data environment” were, by definition, larger

than the space between the decision maker’s ears. To explore how

humans might make sense of big data, the team felt it necessary to

integrate the characteristics of the tools used, types of analysis

problems, aspects of data, and organizational communication.

Simply put, “sensemaking of big data” might be better understood

as “sensemaking within a big data environment” and that

environment was not owned by any single discipline.

From that point, we attempted to understand the structure of this

environment. Our process was:

1. Develop research questions – what are the known

challenges we need to address?

2. Establish a hypothesis – can we measure the influence of Big

Data on Sensemaking?

3. Create a common framework for the community – define

how to navigate this space

4. Begin mapping our research questions to this framework –

put our research on the same page

On the last day of the workshop, we presented our progress and

findings to the entire workshop and participated in a panel

discussion. It is our hope that a community of interest will spring

from this workshop to address these research challenges in a

systematic and collaborative manner.

2. RESEARCH QUESTIONS

Our first team exercise was to generate a list of potential research

topics. There were many, and they were as widely varied as our

team, but we have attempted to distill and categorize them. We

also tried to carefully focus on the intersection of human

sensemaking and big data, to avoid duplication of other research

communities’ efforts on these topics.

1 Corresponding author: [email protected]

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that copies

bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact

the Owner/Author.

Copyright is held by the owner/author(s). HCBDR '14, April 01 - 03 2014, Raleigh, NC, USA

ACM 978-1-4503-2938-5/14/04…$15.00.

http://dx.doi.org/10.1145/2609876.2609889


2.1 Analysis

Our first category was based on the tasks that analysts perform.

The research questions can be summarized as:

What are the key differences between working with big and

regular data?

What’s different when we vary the data characteristics (the

V’s)?

How can analysts differentiate between noise and data?

What triggers an expert to update his/her mental models?

What about trust in data/analytics/process?

How do we rank many plausible futures?

2.2 Veracity

While all the V’s (volume, velocity, variety, veracity, etc.) were

discussed, veracity was the one that resonated most strongly with

the team as having unique research challenges for sensemaking in

a big data environment. Research questions included:

What is the value of data?

How do you make sense of data from outside sources?

o Ontology differences, uncertainty/assumptions, trouble

combining datasets that use different nuances or types of

data (e.g., crime statistics across precincts, states, regions).

How do we handle missing data?

o Absence of evidence ≠ evidence of absence

o How might tools help us understand veracity?

o Uncertainty, risk, and unknown-unknowns

o Is there a “Veracity taxonomy” or can we make one?

2.3 Tools

The question of which tools to use was very important in

discussions with stakeholders, as tools act as the lens through

which the analyst can sense and also interact with the big data

environment. Feedback indicated that while tools could handle

scale and speed, these did not always translate to understanding.

Key research questions were:

How do you deal with big data in small ways?

o Can you distribute a sensemaking task?

What tools can we use to improve sensemaking?

How do visual metaphors make sense?

o Can we assess communication of sense?

What is the right representation of sense?

2.4 Human Aspects

While all of our questions included a human aspect, there were

some unique questions where additional research in this

environment is necessary:

How do we classify populations in this environment?

Populations in this context include end users of the data and

those who generate the data.

What about intelligence community (IC) vs external (production vs proxy)

differences?

What makes it difficult for individuals and groups to do

sensemaking?

How do we share experiences across domains and

disciplines?

How do you develop teams for sensemaking? How do you

compose, train, and maintain teams so that they can

quickly develop an accurate understanding of a situation and

its data?

How do we deal with shift handovers? That is, how can

one individual pass along a sensemaking story to another?

3. HYPOTHESIS

The team decided it would be helpful to start with a unifying

hypothesis. We agreed on:

“We can measure the influence of Big Data on

Sensemaking.”

This assumes that sensemaking is measurable, which raises

questions about what appropriate metrics for sensemaking should

be. We believed these needed to quantify sensemaking in such a

way that characteristics of the data, task, and tools could be

independent variables. The team discussed a number of ideas and

eventually agreed that measurements should assess both accuracy

and time required to perform a relevant task (e.g., a benchmark).

Our next challenge was to outline a framework for testing this

hypothesis within our community.
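As an illustration of the measurement idea above, the following minimal Python sketch records one benchmark trial and summarizes accuracy and mean completion time per condition. The `Trial` fields, the `summarize` helper, and the sample values are hypothetical and made up for illustration; they are not data or tooling from the workshop.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One participant attempt at a benchmark task (hypothetical schema)."""
    human: str      # e.g., expertise level of the participant
    task: str       # e.g., "find X"
    tool: str       # e.g., "spreadsheet"
    volume: int     # number of records shown (one of the V's)
    correct: bool   # was the answer accurate?
    seconds: float  # time required to answer


def summarize(trials):
    """Group trials by condition and return {condition: (accuracy, mean_seconds)}."""
    groups = {}
    for t in trials:
        groups.setdefault((t.human, t.task, t.tool, t.volume), []).append(t)
    return {
        cond: (
            mean(1.0 if t.correct else 0.0 for t in ts),
            mean(t.seconds for t in ts),
        )
        for cond, ts in groups.items()
    }


# Made-up trials: same human, task, and tool; only the volume differs.
trials = [
    Trial("expert", "find X", "spreadsheet", 1000, True, 40.0),
    Trial("expert", "find X", "spreadsheet", 1000, True, 60.0),
    Trial("expert", "find X", "spreadsheet", 100000, True, 120.0),
    Trial("expert", "find X", "spreadsheet", 100000, False, 180.0),
]
summary = summarize(trials)
```

Keeping the data, task, and tool characteristics as explicit fields is what lets any one of them be treated as the independent variable in a later analysis.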

4. COMMUNITY FRAMEWORK

Our attempt to build a unified experimental framework began

with a table crossing big data characteristics (e.g., volume,

velocity, veracity, and variety) with sensemaking activities (e.g., hypothesis

generation, information foraging, classifying, and discovery).

Unfortunately, the whiteboard quickly filled and this matrix

became unmanageable. However, in the process we identified four

key dimensions that captured our notion of a big data

environment. We realized that much of the quantitative

assessment of sensemaking could be rolled into the accuracy and

timing of performance on a given task. We called the resulting model

“H+T+T+V=S” (Figure 1).

Figure 1. The four dimensions of the H+T+T+V=S model that

underlies a framework for assessing sensemaking within a big

data environment.


The four dimensions of our framework are:

Human: Demographics, personality, expertise, teaming, etc.

Tasks: Find X, relationship between X and Y, similarity to

X, etc.

Tools: Type of Analytics, Visualization, or other tools

Big Data Characteristics (the V’s): Volume, Velocity,

Veracity, and Variety

We propose this framework as a community tool for focusing and

communicating research. We believe that research is needed to

explore this space systematically to evaluate sensemaking in a big

data environment. We foresee identifying “hot-spots” where a

variable change has a dramatic effect on sensemaking accuracy

and/or timing. These hot-spots matter because they represent areas where

innovative research might achieve big gains in the form of

improving sensemaking in a big data environment.
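One way to picture the framework as a space to explore is to enumerate combinations of the four dimensions and then look for large changes along a single dimension. The sketch below is a hedged illustration only: the level names, the accuracy values, the `min_change` threshold, and the `conditions` and `hot_spots` helpers are all assumptions introduced here, not part of the framework as proposed.

```python
from itertools import product

# Hypothetical levels for each dimension of H+T+T+V=S.
DIMENSIONS = {
    "human": ["novice", "expert"],
    "task": ["find X", "relate X and Y"],
    "tool": ["spreadsheet", "visual analytics"],
    "volume": [10**3, 10**5, 10**7],
}


def conditions(dimensions):
    """Return every combination of levels as a list of dicts."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


def hot_spots(levels, accuracy, min_change):
    """Return adjacent level pairs whose accuracy differs by at least min_change."""
    return [
        (a, b)
        for a, b in zip(levels, levels[1:])
        if abs(accuracy[b] - accuracy[a]) >= min_change
    ]


cells = conditions(DIMENSIONS)  # 2 * 2 * 2 * 3 experimental conditions

# Made-up accuracies for one fixed human/task/tool, varying only volume.
accuracy_by_volume = {10**3: 0.95, 10**5: 0.90, 10**7: 0.55}
spots = hot_spots(DIMENSIONS["volume"], accuracy_by_volume, min_change=0.2)
```

In this toy example the only flagged step is the jump from 10^5 to 10^7 records, which is the kind of region the text calls a hot-spot. A real study would need a principled threshold and would also compare timing.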

5. MAPPING RESEARCH QUESTIONS TO THE FRAMEWORK

An essential element of the community framework is that it

enables researchers to communicate how their work fits into this

community space. For example, in answering the question “how

does volume of data influence sensemaking?” we might vary the

volume of data as the independent variable (IV) while holding fixed a class of task (e.g., “find the X”),

a tool set (e.g., Excel), and a controlled or moderated group of

subjects. By measuring how accurate solutions were and/or how

long it took participants to answer correctly, we can understand

how the sensemaking performance curve changes and at what

points we observe issues. These results could then be used to

target research and compare findings.
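The volume example above could be analyzed along the following lines. This is a sketch under stated assumptions: the `results` layout, the made-up outcomes, the 0.7 accuracy floor, and the helper names `performance_curve` and `first_breakdown` are hypothetical and serve only to show how a performance curve and its problem points might be computed.

```python
from statistics import median


def performance_curve(results):
    """Build a curve from {volume: [(correct, seconds), ...]}.

    Returns a list of (volume, accuracy, median_seconds_when_correct),
    sorted by volume. The median is None if nobody answered correctly.
    """
    curve = []
    for volume in sorted(results):
        outcomes = results[volume]
        correct_times = [secs for ok, secs in outcomes if ok]
        accuracy = len(correct_times) / len(outcomes)
        typical = median(correct_times) if correct_times else None
        curve.append((volume, accuracy, typical))
    return curve


def first_breakdown(curve, min_accuracy):
    """Return the smallest volume whose accuracy is below min_accuracy, or None."""
    for volume, accuracy, _ in curve:
        if accuracy < min_accuracy:
            return volume
    return None


# Made-up outcomes for one task class, one tool set, and one subject group.
results = {
    1_000: [(True, 30.0), (True, 50.0), (True, 40.0), (True, 45.0)],
    100_000: [(True, 90.0), (True, 110.0), (True, 100.0), (False, 200.0)],
    10_000_000: [(True, 300.0), (False, 400.0), (False, 420.0), (False, 390.0)],
}
curve = performance_curve(results)
breakdown = first_breakdown(curve, min_accuracy=0.7)
```

Reporting both accuracy and time-to-correct per volume level keeps the two measures separate, so a reader can see whether performance degrades by getting slower, by getting less accurate, or both.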

6. ACKNOWLEDGEMENTS

We would like to thank the Laboratory for Analytic Sciences for

hosting this workshop and all of the stakeholders representing the

intelligence community who brought insight and real-world

applications to this discussion.

7. REFERENCES

[1] Klein, G., Phillips, J. K. and Peluso, D. A. (2007). A

data/frame theory of sensemaking. In R. Hoffman, ed.

Expertise Out of Context: Proceedings of the Sixth

International Conference on Naturalistic Decision Making.

Mahwah, NJ: Lawrence Erlbaum Associates.

[2] Pirolli, P., & Card, S. (2005). The sensemaking process and

leverage points for analyst technology as identified through

cognitive task analysis. IA'05.

[3] Weick, K. E. (1995). Sensemaking in Organizations. Sage

Publications, Thousand Oaks, CA.
