Symbiotic Cognitive Computing

Robert Farrell, Jonathan Lenchner, Jeffrey Kephart, Alan Webb, Michael Muller, Thomas Erickson, David Melville, Rachel Bellamy, Daniel Gruen, Jonathan Connell, Danny Soroker, Andy Aaron, Shari Trewin, Maryam Ashoori, Jason Ellis, Brian Gaucher, Dario Gil

IBM Research is engaged in a research program in symbiotic cognitive computing to investigate how to embed cognitive computing in physical spaces. This article proposes five key principles of symbiotic cognitive computing: context, connection, representation, modularity, and adaptation, along with the requirements that flow from these principles. We describe how these principles are applied in a particular symbiotic cognitive computing environment and in an illustrative application for strategic decision making. Our results suggest that these principles and the associated software architecture provide a solid foundation for building applications where people and intelligent agents work together in a shared physical and computational environment. We conclude with a list of challenges that lie ahead.

AI Magazine, Fall 2016. Copyright © 2016, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

In 2011, IBM’s Watson competed on the game show Jeopardy!, winning against the two best players of all time, Brad Rutter and Ken Jennings (Ferrucci et al. 2010). Since this demonstration, IBM has expanded its research program in artificial intelligence (AI), including the areas of natural language processing and machine learning (Kelly and Hamm 2013). Ultimately, IBM sees the opportunity to develop cognitive computing — a unified and universal platform for computational intelligence (Modha et al. 2011). But how might cognitive computing work in real environments — and in concert with people?

In 2013, our group within IBM Research started to explore how to embed cognitive computing in physical environments. We built a Cognitive Environments Laboratory (CEL) (see figure 1) as a living lab to explore how people and cognitive computing come together.

Our effort focuses not only on the physical and computational substrate, but also on the users’ experience. We envision a fluid and natural interaction that extends through time across multiple environments (office, meeting room, living room, car, mobile). In this view, cognitive computing systems are always on and available to engage with people in the environment. The system appears to follow individual users, or groups of users, as they change environments, seamlessly connecting the users to available input and output devices and extending their reach beyond their own cognitive and sensory abilities.

We call this symbiotic cognitive computing: computation that takes place when people and intelligent agents come together in a physical space to interact with one another. The intelligent agents use a computational substrate of “cogs” for visual object recognition, natural language parsing, probabilistic decision support, and other functions. The term cog is from the book The Society of Mind, where Marvin Minsky likened agents to “cogs of great machines” (Minsky 1988). These cogs are available to intelligent agents through programmatic interfaces and to human participants through user interfaces.

Our long-term goal is to produce a physical and computational environment that measurably improves the performance of groups on key tasks requiring large amounts of data and significant mental effort, such as information discovery, situational assessment, product design, and strategic decision making. To date, we have focused specifically on building a cognitive environment, a physical space embedded with cognitive computing systems, to support business meetings for strategic decision making. Applications include corporate meetings exploring potential mergers and acquisitions, executive meetings on whether to purchase oil fields, and utility company meetings to address electrical grid outages. These meetings often bring together a group of participants with varied roles, skills, expertise, and points of view. They involve making decisions with a large number of high-impact choices that need to be evaluated on multiple dimensions, taking into account large amounts of structured and unstructured data.

While meetings are an essential part of business, studies show that they are generally costly and unproductive, and participants find them too frequent, lengthy, and boring (Romano and Nunamaker 2001). Despite this, intelligent systems have the potential to vastly improve our ability to have productive meetings (Shrobe et al. 2001). For example, an intelligent system can remember every conversation, record all information on displays, and answer questions for meeting participants. People making high-stakes, high-pressure decisions have high expectations. They typically do not have the time or desire to use computing systems that add to their workload or distract from the task at hand. Thus, we are aiming for a “frictionless” environment that is always available, knowledgeable, and engaged. The system must eliminate any extraneous steps between thought and computation, and minimize disruptions while bringing important information to the fore.

The remainder of this article is organized as follows. In the next section, we review the literature that motivated our vision of symbiotic cognitive computing. We then propose five fundamental principles of symbiotic cognitive computing. We list some of the key requirements for cognitive environments that implement these principles. We then provide a description of the Cognitive Environments Laboratory, our cognitive environments test bed, and introduce a prototype meeting-support application we built for corporate mergers and acquisitions (M&A) that runs in this environment. We wrap up by returning to our basic tenets, stating our conclusions, and listing problems for future study.

Background

In his paper “Man-Computer Symbiosis,” J. C. R. Licklider (1960) originated the concept of symbiotic computing. He wrote,

Present-day computers are designed primarily to solve preformulated problems or to process data according to predetermined procedures. … However, many problems … are very difficult to think through in advance. They would be easier to solve, and they could be solved faster, through an intuitively guided trial-and-error procedure in which the computer cooperated, turning up flaws in the reasoning or revealing unexpected turns in the solution.

Licklider stressed that this kind of computer-supported cooperation was important for real-time decision making. He thought it important to “bring computing machines effectively into processes of thinking that must go on in real time, time that moves too fast to permit using computers in conventional ways.” Licklider likely did not foresee the explosive growth in data and computing power in the last several decades, but he was remarkably prescient in his vision of man-machine symbiosis.

Figure 1. The Cognitive Environments Lab. CEL is equipped with movement sensors, microphones, cameras, speakers, and displays. Speech and gesture are used to run cloud-based services, manipulate data, run analytics, and generate spoken and visual outputs. Wands and other devices enable users to move visual elements in three dimensions across displays and interact directly with data.

Distributed cognition (Hutchins 1995) recognizes that people form a tightly coupled system with their environment. Cognition does not occur solely or even mostly within an individual human mind, but rather is distributed across people, the artifacts they use, and the environments in which they operate. External representations often capture the current understanding of the group, and collaboration is mediated by the representations that are created, manipulated, and shared. In activity theory (Nardi 1996), shared representations are used for establishing collective goals and for communication and coordinated action around those goals.

Work on cognitive architectures (Langley, Laird, and Rogers 2009; Anderson 1983; Laird, Newell, and Rosenbloom 1987) focuses on discovering the underlying mechanisms of human cognition. For example, the adaptive control of thought (ACT) family of cognitive architectures includes semantic networks for modeling long-term memory, production rules for modeling reasoning, and learning mechanisms for improving both (Anderson, Farrell, and Sauers 1984).

Work on multiagent systems (Genesereth and Ketchpel 1994) has focused on building intelligent systems that use coordination, and potentially competition, among relatively simple, independently constructed software agents to perform tasks that normally require human intelligence. Minsky (1988) explained that “each mental agent in itself can do some simple thing that needs no mind or thought at all. Yet when we join these agents in societies — in certain very special ways — this leads to true intelligence.”

Calm technology (Weiser and Brown 1996) suggests that when peripheral awareness is engaged, people can more readily focus their attention. People are typically aware of a lot of peripheral information, and something will move to the center of their attention when, for example, they perceive that things are not going as expected. They will then process the item in focus, and when they are done it will fade back to the periphery.

We have used these ideas as the basis for our vision of symbiotic cognitive computing.

Principles of Symbiotic Cognitive Computing

Our work leads us to propose five key principles of symbiotic cognitive computing: context, connection, representation, modularity, and adaptation. These principles suggest requirements for an effective symbiosis between intelligent agents and human participants in a physical environment.

The context principle states that the symbiosis should be grounded in the current physical and cognitive circumstances. The environment should maintain presence, track and reflect activity, and build and manage context. To maintain presence, the environment should provide the means for intelligent agents to communicate their availability and function, and should attempt to identify people who are available to engage with intelligent agents and with one another. To track and reflect activity, the environment should follow the activity of people, the interactions between and among people, and the physical and computational objects in the environment or environments. It should, when appropriate, communicate the activity back to people in the environment. At other times, it should await human initiatives before communicating or acting. To build and manage context, the environment should create and maintain active visual and linguistic contexts within and across environments to serve as common ground for the people and machines in the symbiosis, and should provide references to shared physical and digital artifacts and to conversational foci.

The connection principle states that the symbiosis should engage humans and machines with one another. The environment should reduce barriers, distractions, and interruptions, and should not put physical barriers (for example, walls) or digital barriers (for example, pixels) between people. The environment should, when appropriate, detect and respond to opportunities to interact with people across visual and auditory modalities. The environment should provide multiple independent means for users to discover and interact with agents. It should enable people and agents to communicate within and across environments using visual and auditory modalities. The environment should also help users establish joint goals both with one another and with agents. Finally, the environment should include agents that are cooperative with users in all interactions, helping users understand their own goals and options, and conveying relevant information in a timely fashion (Grice 1975).

The representation principle states that the symbiosis should produce representations that become the basis for communication, joint goals, and coordinated action between and among humans and machines. The environment should maintain internal representations based on the tracked users, joint goals, and activities that are stored for later retrieval. The environment should externalize selected representations, and any potential misunderstandings, to coordinate with users and facilitate transfer of representations across cognitive environments. Finally, the environment should utilize the internal and external representations to enable seamless context switching between different physical spaces and between different activities within the same physical space by retrieving the appropriate stored representations for the current context.


The modularity principle states that the symbiosis should be driven by largely independent, modular, composable computational elements that operate on the representations and can be accessed equally by humans and machines. The environment should provide a means for modular software components to describe themselves for use by other agents or by people in the environment. The environment should also provide a means of composing modular software components, which perform limited tasks with a subset of the representation, with other components, to collectively produce behavior for agents. The environment should provide means for modular software components to communicate with one another independently of the people in the environment.

Finally, the adaptation principle states that the symbiosis should improve with time. The environment should provide adequate feedback to users and accept feedback from users. The environment should incrementally improve the symbiosis from interactions with users and, in effect, learn.

We arrived at these principles by reflecting upon the state of human-computer interaction with intelligent agents and on our own experiences attempting to create effective symbiotic interactions in the CEL. The context principle originates from our observation that most conversational systems operate with little or no linguistic or visual context. Breakdowns often occur during human-machine dialogue due to lack of shared context. The connection principle arises out of our observation that today’s devices are often situated between people and become an impediment to engagement. The representation principle was motivated by our observation that people often resolve ambiguities, disagreements, and diverging goals by drawing or creating other visual artifacts. The use of external representations reduces the domain of discourse and focuses parties on a shared understanding. The modularity principle arose from the practical considerations associated with building the system. We needed ways of adding competing or complementary cogs without reimplementing existing cogs. The adaptation principle was motivated by the need to apply machine learning algorithms to a larger range of human-computer interaction tasks. Natural language parsing, multimodal reference resolution, and other tasks should improve through user input and feedback.

One question we asked ourselves when designing the principles was whether they apply equally to human-human and human-computer interaction. Context, connection, representation, and adaptation all apply equally well to these situations. The modularity principle may appear to be an exception, but the ability to surface cogs to both human and computer participants in the environment enables both better collaboration and improved human-computer interaction.

Cognitive environments that implement these requirements enable people and intelligent agents to be mutually aware of each other’s presence and activity, develop connections through interaction, create shared representations, and improve over time. By providing both intelligent agents and human participants with access to the same representations and the same computational building blocks, a natural symbiosis can be supported.

It is impossible to argue that we have found a definitive set of principles; future researchers may find better ones, or perhaps more self-evident ones from which the ones we have articulated can be derived. It may even be possible to create a better symbiotic cognitive system than any we have created that does not obey one or more of our principles. We look forward to hearing about any such developments.

We are starting to realize these principles and requirements by building prototype cognitive environments at IBM Research laboratories worldwide.

The Cognitive Environments Laboratory

The Cognitive Environments Laboratory is located at the IBM T. J. Watson Research Center in Yorktown Heights, New York. The lab is meant to be a test bed for exploring what various envisioned cognitive environments might be like. It is more heavily instrumented than the vast majority of our envisioned cognitive environments, but the idea is that over time we will see what instrumentation works and what does not. The lab is focused on engaging users with one another by providing just the right technology to support this engagement.

Perhaps the most prominent feature of the CEL is its large number of displays. In the front of the room there is a four-by-four array of high-definition monitors (1920 x 1080 pixel resolution), which act like a single large display surface. On either side of the room are two pairs of high-definition monitors on tracks. These monitor pairs can be moved from the back to the front of the room along tracks inlaid in the ceiling, enabling fast and immediate reconfiguration of the room to match many meeting types and activities. In the back of the room there is an 84-inch touch-enabled 3840 x 2160 pixel display. The monitors are laid out around the periphery of the room. Within the room, visual content can be moved from monitor to monitor or within individual monitors, either programmatically, with the aid of special ultrasound-enabled pointing devices called “wands,” or with combinations of gesture and voice.

In addition to the displays, the room is outfitted with a large number of microphones and speakers. There are several lapel microphones, gooseneck microphones, and a smattering of microphones attached to the ceiling. We have also experimented with array microphones that support “beam forming” to isolate the speech of multiple simultaneous speakers without the need for individual microphones.

An intelligent agent we named Celia (cognitive environments laboratory intelligent agent) senses the conversation of the room occupants and becomes a supporting participant in meetings. With the aid of a speech-to-text transcription system, the room can document what is being said. Moreover, the text and audio content of meetings is continuously archived. Participants can ask, for example, to recall the transcript or audio of all meetings that discussed “graph databases,” or the segment of the current meeting where such databases were discussed. Transcribed utterances are parsed using various natural language processing technologies, and may be recognized as commands, statements, or questions. For example, one can define a listener that waits for key words or phrases that trigger commands to the system to do something. The listener can also test whether certain preconditions are satisfied, such as whether certain objects are being displayed. Commands can invoke agents that retrieve information from the web or databases, run structured or unstructured data analytics, route questions to the Watson question-answering system, and produce interactive visualizations. Moreover, with the aid of a text-to-speech system, the room can synthesize appropriate responses to commands. The system can be configured to use the voice of IBM Watson or a different voice.
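By way of illustration, such a listener can be sketched in a few lines of Python. This sketch is ours, not drawn from the CEL codebase; the trigger pattern, the precondition test, and the action callback are hypothetical.

```python
import re
from typing import Callable

class Listener:
    """Waits for a trigger phrase on the transcript and fires a command
    if its preconditions (for example, required objects being displayed)
    are satisfied."""

    def __init__(self, pattern: str, precondition: Callable[[], bool],
                 action: Callable[[re.Match], None]):
        self.pattern = re.compile(pattern, re.IGNORECASE)
        self.precondition = precondition
        self.action = action

    def on_utterance(self, text: str) -> bool:
        match = self.pattern.search(text)
        if match and self.precondition():
            self.action(match)
            return True
        return False

# Hypothetical usage: recall meetings that discussed a given topic.
recall = Listener(
    pattern=r'recall .*meetings? .*discussed "?([\w ]+)"?',
    precondition=lambda: True,               # no display requirement here
    action=lambda m: print("searching meeting archive for:", m.group(1)),
)
recall.on_utterance('Celia, recall all meetings that discussed "graph databases"')
```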

In addition to the audio and video output systems, the room contains eight pan-tilt-zoom cameras, four of which are Internet Protocol (IP) cameras, plus three depth-sensing devices, one of which is gimbal mounted with software-controllable pan and tilt capability. The depth-sensing systems are used to detect the presence of people in the room and track their location and hand gestures. The current set of multichannel output technologies (that is, including screens and speakers) and multichannel input technologies (that is, keyboard, speech-to-text, motion) provide an array of mixed-initiative possibilities.

People in the CEL can simultaneously gesture and speak to Celia to manipulate and select objects and operate on those objects with data analytics and services. The room can then generate and display information and generate speech to indicate objects of interest, explain concepts, or provide affordances for further interaction. The experience is one of interacting with Celia as a gateway to a large number of independently addressable components, cogs, many of which work instantly to augment the cognitive abilities of the group of people in the room.

Dependence on a single modality in a complex environment generally leads to ineffective and inconvenient interactions (Oviatt 2000). Environments can enable higher-level and robust interactions by exploiting the redundancy in multimodal inputs (speech, gesture, vision). The integration of speech and gesture modalities has been shown to provide both flexibility and convenience to users (Krum et al. 2002). Several prototypes have been implemented and described in the literature. Bolt (1980) used voice and gesture inputs to issue commands to display simple shapes on a large screen. Sherma (2003) and Carbini et al. (2006) extended this idea to a multiuser interaction space. We have built upon the ideas in this work in creating the mergers and acquisitions application.

Mergers and Acquisitions Application

In 2014 and 2015 we built a prototype system, situated in the Cognitive Environments Laboratory, for exploring how a corporate strategy team makes decisions regarding potential mergers and acquisitions. As depicted in figure 2, one or more people can use speech and gestures to interact with displayed objects and with Celia. The system has proven useful for exploring some of the interaction patterns between people and intelligent agents in a high-stakes decision-making scenario, and for suggesting architectural requirements and research challenges that may apply generally to symbiotic cognitive computing systems.

Figure 2. The Mergers and Acquisitions Prototype Application. People working with one another and with Celia discover companies that match desired criteria, obtain detailed information about likely candidates, and winnow the chosen companies down to a small number that are most suitable.

In the prototype, specialists working on mergers and acquisitions try to find reasonable acquisition targets and understand the trade-offs between them. They compare companies side by side and receive guidance about which companies are most aligned with their preferences, as inferred through repeated interactions. The end result is a small set of companies to investigate with a full-fledged “due diligence” analysis that takes place following the meeting.

When the human collaborators have interacted with the prototype to bring it to the point depicted in figure 2, they have explored the space of mergers and acquisitions candidates, querying the system for companies with relevant business descriptions and numeric attributes that fall within desired ranges, such as the number of employees and the quarterly revenue. As information revealed by Celia is interleaved with discussions among the collaborators, often triggered by that information, the collaborators develop an idea of which company attributes matter most to them. They can then invoke a decision table to finish exploring the trade-offs.

Figure 3 provides a high-level view of the cognitive environment and its multiagent software architecture. Agents communicate with one another through a publish-and-subscribe messaging system (the message broker) and through HTTP web services using the Representational State Transfer (REST) software design pattern. The system functions can be divided into command interpretation, command execution, agent management, decision making, text and data analysis, text-to-speech, visualization, and session management. We explain each of these functions in the sections that follow.

Figure 3. Architecture of the Mergers and Acquisitions Prototype. The hardware subsystem (microphones, cameras, displays, speakers, and position and motion sensors) connects through a message broker to a software subsystem of agents: speech recognition, position and motion tracking, visual object recognition, user identity tracking, named entity resolution, natural language parsing, gestural and linguistic reference, the command executor, analysis agents (query processor, concept analyzer, documents and databases), decision agents (probabilistic decision support), visualization agents (decision table, cog browser, information formatter, and graph viewer), text to speech, the display manager, command expression, the speech transcript, and persistent session information.
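The flow of messages over named channels can be illustrated with an in-process stand-in for the message broker. This sketch is an assumption for exposition; the channel names and payload fields mirror those mentioned in the text, but the actual broker is a networked publish-and-subscribe system.

```python
from collections import defaultdict
from typing import Callable

class MessageBroker:
    """In-process stand-in for the publish-and-subscribe message broker."""

    def __init__(self):
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: dict) -> None:
        for handler in self._subscribers[channel]:
            handler(message)

broker = MessageBroker()

# A parser agent subscribes to transcripts and republishes commands.
def parser_agent(msg: dict) -> None:
    if msg["text"].lower().startswith("celia"):
        broker.publish("command", {"verb": "interpret", "raw": msg["text"]})

broker.subscribe("transcript", parser_agent)
broker.subscribe("command", lambda msg: print("command:", msg))

broker.publish("transcript", {"speaker": "Brian", "text": "Celia, show companies"})
```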

Command Interpretation

The system enables speech and gesture to be used together as one natural method of communication. Utterances are captured by microphones, rendered into text by speech recognition engines, and published to a “transcript” message channel managed by the message broker. The message broker supports high-performance asynchronous messaging suitable for the real-time concurrent communication in the cognitive environment.

We have tested and customized a variety of speech-recognition engines for the cognitive environment. The default engine is IBM Attila (Soltau, Saon, and Kingsbury 2010). It has two modes: a first that renders the transcription of an utterance once a break is detected on the speech channel (for example, half a second of silence), and a second that renders a word-by-word transcription without waiting for a pause. In the former mode there is some probability that subsequent words will alter the assessment of earlier words. We run both modes in parallel to enable agents to read and publish partial interpretations immediately.

Position and motion tracking uses output from the position and motion sensors in combination with visual object recognition using input from the cameras to locate, identify, and follow physical objects in the environment. The user identity tracking agent maps recognized people to unique users using acoustic speaker identification, verbal introduction (“Celia, I am Bob”), facial recognition upon entry, or other methods. The speaker’s identity, if known, is added to each message on the transcript channel, making it possible to interleave dialogues to some degree, but further research is needed to handle complex multiuser dialogues. The persistent session information includes persistent identities for users, including name, title, and other information collected during and across sessions.
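For example, tagging each transcript message with a resolved speaker identity might look like the following sketch, where the identification methods described above are reduced to a hypothetical acoustic guess plus the verbal-introduction fallback; the user record fields are assumptions.

```python
KNOWN_USERS = {"bob": {"id": "user-17", "name": "Bob", "title": "Analyst"}}

def tag_speaker(message: dict, acoustic_guess: str | None) -> dict:
    """Attach a persistent user identifier to a transcript message,
    falling back to a verbal introduction ("Celia, I am Bob")."""
    name = acoustic_guess
    text = message["text"].lower()
    if name is None and text.startswith("celia, i am "):
        name = text.removeprefix("celia, i am ").strip(" .")
    user = KNOWN_USERS.get(name or "")
    return {**message, "speaker": user["id"] if user else "unknown"}

print(tag_speaker({"text": "Celia, I am Bob"}, acoustic_guess=None))
# {'text': 'Celia, I am Bob', 'speaker': 'user-17'}
```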

The natural language parsing agent subscribes to the transcript channel, processes text transcriptions of utterances containing an attention word (for example, “Celia” or “Watson”) into a semantic representation that captures the type of command and any relevant parameters, and publishes the representation to a command channel. Our default parser is based on regular expression matching and template filling. It uses a hierarchical, composable set of functions that match tokens in the text to grammatical patterns and outputs a semantic representation that can be passed on to the command executor. Most sentences have a subject-verb-object structure, with “Celia” as the subject, a verb that corresponds to a primitive function of one of the agents or a more complex domain-specific procedure, and an object that corresponds to one or more named entities, as resolved by the named entity resolution agent, or with modifiers that operate on primitive types such as numbers or dates. Another parser we have running in the cognitive environment uses a semantic grammar that decomposes top-level commands into terminal tokens or unconstrained dictation of up to five words. The resulting parse tree is then transformed into commands, each with a set of slots and fillers (Connell 2014). A third parser, using Slot Grammar (SG), is from the Watson system (McCord, Murdock, and Boguraev 2012). It is a deep parser, producing both syntactic structure and semantic annotations on input sentences. Interrogatives can be identified by the SG parser and routed directly to a version of the IBM Watson system, with the highest-confidence answer generated back to the user through text-to-speech.
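A toy version of the default parser’s regular-expression-and-template approach follows. The patterns and the shape of the semantic representation are assumptions for illustration, not the actual grammar.

```python
import re

# Hypothetical verb-to-command templates in the spirit of the default parser.
TEMPLATES = [
    (re.compile(r"^celia,?\s+show (?:me )?(?P<object>.+)$", re.I), "display"),
    (re.compile(r"^celia,?\s+place (?P<object>.+) in a decision table$", re.I), "tabulate"),
    (re.compile(r"^celia,?\s+this is (?P<object>\w+)$", re.I), "set_user"),
]

def parse(utterance: str) -> dict | None:
    """Match an utterance against the templates and fill the slots,
    producing a semantic representation for the command channel."""
    for pattern, command in TEMPLATES:
        match = pattern.match(utterance.strip())
        if match:
            return {"command": command, **match.groupdict()}
    return None  # no attention word or no matching template

print(parse("Celia, this is Brian"))
# {'command': 'set_user', 'object': 'Brian'}
print(parse("Celia, show me companies with revenue between $25M and $50M"))
# {'command': 'display', 'object': 'companies with revenue between $25M and $50M'}
```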

An important issue that arose early in the development of the prototype was the imperfection of speech transcription. We targeted a command completion rate of over 90 percent for experienced users, but with word-recognition accuracies in the low to mid 90 percent range and commands ranging from 5 to 20 or more words in length, we were not achieving this target. To address this deficiency, we developed several mechanisms to ensure that speech-based communication between humans and Celia would work acceptably in practice. First, using a half hour of utterances captured from user interaction with the prototype, we trained a speech model and enhanced this with a domain-specific language model using a database of 8500 company names extracted from both structured and unstructured sources. The named entity resolution agent was extended to resolve acronyms and abbreviations and to match both phonetically and lexicographically. To provide better feedback to users, we added a speech transcript. Celia’s utterances are labeled with “Celia,” and a command expression is also shown to reflect the command that the system processed. If the system doesn’t respond as expected, users can see whether Celia misinterpreted their request, and if so, reissue it. Finally, we implemented an undo feature to support restoration of the immediately prior state when the system misinterprets the previous command.
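The phonetic and lexicographic matching can be sketched as follows. The crude phonetic key and the fuzzy fallback here are toy stand-ins rather than the production matcher’s algorithms, and the company list is invented.

```python
import difflib

COMPANIES = ["Brain Sciences, Incorporated", "Lyntolin", "Tata Consultancy"]

def phonetic_key(name: str) -> str:
    """Crude phonetic key (an assumption; a real system might use Soundex,
    Metaphone, or a trained model): keep the first letter, then drop vowels
    and collapse repeats."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    for c in letters[1:]:
        if c not in "aeiou" and c != key[-1]:
            key += c
    return key

def resolve(mention: str) -> str | None:
    """Resolve a spoken mention to a canonical company name, matching
    phonetically first and lexicographically as a fallback."""
    by_key = {phonetic_key(c): c for c in COMPANIES}
    if phonetic_key(mention) in by_key:
        return by_key[phonetic_key(mention)]
    close = difflib.get_close_matches(
        mention.lower(), [c.lower() for c in COMPANIES], n=1, cutoff=0.5)
    if close:
        return next(c for c in COMPANIES if c.lower() == close[0])
    return None

print(resolve("lintolin"))   # Lyntolin (via the fuzzy fallback here)
```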

The gestural and linguistic reference agent is responsible for fusing inputs from multiple modes into a single command. It maintains persistent referents for recently displayed visual elements and recently mentioned named entities for the duration of a session. When Celia or a user mentions a particular company or other named entity, this agent creates a referent to the entity. Likewise, when the display manager shows a particular company or other visual element at the request of a user or Celia, a referent is generated. Using the referents, this agent can find relevant entities for users or Celia based on linguistic references such as pronouns, gestures such as pointing, or both. Typically the referent is either something the speaker is pointing toward or something that has recently been mentioned. In the event that the pronoun refers to something being pointed at, the gestural and linguistic reference agent may need to use the visual object recognition’s output, or, if the item being pointed at resides on a screen, the agent can request the virtual object from the appropriate agent.
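The referent bookkeeping can be illustrated with a small sketch; the event types and the pointing signal are simplified assumptions, and real multimodal fusion must also weigh timing and confidence.

```python
import time

class ReferentStore:
    """Keeps recently mentioned or displayed entities so that pronouns
    and pointing gestures can be resolved against them."""

    def __init__(self):
        self._referents: list[tuple[float, str]] = []  # (timestamp, entity)
        self.pointed_at: str | None = None             # set by gesture tracking

    def mention(self, entity: str) -> None:
        self._referents.append((time.time(), entity))

    def resolve_pronoun(self) -> str | None:
        # Prefer whatever is being pointed at; else the most recent mention.
        if self.pointed_at:
            return self.pointed_at
        return self._referents[-1][1] if self._referents else None

store = ReferentStore()
store.mention("Brain Sciences, Incorporated")
print(store.resolve_pronoun())   # most recent mention
store.pointed_at = "Lyntolin"    # user points at a company on a display
print(store.resolve_pronoun())   # gesture overrides recency
```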

Command Execution

The command executor agent subscribes to the command channel and oversees command execution. Some of its functions may be domain-specific. The command executor agent communicates over HTTP web services with the decision agents, analysis agents, and visualization agents. Often, the command executor serves as an orchestrator, calling a first agent, receiving the response, and reformulating that response to another agent. Thus, the command executor maintains state during the utterance. A single request from a user may trigger a cascade of agent-to-agent communication throughout the command execution part of the flow, eventually culminating in activity on the displays and/or synthesized speech being played over the speakers.
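The orchestration pattern might look like the sketch below, using plain HTTP POSTs. The endpoints and payloads are invented for illustration; in the running system the agents’ addresses come from the life-cycle manager described later rather than being hard-coded.

```python
import json
import urllib.request

def call_agent(url: str, payload: dict) -> dict:
    """POST a JSON payload to an agent's REST endpoint and return its reply."""
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def execute_find_companies(criteria: dict) -> None:
    """Orchestrate one command: query, then visualize, then speak."""
    # Hypothetical endpoints for the query processor, graph viewer, and
    # text-to-speech agents.
    companies = call_agent("http://localhost:8001/query", criteria)
    call_agent("http://localhost:8002/graph-viewer", {"nodes": companies})
    call_agent("http://localhost:8003/tts",
               {"text": f"Here are {len(companies)} matching companies."})
```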

Decision Making

In the mergers and acquisitions application, people have high-level goals that they want the system to help them achieve. A common approach from decision theory is to specify a utility function (Walsh et al. 2004). Given a utility function defined in terms of attributes that are of concern to the user, the system’s objective is to take actions or adjust controls so as to reach a feasible state that yields the highest possible utility. A particularly attractive property of this approach is that utility can be used to propagate objectives through a system from one agent to another. However, experience with our symbiotic cognitive computing prototype suggests that this traditional approach misses something vital: people start with inexact notions of what they want and use computational tools to explore the options. It is only during this sometimes-serendipitous exploration process that they come to understand their goals better. Certain companies appeal to us, sometimes before we even know why, and it can take a serious introspection effort (an effort that may be assisted by the cognitive system) to discover which attributes matter most to us or to realize we may be biased. This realization prompted us to design a probabilistic decision support agent (Bhattacharjya and Kephart 2014). This agent starts with a set of candidate solutions to the decision support problem and attributes, and a highly uncertain model of user preferences. As the user accepts or rejects its recommendations, or answers questions about trade-offs, the agent progressively sharpens its understanding of the users’ objectives, which it models as a probability distribution of weights in the space of possible utility functions. The agent is able to recommend filtering actions, such as removing companies or removing attributes, to help users converge on a small number of targets for mergers and acquisitions.
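The flavor of this preference sharpening can be conveyed with a toy particle representation of the weight distribution: weight samples inconsistent with an observed user choice are discarded, concentrating the distribution. This is a simplification we use for exposition, not the agent’s published algorithm (see Bhattacharjya and Kephart 2014 for that).

```python
import random

random.seed(7)

# Candidate weight vectors over two attributes: (revenue, employees).
particles = [(w, 1.0 - w) for w in [random.random() for _ in range(500)]]

def utility(weights: tuple[float, float], company: dict) -> float:
    return weights[0] * company["revenue"] + weights[1] * company["employees"]

def observe_choice(preferred: dict, rejected: dict) -> None:
    """Keep only weight samples consistent with the user preferring one
    company over another, sharpening the preference distribution."""
    global particles
    kept = [w for w in particles if utility(w, preferred) > utility(w, rejected)]
    particles = kept or particles   # never collapse to an empty set

a = {"revenue": 0.8, "employees": 0.2}   # normalized attribute scores
b = {"revenue": 0.3, "employees": 0.9}
observe_choice(preferred=a, rejected=b)
mean_revenue_weight = sum(w[0] for w in particles) / len(particles)
print(f"revenue weight ~ {mean_revenue_weight:.2f}")  # shifts well above 0.5
```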

Text and Data Analysis

The concept analyzer agent provides additional data for decision making by extracting concepts and relationships from documents. While IBM Watson was trained for general question answering using primarily open web sources such as Wikipedia, we anticipate that most applications will also involve harnessing data from private databases and third-party services. For the mergers and acquisitions application, we developed a large database of company annual reports. Concepts extracted from the reports can be linked to the companies displayed in the graph viewer, and when a user asks for companies similar to a selected company, the system is able to retrieve companies through the concepts and relationships. The query processor agent provides a query interface to a database of company financial information, such as annual revenue, price-earnings ratio, and income.

Persistent Session Information

We have added the ability for the cognitive environment to capture the state of the interaction between users and agents, either as needed or at the end of a session. It does this by creating a snapshot of what agents are active, what is being displayed, what has been said, and what commands have been completed. The snapshots are saved and accessible from any device, thus enabling users to take products of the work session outside of the cognitive environment.


The session capture feature allows users to review past decisions and can support case-based reasoning (Leake 1996). It also provides continuity because users can stop a decision-making process and continue later. Finally, it allows for some degree of portability across multiple cognitive environments.
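A session snapshot of the kind described might be serialized as in the sketch below; the field names are assumptions based on what the text says is captured (active agents, display state, transcript, and completed commands).

```python
import json
import time

def capture_snapshot(active_agents, displayed, transcript, commands) -> str:
    """Serialize session state so it can be restored later or opened
    from another device or cognitive environment."""
    snapshot = {
        "timestamp": time.time(),
        "active_agents": active_agents,
        "displayed": displayed,
        "transcript": transcript,
        "completed_commands": commands,
    }
    return json.dumps(snapshot, indent=2)

print(capture_snapshot(
    active_agents=["query_processor", "graph_viewer"],
    displayed=[{"type": "decision_table", "companies": 3}],
    transcript=["Brian: Celia, this is Brian."],
    commands=[{"command": "set_user", "object": "Brian"}],
))
```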

Text-to-Speech

The text-to-speech agent converts text into spoken voice. The current system has a choice of two voices: a North American English female voice or a North American English male voice (which was used by the IBM Watson system). In order to keep interruptions to a minimum, the speech output is used sparingly and usually in conjunction with the visual display.

Visualization

The visualization agents work in the cognitive environment’s multiuser, multiscreen, networked environment. Celia places content on the 25 displays, and either the visualization agents or the users then manipulate the content in three dimensions. We designed the M&A application visualizations to work together in the same visual space using common styles and behaviors. The display manager coordinates content placement and rendering using placement defaults and constraints. The cog browser enables people to find and learn about the available cogs in the cognitive environment. It displays a cloud of icons representing the society of cogs. The icons can be expanded to reveal information about each cog and how it is invoked.

Many agents have functions that can be addressed through speech commands or through gesture. For example, the information formatter can display company information (on the right in figure 2) and allows a user to say “products” or select the products tab to get more information about the company’s products. In addition, many of the agents that provide building blocks for Celia’s decision-making functions are cogs that are also independently addressable through visual interfaces and speech commands. For example, the graph viewer can be used by Celia to recommend companies but is also available to users for visualizing companies meeting various criteria.

Agent Management

We have implemented several management modules that operate in parallel with the other parts of the system to allow agents to find instances of other agents that are running and thereby enable discovery and communication. When an agent is first launched, it registers itself with a life-cycle manager module to advertise its REST interfaces and the types of messages it publishes over specific publish-and-subscribe channels. When an agent requires the services of a second agent, it can locate an instance of the second agent by querying the life-cycle manager, thereby avoiding the need to know details of the second agent’s running instance.
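The registry role of the life-cycle manager can be sketched as follows; the registration record is an assumed shape, and the real manager also tracks running instances and publish-and-subscribe channels over the network.

```python
class LifecycleManager:
    """Registry that lets agents advertise REST interfaces and channels,
    and lets other agents discover them without hard-coded addresses."""

    def __init__(self):
        self._registry: dict[str, dict] = {}

    def register(self, name: str, rest_url: str, publishes: list[str]) -> None:
        self._registry[name] = {"rest_url": rest_url, "publishes": publishes}

    def locate(self, name: str) -> dict | None:
        return self._registry.get(name)

manager = LifecycleManager()
manager.register("named_entity_resolution",
                 rest_url="http://localhost:8004/resolve",  # hypothetical
                 publishes=["entities"])

# Another agent discovers the resolver instead of knowing its address.
print(manager.locate("named_entity_resolution"))
```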

The agents used in the M&A application and others are available as cogs in the CEL and work as one intelligent agent to provide access to cognitive computing services. Taken together, the agents provide a completely new computing experience for business users facing tough decisions.

A sample dialogue is shown in figure 4. To handle exchange 1, the system processes the first sentence using the speech recognition engine and sends a message with the transcribed text to the message broker on the transcript channel. The natural language parser listens on the transcript channel and publishes a semantic representation based on a dependency parse of the input that identifies the name Brian in the object role. The user identity tracking agent resolves the name against the persistent identifier for Brian and publishes a command on the command channel with a set user identifier. The command executor then requests the speech recognition agent to start using the speaker’s speech model. The next sentence is processed similarly, but the gestural and linguistic reference agent resolves “I” to “Brian.” The verb “help” is recognized as the main action, and the actor is the persistent identifier for Brian. The command executor recognizes the representation as the initialization of the mergers and acquisitions application using a pattern rule. It calls the display manager, which saves and hides the state of the displays. The command executor then generates the response that is sent to the text-to-speech agent. This requires querying the user identity tracking agent to map the persistent identifier for Brian to his name.

Figure 4. A Sample Dialogue Processed by the Mergers and Acquisitions Prototype.

Exchange 1
Brian: Celia, this is Brian. I need help with acquisitions.
Celia: Hello Brian, how can I help you with mergers and acquisitions?

Exchange 2
Brian: Celia, show me companies with revenue between $25 million and $50 million and between 100 and 500 employees, pertaining to biotechnology.
Celia: Here is a graph showing 96 companies pertaining to biotechnology. (Celia displays the graph.)

Exchange 3
Brian: Celia, place the companies named brain science, lintolin, and tata in a decision table.
Celia: OK. (Celia shows a table with the three companies, one per row, and with columns for the name of the company, the revenue, and the number of employees.)
Celia: I suggest removing Lyntolin. Brain Sciences, Incorporated has greater revenue and a greater number of employees. (Celia highlights Brain Sciences and Lyntolin.)

To handle exchange 2, a similar flow happens, except the command executor calls the query processor agent to find companies matching the revenue and company size criteria. Upon receiving the response, the command executor calls the graph viewer, which adds nodes representing each matching biotechnology company to a force-directed graph on the display (in the center in figure 2). The command executor also calls the text-to-speech agent to play an acknowledgement over the speakers. The user can then manipulate the graph using gestures or issue further speech commands.

For exchange 3, the command executor first calls the named entity resolver three times to resolve the exact names of the companies referred to by the user; for example, it might resolve “brain science” into “Brain Sciences, Incorporated.” Upon receiving these responses, the command executor calls the query processor agent to obtain company information, which it then sends to the probabilistic decision support agent. This agent must interact with the decision table agent, which in turn uses the display manager to display the output to the user. While all of this is happening, the executor also calls the text-to-speech agent to acknowledge the user’s request. Coordination between speech and display thus happens in the command executor. During this interaction, the transcript displayer and command displayer display the utterance and the interpreted command.

In the next section, we discuss our work to date on the cognitive environment in terms of both prior work and our original symbiotic cognitive computing principles.

Discussion

Prior intelligent decision-making environments focus primarily on sensor fusion but fall short of demonstrating an intelligent meeting participant (Ramos et al. 2010). The EasyLiving project attempted to create a coherent user experience, but was focused on integrating I/O devices (Brumitt et al. 2000). The NIST Meeting Room has more than 280 microphones, seven HD cameras, a smart whiteboard, and a locator system for the meeting attendees (Stanford et al. 2003), but little in the way of intelligent decision support. The CALO (Cognitive Assistant that Learns and Organizes) DARPA project includes a meeting assistant that captures speech, pen, and other meeting data and produces an automated transcript, segmented by topic, and performs shallow discourse understanding to produce a list of probable action items (Voss and Ehlen 2007), but it does not focus on multimodal interaction.

The experience of building and using the M&A application has been valuable in several respects. First, while we haven’t yet run a formal evaluation, we’ve found that the concept of a cognitive environment for decision making resonates well with business users. To date we have had more than 50 groups of industry executives see a demonstration and provide feedback, and we are now working closely with the mergers and acquisitions specialists at IBM to bring aspects of the prototype into everyday use. Second, the prototype has helped us to refine and at least partially realize the symbiotic cognitive computing principles defined in this article, and to gain a better understanding of the nature of the research challenges. Here we assess our work on the prototype in terms of those principles.

First, how much of the symbiosis is grounded in the physical and cognitive circumstances? We have just started to explore the use of physical and linguistic context to shape the symbiosis. Some aspects of maintaining presence are implemented. For example, motion tracking and visual recognition are used to capture the presence of people in the room. However, endowing intelligent agents with the ability to effectively exploit information about individual people and their activities and capabilities remains a significant research challenge. Multiple people in the cognitive environment can interact with the system, but the system’s ability to associate activities with individual users is limited. The session capture agent tracks both human commands and agent actions, but additional work is required to reflect the activity of people and agents in the environment back to participants. The linguistic and gestural reference agent maintains some useful context for switching between applications, but additional research is needed to exploit this context to enable an extended dialogue.

Second, how much does the cognitive environment support the connection principle, enabling people and intelligent agents to engage with one another? We feel that the architecture and implementation support all of the requirements, at least to some degree. First, barriers between human intention and system execution are reduced by multimodal interactions that allow users to converse with the system almost as if it were a human partner rather than having to deal with the cumbersome conventions of typical user interfaces, but the lack of affordances in speech-based interaction remains a challenge. The cog browser provides users with some understanding of the capabilities of various cogs, but the system does not offer assistance. The system supports interactions across multiple environments; cogs can in effect follow the user to different physical spaces, marshaling the input and output resources that they find there — thereby reducing the time required to initiate system computations and actions when moving across cognitive environments. The cognitive environment cooperates with users in multiple ways: decision agents use elicitation techniques to develop an understanding of user goals and trade-offs and then guide users toward decisions that best realize them; Celia listens for commands, and the command executor responds only when adequate information is available for a response. Because multiple users can see the same display and Celia has access to displayed objects through the display manager, Celia can track and respond to their coordinated actions.

Does the cognitive environment support the representation principle? The CEL and its agents maintain representations that are the basis for communication between participants and with Celia. The identity and location of users in the room, the developing conversation with Celia and recognized commands, and the state of ongoing decisions are all captured, externalized, and shared outside the environment, providing common ground between people in the physical environment and those connected to the environment only remotely or periodically. In future work, we would like to recognize individual differences in representation preferences, and be able to conduct “private” interactions with individual users through the media of their choice.

We have realized the modularity requirements of self-description, composition, and intercomponent communication by implementing the cognitive environment as a multiagent system. Some agents are strongly human centered, providing services such as speech and gesture transcription, speech synthesis, or visualization. Others mainly serve the needs of other agents. For example, the life-cycle manager facilitates agent communication and composition by enabling agents to advertise their capabilities to one another and use one another’s services. An important research challenge is to create deeper semantic descriptions of services to allow users to select and compose services as needed through natural language dialogue.

How does the cognitive environment support adaptation, improving with time? Currently most of the system’s improvement is offline, not during the dialogue. For example, we trained speech models and extended the language model with custom dictionaries. Users’ gestural vocabularies could also be modeled and interpreted, or we could develop individualized models of combinations of speech, gesture, and larger movements. We currently capture useful data during interactions with users that can be used to improve interactions in the future. For example, the system captures linguistic and gestural context, which can in principle be mined by other agents seeking to detect patterns that might be used to better anticipate user needs. Ultimately we would like cognitive systems to adapt to users’ goals and capabilities, available I/O resources, and available cogs to maximize the effectiveness of the entire session and improve the symbiotic relationship between users and the system.

Despite our successes with engineering a symbiotic cognitive computing experience, practical applications continue to be a challenge: speech commands in a noisy room are often misrecognized, we cannot reliably identify individual speakers, gesture tracking is still error prone with both wands and hand gestures, and natural language inputs require domain-specific natural language engineering to map commands to the proper software services and invoke decision, analysis, and visualization agents. Despite these challenges, the cognitive environment provides a valuable test bed for integrating a variety of IBM Research cognitive computing technologies into new scenarios and business applications.

Conclusions

This article introduced our work on symbiotic cognitive computing. We outlined five principles: context, connection, representation, modularity, and adaptation, and we showed how key requirements that flow from these principles could be realized in a cognitive environment. We have started to apply this environment to real business problems, including strategic decision making for corporate mergers and acquisitions.

Reaching a true symbiosis between cognitive computing and human cognition is a significant multiyear challenge. The IBM Watson question-answering system and other intelligent agents can be embedded in physical spaces, but additional research is needed to create cognitive computing systems that can truly sense the world around them and fully interact with people to solve difficult problems.

Our future directions include detection and understanding of emotion, cognitive computing in virtual and mixed reality, simultaneous speech and gesture understanding, integration of uncertain data from sensors into real-time interaction, and machine learning to improve decisions over time. We are also interested in exploring tasks such as information discovery, situational assessment, and product design, where difficult decisions require bringing together people who have complementary skills and experience and providing them with large amounts of structured and unstructured data in one collaborative, multisensory, multimodal environment.

The IBM Watson Jeopardy! system demonstrated the ability of machines to achieve a high level of performance at a task normally considered to require human intelligence. We see symbiotic cognitive computing as the next natural step in the evolution of intelligent machines: creating machines that are embedded in the world and integrate with every aspect of life.

Acknowledgements

Thanks also to Wendy Kellogg, Werner Geyer, Casey Dugan, Felicity Spowart, Bonnie John, Vinay Venkataraman, Shang Gao, Mishal Dholakia, Tomas Beren, Yedendra Shirnivasan, Mark Podlaseck, Lisa Amini, Hui Su, and Guru Banavar.




Robert Farrell is a research staff member at the IBM T. J. Watson Research Center in Yorktown Heights, NY, USA. He has a long-term research interest in the cognitive processes of human learning, knowledge representation, reasoning, and language understanding. His past work includes cognitive models, intelligent tutoring systems, and social computing applications. He is currently working on software to extract knowledge from unstructured information sources.

Jonathan Lenchner is chief scientist at IBM Research-Africa. Previously he was one of the founders of the IBM Cognitive Environments Lab in Yorktown Heights, NY. His research interests include computational geometry, robotics, AI, and game theory. His recent work includes research on humanoid robots and development of an immersive environment to help a professional sports team with trades and draft picks.

Jeffrey Kephart is a distinguished research staff member at IBM T. J. Watson Research Center and a Fellow of the IEEE. He is known for his work on computer virus epidemiology and immune systems, self-managing computing systems, electronic commerce, and data center energy management. Presently, he serves as a principal investigator on a cognitive computing research project with a large energy company and leads work on applying intelligent agent technologies to corporate mergers and acquisitions.

Alan Webb is a senior software engineer at the IBM T. J. Watson Research Center. His present research interests focus on applying the principles of distributed cognition as an inspiration for pervasive cognitive environments. He is currently working on a generalized system architecture for the cognitive environment and development of the mergers and acquisitions application.

Michael Muller is a research staff member in the Cognitive User Experience group at IBM Research in Cambridge, MA. His research areas have included collaboration in health care, metrics and analytics for enterprise social software, participatory design, and organizational crowdfunding. His current work focuses on employee experiences in the workplace.

Thomas Erickson is a social scientist and interaction designer at the IBM T. J. Watson Research Center. His research concerns designing systems that enable groups of people to interact coherently and productively in both virtual and real environments.

David Melville is a research staff member at IBM T. J. Watson Research Center. His research interests include immersive data spaces, spatial computing, adaptive physical architecture, and symbiotic experience design.

Rachel Bellamy is a principal research staff member and group manager at IBM T. J. Watson Research Center and heads the Research Design Center. Her general area of research is human-computer interaction, and her current work focuses on the user experience of symbiotic cognitive computing.

Daniel Gruen is a cognitive scientist in the Cognitive User Experience group at IBM Research in Cambridge, MA. He is interested in the design of systems that let strategic decision makers seamlessly incorporate insights from cognitive computing in their ongoing deliberations and creative thinking. He is currently working with a variety of companies to understand how such systems could enhance the work they do.

Jonathan Connell is a research staff member at IBM T. J. Watson Research Center. His research interests include computer vision, machine learning, natural language, robotics, and biometrics. He is currently working on a speech-driven reactive reasoning system for multimodal instructional dialogue.

Danny Soroker is a research staff member at IBM T. J. Watson Research Center. His research interests include intelligent computation, human-computer interaction, algorithms, visualization, and software design. He is currently working on agents to support problem solving and decision making for corporate mergers and acquisitions.

Andy Aaron is a research staff member at IBM T. J. Watson Research Center. He was on the speech team for the IBM Watson Jeopardy! match. Along with his work in speech synthesis and speech recognition, he has done sound design for feature films and produced and directed TV and films.

Shari Trewin is a research staff member at IBM T. J. Watson Research Center. Her current research interests include multimodal human-computer interaction and accessibility of computer systems. She is currently working on knowledge extraction from scientific literature and interaction designs for professionals working with this extracted knowledge.

Maryam Ashoori is a design researcher at IBM T. J. Watson Research Center. She has a passion for exploring the intersection between art and computer science. Her work has resulted in several novel applications for the Cognitive Environments Laboratory, including a “Zen Garden” for unwinding after a long day and a service for sparking creativity for inventors.

Jason Ellis is a research staff member at IBM T. J. Watson Research Center. His research interests include social computing and usability. He is currently working on collaborative user interfaces for cognitive systems.

Brian Gaucher is senior manager of the Cognitive Environments Laboratory at IBM T. J. Watson Research Center. He leads teams specializing in user experience design and physical infrastructure of cognitive computing environments. His work focuses on the creation of highly interactive physical spaces designed to improve decision making through always-on ambient intelligence.

Dario Gil is the vice president of science and technology for IBM Research. As director of the Symbiotic Cognitive Systems department, he brought together researchers in artificial intelligence, multiagent systems, robotics, machine vision, natural language processing, speech technologies, human-computer interaction, social computing, user experience, and interaction design to create symbiotic cognitive computing technology, services, and applications for business.
