
Two-Dimensional Visual Language Grammar

    Siska Fitrianie and Leon J.M. Rothkrantz

Man-Machine-Interaction Group, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands

    {s.fitrianie, l.j.m.rothkrantz}@ewi.tudelft.nl

Abstract. Visual language refers to the idea that communication occurs through visual symbols, as opposed to verbal symbols or words. In contrast to a sentence in spoken language, with its linear ordering of words, a visual language has a simultaneous structure with a parallel temporal and spatial configuration. Inspired by Deikto [5], we propose a two-dimensional string or sentence construction of visual expressions, i.e. spatial arrangements of symbols that represent concepts. A proof-of-concept communication interface has been developed, which enables users to create visual messages representing the concepts or ideas in their mind. Employing an ontology, the interface converts both the syntax and semantics of a 2D visual string into (natural language) text using a Lexicalized Tree Adjoining Grammar (LTAG). This approach elegantly captures the interaction between pragmatic and syntactic descriptions in a 2D sentence, and the inferential interactions between the multiple possible meanings generated by the sentence. From our user test results, we conclude that the developed visual language interface can serve as a communication mediator.

    1 Introduction

Skills at interpreting visual symbols play an important part in humans' learning about the world and understanding of language. Words, of course, are also composed of symbols. But there are nonverbal symbols that can provide essential meanings through their succinct and eloquent illustrations. Humans respond to these symbols as messages, though often without realizing exactly what it is that has caused us to reach a certain conclusion. Such symbols are often visual, though they can be auditory or even tactile. The research described in this paper concentrates on visual nonverbal symbols. In particular, it focuses on exploring such symbols to represent concepts, i.e. objects, actions, or relations.

According to [15], human communication involves the use of concepts to represent internal models of humans themselves, of the outside world, and of the things with which humans interact. In earlier work, we investigated language-independent communication using visual symbols, i.e. icons [8], based on signs and symbols that are understood universally [15]. As a proof of concept, an iconic interface was developed for a specific domain: a communication interface on PDAs for crisis situations. In such situations, wired communication may be disabled by the breakdown of infrastructure or by information overload, and speech communication is difficult due to noisy environments. Visual language has been used successfully in human-computer interaction, visual programming, and human-human communication. Words are easily forgotten, but images stay in our minds persistently [6]. Icons can therefore evoke a readiness to respond, enabling a fast exchange of information and fast action as a result [14]. Furthermore, direct manipulation of the icons allows faster interaction [11].

Footnote: The research reported here is part of the Interactive Collaborative Information Systems (ICIS) project, supported by the Dutch Ministry of Economic Affairs, grant nr. BSIK03024.

A visual (communication) language uses a set of spatial arrangements of visual symbols with a semantic interpretation that is used in carrying out communication [3]. Such a language is based on a vocabulary of visual symbols, where each symbol represents one or more meanings, created according to metaphors appropriate for the context of this type of language. The sentence structure of a visual language differs from that of a sentence in spoken language [12]. Spoken language is composed of a linear ordering of words, while a visual language has a simultaneous structure with a parallel temporal and spatial configuration, e.g. the sign language syntax of deaf people [1], comic illustrations [4], and diagrams. Based on this, we propose a two-dimensional syntax structure that enables a visual language sentence to be constructed in a 2D way.

    2 Related Work

The approach described in this paper is essentially inspired by Deikto, a game interaction language [5]. Its creator aimed at producing the smallest language capable of expressing the concepts used in the story world. A sentence in Deikto is represented by a connected acyclic graph, in which a predecessor element explains its successor. A graph element can be a lexicon entry or another clause. To help the players, the game provides hints of which lexicon entries can be selected based on their word class. Deikto follows a rigid grammar by assigning each verb the parts of the sentence in its dictionary definition. This verb-centered approach is supported by Fillmore's case grammar [7] and Schank's conceptual dependency theory [17]. In case grammar, a sentence in its basic structure consists of a verb and one or more noun phrases, each associated with the verb in a particular case relationship. Conceptual dependency defines the interrelationship of a set of primitive acts to represent a verb; it employs rules and a set of primitive acts, e.g. ATRANS: transfer of possession of an object. Instead of using word classes (e.g. subject, verb, object), both approaches use thematic roles, e.g. agent, patient, instrument, to define a sentence structure.

VIL, a one-dimensional visual language application [13], is designed to allow people to communicate with each other by constructing sentences that rely solely on icons. The system is based on the notion of simplified speech, reducing complexity significantly. It is also inspired by Fillmore's case concept and Schank's verb classification.

In the following sections we give an overview of the 2D visual language we propose. We then concentrate on visual sentence construction and conversion to text or speech, as well as the proof-of-concept visual language interface we have developed. Finally, we present our test results.

    3 Two-Dimensional Visual Language

According to [1], the syntax analysis of visual language does not reduce to classical spoken sentence syntax. Instead, there exists a set of "topic" and "comment" relations, in which a comment explains a topic. Therefore, in our two-dimensional syntax structure, a sentence is constructed as an acyclic graph of visual symbols, i.e. icons (see fig. 1(b)). Conventional textual syntax structures are not considered 2D, since the parser processes them as 1D streams of symbols.

Fig. 1. Examples of visual language sentences: (a) hints for the verb "drive", (b) a simple sentence: "Two paramedics drove an ambulance to the hospital at 15.00", and (c) a compound sentence: "The firefighter informs the police that he will search for five victims in the burning building"

A visual symbol can be connected by an arrow to another symbol, in which case the former explains the latter. Each symbol represents a concept or idea. The sentence may be constructed starting from any part; however, as soon as a verb is selected, the structure of the sentence is determined. We define a case for every symbol in our vocabulary. For example, the symbol "paramedic" contains number, location, status, and name as attributes of its case. In particular for symbols that represent verbs, based on the theories of [7][17], we define a case for every verb and follow the frame syntactic analysis used for generating VerbNet [10]. For example, the case of the verb "drive" contains agent, theme, location, and time.
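To make these case structures concrete, the sketch below shows one possible encoding in Python. All class and field names (SymbolCase, VerbCase, required_roles, etc.) are illustrative assumptions, not the authors' implementation; the split into required and optional roles anticipates the lexical definitions of Section 4.

```python
# A minimal sketch of the case structures described above; names are
# illustrative assumptions, not the authors' actual code.
from dataclasses import dataclass, field


@dataclass
class SymbolCase:
    """Case of a non-verb symbol: its concept plus attribute slots."""
    concept: str                                     # e.g. "paramedic"
    attributes: dict = field(default_factory=dict)   # number, location, status, name


@dataclass
class VerbCase:
    """Case of a verb symbol: thematic roles, VerbNet-style."""
    verb: str
    required_roles: list = field(default_factory=list)
    optional_roles: list = field(default_factory=list)


# The case of the verb "drive" with the roles named in the text; which
# roles are required vs. optional is our assumption here.
drive = VerbCase("drive",
                 required_roles=["agent", "theme"],
                 optional_roles=["location", "time"])
```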

To help the user, the attributes of each selected visual symbol's case in a 2D sentence are displayed (see fig. 1(a)). When the icon is deselected, the hint disappears to reduce complexity. A hint symbol can be selected and replaced by a visual symbol that is grammatically correct for forming a sentence. The approach gives users the freedom to fill in the parts of a sentence, while at the same time the system can restrict the choices to symbols that lead to a meaningful sentence. A user may also attach symbols that are not given by the hints by inserting a new node on a specific node of the sentence; the new node then explains the selected node. Our grammar allows compound sentence construction, done by inserting another verb on a verb node of the sentence (see fig. 1(c)) or by noun phrase conjunction.
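Continuing the sketch above, a 2D sentence can be held as a small directed acyclic graph whose edges read "explainer → explained", with hints computed as the unfilled roles of the selected verb's case. The data structure and method names are hypothetical.

```python
# Continuing the sketch above (SymbolCase, VerbCase, drive): a 2D sentence
# as a directed acyclic graph whose arrows read "explainer -> explained".
class VisualSentence:
    def __init__(self):
        self.edges = []                      # (explainer, explained, role) triples

    def attach(self, explainer, explained, role=None):
        """Connect a symbol by an arrow to the symbol it explains."""
        self.edges.append((explainer, explained, role))

    def hints_for(self, verb):
        """Unfilled case roles of a selected verb are shown as hint symbols."""
        filled = {role for (_, target, role) in self.edges if target is verb}
        return [r for r in verb.required_roles + verb.optional_roles
                if r not in filled]


# Fig. 1(b), partially built: "Two paramedics drove an ambulance ..."
s = VisualSentence()
s.attach(SymbolCase("paramedic", {"number": 2}), drive, role="agent")
s.attach(SymbolCase("ambulance"), drive, role="theme")
print(s.hints_for(drive))   # remaining hints: ['location', 'time']
```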

    4 Knowledge Representation

Visual symbols have no meaning outside of their context. A visual symbol can be interpreted by its perceivable form (syntax), by the relation between its form and what it means (semantics), and by its use (pragmatics) [2]. The relation between visual symbols and words can be ambiguous. We tackle this by creating a verbal context that links visual and verbal thoughts together to form a symbol that can be remembered and recalled. An ontology is employed to represent the context that binds verbal and visual symbols together.

Fig. 2. Schematic view of the two components of a verb lexical definition: semantic types and linking to syntactic arguments

We employ the ontology to store information about the vocabulary, i.e. the visual symbols that represent nouns, pronouns, proper nouns, adjectives, and adverbs, and their properties (i.e. the attributes of a case). The ontology provides a natural way to group its elements based on their concepts and gives the system information about their semantic description, e.g. the symbols "firefighter" and "paramedic" are grouped under "animate" and the symbol "ambulance" under "vehicle". In particular, for verb symbols, based on [10], a lexeme has one or more sense definitions, each consisting of a semantic type with associated thematic roles and semantic features, and a link between the thematic roles and syntactic arguments. The definition also specifies required and optional roles. Figure 2 shows the case for the verb "drive".
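The sketch below suggests how one such verb entry might be stored, mirroring the two components in fig. 2: a sense definition with thematic roles and semantic features, and a linking from roles to syntactic arguments. Every field name, the "motion" type label, and the feature values are assumptions for illustration only.

```python
# Hypothetical ontology entry for "drive", mirroring fig. 2. Field names,
# the semantic type label, and feature values are our assumptions.
DRIVE_ENTRY = {
    "lexeme": "drive",
    "senses": [{
        "semantic_type": "motion",
        "roles": {
            "agent":    {"features": ["animate"], "required": True},
            "theme":    {"features": ["vehicle"], "required": True},
            "location": {"features": ["place"],   "required": False},
            "time":     {"features": ["time"],    "required": False},
        },
        # linking: thematic role -> syntactic argument
        "linking": {"agent": "subject", "theme": "object",
                    "location": "pp-to", "time": "pp-at"},
    }],
}

# Concept groups as described in the text.
GROUPS = {"firefighter": ["animate"], "paramedic": ["animate"],
          "ambulance": ["vehicle"]}


def allowed_symbols(role):
    """Symbols whose ontology group matches a role's semantic features,
    e.g. only 'animate' symbols may fill the agent slot of "drive"."""
    features = DRIVE_ENTRY["senses"][0]["roles"][role]["features"]
    return [s for s, groups in GROUPS.items()
            if any(f in groups for f in features)]


print(allowed_symbols("agent"))   # ['firefighter', 'paramedic']
```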

5 Lexicalized Tree Adjoining Grammar by Visual Symbols

Each visual symbol provides only a portion of the semantics of a visual language sentence. The meaning of a 2D sentence with more than one symbol can still be represented by the same set of symbols, but determining the sentence meaning turns out to be very difficult. The syntax of the interactions between concepts (represented by visual symbols) progressively enriches the semantics of the sentence. The only thing that can be automatically derived from the semantics of the symbols in a visual language sentence is a fixed word or phrase belonging to these symbols.

Based on [16], we assign our vocabulary a Lexicalized Tree Adjoining Grammar (LTAG) [9]. Figure 3(a) shows an example of our TAG tree vocabulary. For this purpose, we exploit the XTAG grammar [18], which provides a large existing grammar for English verbs. Mapping our VerbNet-based syntactic frames to the XTAG trees greatly increases the robustness of the conversion of 2D visual language sentences to natural language text/speech.

Fig. 3. Conversion to natural language text: (a) examples of the iconized TAG elementary trees, (b) example of a 2D sentence: "Two paramedics drove an ambulance to the hospital", (c) example of mapping the thematic roles to the basic syntactic tree defined by the case of the verb "drive", and (d) example of a parse tree resulting from mapping the basic syntactic tree to the TAG trees

Based on the case of every symbol in a 2D sentence, a parser processes the 2D stream of symbols and maps their thematic roles onto the basic syntactic tree of the VerbNet-based vocabulary. Presumably, the transformations of VerbNet's syntactic frames are recoverable by mapping the 2D sentence onto the elementary trees of TAG tree families. For this purpose, the parser exploits the system's ontology to obtain the syntactic argument of every symbol in the sentence. Figure 3 shows an example of parsing a 2D sentence using LTAG. We specify the semantics of a 2D sentence in two ways. First, our ontology offers a simple syntax-semantics interface for every symbol. As shown in fig. 2, each verb case restricts the choice of symbols that may form the sentence, i.e. by associating thematic roles with semantic features. The meaning of a TAG tree is simply the conjunction of the meanings of the elementary trees used to derive it, once the appropriate case elements are filled in. Second, the VerbNet structure provides an explicitly constructed verb lexicon with syntax and semantics. In this way, syntax analysis and natural language construction can be done simultaneously.
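The sketch below illustrates only the final linearization step: the filled thematic roles of a verb case are slotted into a VerbNet-style syntactic frame and read off as text. The real derivation uses LTAG substitution and adjunction over elementary trees and is considerably richer; the frame layout, preposition markers, and function names here are assumptions for illustration.

```python
# Simplified linearization of a filled case frame into text. A hypothetical
# frame for "drive"; "to:location" marks a role realized as a PP with "to".
FRAME = ["agent", "VERB", "theme", "to:location"]


def realize(verb_form, filled):
    """Slot filled roles into the frame, skipping empty optional slots."""
    words = []
    for slot in FRAME:
        if slot == "VERB":
            words.append(verb_form)
        else:
            prep, _, role = slot.rpartition(":")
            if role in filled:
                words.extend(([prep] if prep else []) + [filled[role]])
    return " ".join(words).capitalize() + "."


print(realize("drove", {"agent": "two paramedics",
                        "theme": "an ambulance",
                        "location": "the hospital"}))
# -> "Two paramedics drove an ambulance to the hospital."
```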

    6 Reporting Crisis Situations

Our visual language communication tool was designed for reporting observations in a crisis situation. A user can arrange an acyclic graph of visual symbols, i.e. icons, as a realization of his/her concepts or ideas. Besides supporting fast interaction by converting the message into natural language, the tool can also be combined with any language application, e.g. a text-to-speech synthesizer or a language translator, through a socket network. The current prototype provides a speech synthesizer to read aloud the resulting natural language text with correct pronunciation. Figure 4 shows the architecture of our tool.

    Fig. 4. The architecture of our visual language communication tool on a PDA

Figure 5 shows the interface of our visual language communication tool. On the interface, visual symbols are grouped in clusters based on their concept. The interface provides a next-symbol predictor to help users find their intended symbols quickly. It predicts which symbols are most likely to follow a given segment of a visual symbol graph based on its syntax structure. When a user selects one of the suggestions, it is automatically inserted into the graph, replacing a selected hint symbol. The probability of a predicted visual symbol is estimated with n-gram language modelling. To compute the multi-gram model, our tool collects data during the interaction. The interface also gives a real-time distinctive appearance to the visual symbols that can be selected next according to the syntactic rules. To construct a message, a user can select visual symbols from the menu or from the prediction window. If the user changes the input graph, the resulting text is refreshed.
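A minimal sketch of such a predictor is shown below, using a bigram model over the symbols users actually select and restricting suggestions to the syntactically allowed candidates. The paper's model is n-gram based; a bigram, and the class and method names, are simplifying assumptions.

```python
# A bigram next-symbol predictor, updated from interaction data; a
# simplified stand-in for the paper's n-gram/multi-gram model.
from collections import Counter, defaultdict


class SymbolPredictor:
    def __init__(self):
        self.counts = defaultdict(Counter)   # previous symbol -> next-symbol counts

    def observe(self, sequence):
        """Collect selection data during interaction, as the tool does."""
        for prev, nxt in zip(sequence, sequence[1:]):
            self.counts[prev][nxt] += 1

    def suggest(self, prev, allowed, k=3):
        """Rank grammatically allowed symbols by estimated probability."""
        total = sum(self.counts[prev].values()) or 1
        return sorted(allowed,
                      key=lambda sym: self.counts[prev][sym] / total,
                      reverse=True)[:k]


p = SymbolPredictor()
p.observe(["paramedic", "drive", "ambulance"])
p.observe(["paramedic", "drive", "firetruck"])
# Seen successors of "drive" rank above the unseen "hospital":
print(p.suggest("drive", allowed=["ambulance", "firetruck", "hospital"]))
```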

Fig. 5. The interface of our visual language communication tool on a PDA

    7 Evaluation

We performed a user test similar to that reported in [8]. It aimed to assess whether or not users were capable of expressing the concepts in their mind using the provided visual symbols in a 2D way. The test also addressed the usability issues of interacting with 2D sentence constructions on the visual language interface. Eight people took part in the test. The tasks were created using images of real crisis situations. The participants were asked to report what they might experience by creating 2D visual language sentences on the interface. While performing the tasks, they were asked to think aloud. There were no incorrect answers, except when a task was not performed at all. All activities were recorded and logged for analysis purposes.

The experimental results showed that our target users were able to compose 2D visual language messages expressing the concepts and ideas in their mind. Users needed some time to find an alternative when they could not find a relevant concept among the provided visual symbols to represent their message. Although adaptation time was needed to recognize some icons, the results also indicated that the hints given while creating a 2D icon string helped the users to compose a complete report.

    8 Conclusion

A two-dimensional visual language grammar has been developed as a continuation of our research [8]. The idea is inspired by the game interaction language Deikto [5]. A sentence can be created using a spatial arrangement of visual symbols, i.e. icons. To support natural visual language sentence construction, the arrangement need not be in a linear order. We combine LTAG syntax and VerbNet frames so that we can analyze the syntax and semantics of visual sentences and convert them to text/speech simultaneously and more easily. The approach naturally and elegantly captures the interaction between the graph structure of the visual sentences and the tree structure of the LTAG syntax, and the inferential interactions between the multiple possible meanings generated by the sentences.

An experimental visual language interface has been developed and applied to reporting observations in a crisis situation. Our target users could express their concepts and ideas solely using a spatial arrangement of visual symbols. However, future work should gather data about how people create visual messages in real life and how they experience this, so that the 2D visual language grammar can cover all possible message constructions.

    References

1. Bavelier D., Corina D.P., and Neville H.J.: Brain and Language: a Perspective from Sign Language. Neuron, Cell Press. 21 (1998) 275–278.

2. Chandler D.: Semiotics: The Basics. Routledge (2001).

3. Chang S.K., Polese G., Orefice S., and Tucci M.: A Methodology and Interactive Environment for Iconic Language Design. International Journal of Human Computer Studies. 41 (1994) 683–716.

4. Cohn N.: Visual Syntactic Structures, Towards a Generative Grammar of Visual Language. http://www.emaki.net/essays/ (2003).

5. Crawford C.: Erasmatron. http://www.erasmatazz.com (2005).

6. Frutiger A.: Signs and Symbols, Their Design and Meaning. New York: Van Nostrand Reinhold (1989).

7. Fillmore C.J.: The Case for Case. In Universals in Linguistic Theory, ed. by Emmon Bach and Robert Harms. New York: Holt, Rinehart and Winston (1968).

8. Fitrianie S. and Rothkrantz L.J.M.: Language-Independent Communication using Icons on a PDA. Text, Speech and Dialogue. Lecture Notes in Artificial Intelligence. Springer. 3658 (2005) 404–411.

9. Joshi A.K., Levy L., and Takahashi M.: Tree Adjunct Grammars. Journal of Computer and System Sciences. 10 (1975) 136–163.

10. Kipper-Schuler K.: VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D. thesis proposal, University of Pennsylvania (2003).

11. Kjeldskov J. and Kolbe N.: Interaction Design for Handheld Computers. In Proc. of APCHI'02. Science Press, China (2002).

12. Lester P.M.: Visual Communication: Images with Messages, 4th Ed. Wadsworth/Thompson, Belmont, CA (2006).

13. Leemans N.E.M.P.: VIL, A Visual Inter Lingua. Doctoral Dissertation, Worcester Polytechnic Institute, USA (2001).

14. Littlejohn S.W.: Theories of Human Communication, 5th Ed. Wadsworth (1996).

15. Perlovsky L.I.: Emotions, Learning and Control. In Proc. of Intl. Symposium: Intelligent Control, Intelligent Systems and Semiotics (1999).

16. Ryant N. and Kipper K.: Assigning XTAG Trees to VerbNet. 7th International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+7). Vancouver, Canada (2004).

17. Schank R.: Computer Models of Thought and Language. W. H. Freeman, San Francisco (1973).

18. XTAG Research Group: A Lexicalized Tree Adjoining Grammar for English. Technical Report IRCS-01-03. IRCS, University of Pennsylvania (2001).