CONFUCIUS: An Intelligent MultiMedia Storytelling Interpretation and Presentation System Minhua Eunice Ma Supervisor: Prof. Paul Mc Kevitt School of Computing and Intelligent Systems Faculty of Engineering University of Ulster, Magee


Page 1:

CONFUCIUS: An Intelligent MultiMedia Storytelling

Interpretation and Presentation System

Minhua Eunice Ma
Supervisor: Prof. Paul Mc Kevitt

School of Computing and Intelligent Systems
Faculty of Engineering

University of Ulster, Magee

Page 2:

Faculty Research Student Conference

Jordanstown, 15 Jan 2004

Outline

Related research
Overview of CONFUCIUS
Automatic generation of 3D animation
Semantic representation
Natural language processing
Current state of implementation
Relation to other work
Conclusion & Future work

Page 3:

Related research

3D visualisation
  Virtual humans & embodied agents: Jack, Improv, BEAT
  MultiModal interactive storytelling: AesopWorld, KidsRoom, Larsen & Petersen's Interactive Storytelling, computer games
  Automatic Text-to-Graphics Systems: WordsEye, CD-based language animation

Related research in NLP
  Lexical semantics: Levin's verb classes, Jackendoff's Lexical Conceptual Structure, Schank's scripts

Page 4:

Objectives of CONFUCIUS

To interpret natural language sentences/stories and to extract conceptual semantics from the natural language

To generate 3D animation and virtual worlds automatically from natural language

To integrate 3D animation with speech and non-speech audio, to form an intelligent multimedia storytelling system

[Diagram: a storywriter/playwright provides a story in natural language or a movie/drama script, entered via a tailored menu for script input; CONFUCIUS outputs 3D animation, speech (dialogue) and non-speech audio, presented to the user/story listener.]

Page 5:

Architecture of CONFUCIUS

[Architecture diagram: natural language stories and script writer input pass through a script parser and the Natural Language Processing module, which uses language knowledge (an LCS lexicon, grammar and semantic representations). Semantic representations are mapped onto visual knowledge: a 3D graphic library and a knowledge base of prefabricated objects, built with 3D authoring tools and existing 3D and character models. The Text To Speech, sound effects and animation generation modules feed a synchronizing & fusion stage, which outputs a 3D world with audio in VRML.]
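The data flow in the diagram can be illustrated with a minimal, self-contained sketch. All class and method names below (StorytellingSketch, interpret, generateAnimationVRML, etc.) are hypothetical stand-ins, not the actual CONFUCIUS modules.

// Minimal sketch of the CONFUCIUS data flow described above.
// Class and method names are illustrative assumptions, not the actual implementation.
import java.util.List;

public class StorytellingSketch {

    /** Stand-in for the semantic representation produced by the NLP module. */
    static class Semantics {
        final String event;          // e.g. "push"
        final List<String> roles;    // e.g. ["John", "door"]
        Semantics(String event, List<String> roles) { this.event = event; this.roles = roles; }
    }

    // Script parsing + NLP (CONFUCIUS uses the Connexor FDG parser and WordNet here).
    static Semantics interpret(String sentence) {
        // Placeholder: a real implementation would extract an LVSR structure.
        return new Semantics("push", List.of("John", "door"));
    }

    // Animation generation: map the semantics onto prefabricated 3D models and keyframes.
    static String generateAnimationVRML(Semantics s) {
        return "# VRML fragment animating '" + s.event + "' with roles " + s.roles;
    }

    // Synchronizing & fusion: combine animation, speech and sound into one VRML world.
    public static void main(String[] args) {
        Semantics sem = interpret("John pushed the door.");
        String world = "#VRML V2.0 utf8\n" + generateAnimationVRML(sem);
        System.out.println(world);
    }
}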

Page 6:

Software & Standards

Java
  parsing semantic representations
  changing VRML code to add/modify animation
  integrating modules

Natural language processing tools
  Connexor Machinese FDG parser (morphological and syntactic parsing)
  WordNet (lexicon, semantic inference)

3D graphic modelling
  Existing 3D models (virtual humans/objects) on the Internet
  Authoring tools
    Humanoid characters: Character Studio
    Props & stage: 3D Studio Max
    Narrator: Microsoft Agent

Modelling languages & standards
  VRML 97 for modelling the geometry of objects, props and the environment
  H-Anim specification for humanoid modelling

Page 7:

Agents and Avatars: how much autonomy?

[Diagram: virtual humans (autonomous agents, interface agents, characters in non-interactive storytelling, avatars) placed along an autonomy & intelligence axis from high to low.]

Autonomous agents have higher requirements for sensing, memory, reasoning, planning, behaviour control & emotion (a sense-emotion-control-action structure).
"User-controlled" avatars require fewer autonomous actions; basic naïve physics such as collision detection and reaction is still required.
Virtual characters in non-interactive storytelling lie between agents and avatars: their behaviours, emotions and responses to the changing environment are described in the story input.

Page 8:

Graphics library

[Diagram: the graphics library stores objects/props as simple geometry files, characters as geometry & joint hierarchy files (H-Anim), and motions as an animation library of keyframes; these are instantiated into the generated 3D world.]
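A minimal sketch of how such a library could be organised; the class, method and field names below (GraphicsLibrary, instantiate, etc.) are illustrative assumptions, not the actual CONFUCIUS code.

// Illustrative sketch of the graphics library layout described above (names are assumptions).
import java.util.HashMap;
import java.util.Map;

public class GraphicsLibrary {
    private final Map<String, String> props = new HashMap<>();       // objects/props: simple geometry files
    private final Map<String, String> characters = new HashMap<>();  // characters: geometry & joint hierarchy files (H-Anim)
    private final Map<String, String> motions = new HashMap<>();     // motions: animation library (keyframes)

    public void addProp(String name, String vrmlFile)       { props.put(name, vrmlFile); }
    public void addCharacter(String name, String hanimFile) { characters.put(name, hanimFile); }
    public void addMotion(String name, String keyframeFile) { motions.put(name, keyframeFile); }

    /** Instantiation: reference a stored character model in the scene and note which keyframes to apply. */
    public String instantiate(String characterName, String motionName) {
        return "Inline { url \"" + characters.get(characterName) + "\" }\n"
             + "# apply keyframes from " + motions.get(motionName) + "\n";
    }
}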

Page 9:

Level of Articulation (LOA) of H-Anim

[Figure: joints and segments of LOA1]

CONFUCIUS adopts LOA1 for human animation.
The animation engine adds ROUTEs dynamically, based on H-Anim's joints & animation keyframes.
CONFUCIUS' human animation can be adapted for other LOAs.

[Figure: example Site nodes on the hands, used for pushing and holding objects.]
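A sketch of how an animation engine might emit a keyframe interpolator and its ROUTEs for one H-Anim joint. The joint DEF name follows H-Anim convention; the surrounding class, the "clock" TimeSensor name and the method signature are illustrative assumptions.

// Illustrative sketch: generate a VRML OrientationInterpolator and ROUTEs for one H-Anim joint.
// Assumes a TimeSensor has been defined elsewhere as: DEF clock TimeSensor { ... }
public class RouteGenerator {

    /** Emit keyframed rotation for a single joint, driven by the clock TimeSensor. */
    public static String animateJoint(String jointName, float[] keys, String[] keyValues) {
        StringBuilder vrml = new StringBuilder();
        String interp = jointName + "_rotInterp";

        vrml.append("DEF ").append(interp).append(" OrientationInterpolator {\n  key [");
        for (float k : keys) vrml.append(' ').append(k);
        vrml.append(" ]\n  keyValue [");
        for (String v : keyValues) vrml.append(' ').append(v).append(',');
        vrml.append(" ]\n}\n");

        // Wire the clock to the interpolator and the interpolator to the joint's rotation field.
        vrml.append("ROUTE clock.fraction_changed TO ").append(interp).append(".set_fraction\n");
        vrml.append("ROUTE ").append(interp).append(".value_changed TO ")
            .append(jointName).append(".set_rotation\n");
        return vrml.toString();
    }

    public static void main(String[] args) {
        System.out.println(animateJoint("hanim_l_shoulder",
                new float[] {0f, 0.5f, 1f},
                new String[] {"1 0 0 0", "1 0 0 1.2", "1 0 0 0"}));
    }
}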

Page 10:

Semantic representations

Knowledge representations and their typical applications (decompositional vs. non-decompositional):

Non-decompositional
  rule-based representations: expert systems
  FOPC (First Order Predicate Calculus): sentence representation, expert systems
  semantic networks: lexical semantics
  frame-based representations: general knowledge representation & reasoning
  XML-based representations: multimodal semantics

Decompositional
  Schank's scripts: story understanding
  Conceptual Dependency (CD), event-logic truth conditions, x-schema and f-structure
  Lexical-Conceptual Structure (LCS): physical knowledge representation & reasoning (inc. spatial/temporal reasoning)
  Lexical Visual Semantic Representation (LVSR): dynamic vision (movement) recognition & generation

Page 11:

Lexical Visual Semantic Representation

LVSR: a semantic representation between language syntax and 3D models
LVSR is based on Jackendoff's LCS, adapted to the task of language visualisation (enhanced with Schank's scripts)

Ontological categories: OBJ, HUMAN, EVENT, STATE, PLACE, PATH, PROPERTY
  OBJ: props/places (e.g. buildings)
  HUMAN: human beings/other articulated animated characters (e.g. animals), as long as their skeleton hierarchy is defined
  EVENT: actions, movements and manners
  STATE: static existence
  PROPERTY: attributes of OBJ/HUMAN
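As a rough illustration, the ontological categories and an LCS-style predicate-argument structure could be encoded as below; this is a hypothetical encoding for illustration, not the actual LVSR data structures in CONFUCIUS.

// Illustrative encoding of LVSR ontological categories and expressions (assumed, not actual).
import java.util.Arrays;
import java.util.List;

public class LVSRSketch {

    enum Category { OBJ, HUMAN, EVENT, STATE, PLACE, PATH, PROPERTY }

    /** A typed predicate with arguments, e.g. EVENT go(HUMAN John, PATH to(PLACE in(OBJ house))). */
    static class Expr {
        final Category category;
        final String predicate;
        final List<Expr> args;
        Expr(Category category, String predicate, Expr... args) {
            this.category = category;
            this.predicate = predicate;
            this.args = Arrays.asList(args);
        }
        @Override public String toString() {
            return category + " " + predicate + (args.isEmpty() ? "" : args.toString());
        }
    }

    public static void main(String[] args) {
        // "John went into the house."
        Expr expr = new Expr(Category.EVENT, "go",
                new Expr(Category.HUMAN, "John"),
                new Expr(Category.PATH, "to",
                        new Expr(Category.PLACE, "in",
                                new Expr(Category.OBJ, "house"))));
        System.out.println(expr);
    }
}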

Page 12:

PATH & PLACE predicates

PATH predicates (direction and termination features):

  predicate   direction  termination
  to              1          1
  from            0          1
  toward          1          0
  away_from       0          0
  via            n/a         0
  across         n/a        n/a
  along          n/a        n/a

PLACE predicates (contact/attach feature):

  predicate     contact/attach
  at            unmarked
  behind        <-contact>
  end_of        n/a
  in            unmarked
  in_front_of   <-contact>
  near          <-contact>
  on            <+contact>
  out           unmarked
  over          <-contact>
  top_of        n/a
  under         unmarked

PATH and PLACE predicates interpret the spatial movement of OBJs/HUMANs: 62 common English prepositions are mapped to 7 PATH predicates & 11 PLACE predicates.
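The feature table above could be captured directly in code. The enums below are an illustrative encoding of the table, not the CONFUCIUS source.

// Illustrative encoding of the PATH/PLACE feature table (assumed representation).
public class SpatialPredicates {

    /** Direction/termination feature values: 1, 0, or n/a. */
    enum Bin { ONE, ZERO, NA }

    /** Contact/attach feature values. */
    enum Contact { PLUS, MINUS, UNMARKED, NA }

    enum Path {
        TO(Bin.ONE, Bin.ONE), FROM(Bin.ZERO, Bin.ONE),
        TOWARD(Bin.ONE, Bin.ZERO), AWAY_FROM(Bin.ZERO, Bin.ZERO),
        VIA(Bin.NA, Bin.ZERO), ACROSS(Bin.NA, Bin.NA), ALONG(Bin.NA, Bin.NA);

        final Bin direction;     // oriented toward (1) or away from (0) the reference object
        final Bin termination;   // whether the path terminates at the reference object
        Path(Bin direction, Bin termination) { this.direction = direction; this.termination = termination; }
    }

    enum Place {
        AT(Contact.UNMARKED), BEHIND(Contact.MINUS), END_OF(Contact.NA), IN(Contact.UNMARKED),
        IN_FRONT_OF(Contact.MINUS), NEAR(Contact.MINUS), ON(Contact.PLUS),
        OUT(Contact.UNMARKED), OVER(Contact.MINUS), TOP_OF(Contact.NA), UNDER(Contact.UNMARKED);

        final Contact contact;   // <+contact>, <-contact>, unmarked, or n/a
        Place(Contact contact) { this.contact = contact; }
    }
}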

Page 13:

NLP in CONFUCIUS

[Diagram: NLP components in CONFUCIUS. Pre-processing feeds the Connexor FDG parser (morphological parsing, part-of-speech tagging, syntactic parsing and FEATURES); WordNet and an LCS database support semantic inference, coreference resolution, disambiguation and temporal reasoning (lexical and post-lexical temporal relations).]

Page 14:

Visual valency & verb ontology

2.2.1. Human action verbs
  2.2.1.1. One visual valency (the role is a human; (partial) movement)
    2.2.1.1.1. Biped kinematics: arm actions (wave, scratch), leg actions (walk, jump, kick), torso actions (bow), combined actions (climb)
    2.2.1.1.2. Facial expressions & lip movement, e.g. laugh, fear, say, sing, order
  2.2.1.2. Two visual valencies (at least one role is a human)
    2.2.1.2.1. One human and one object (vt. or vi. + instrument), e.g. throw, push, kick, open, eat, drink, bake, trolley
    2.2.1.2.2. Two humans, e.g. fight, chase, guide
  2.2.1.3. Visual valency ≥ 3 (at least one role is a human)
    2.2.1.3.1. Two humans and one object (inc. ditransitive verbs), e.g. give, show
    2.2.1.3.2. One human and 2+ objects (vt. + object + implicit instrument/goal/theme), e.g. cut, write, butter, pocket, dig, cook
  2.2.1.4. Verbs without distinct visualisation when out of context: verbs of trying, helping, letting, creating/destroying
  2.2.1.5. High-level behaviours (routine events), political and social activities, e.g. interview, eat out (go to a restaurant), go shopping
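A toy lookup of visual valency for a few of the verbs above; the mapping is hand-picked for illustration and is not the full CONFUCIUS verb ontology.

// Toy illustration of classifying verbs by visual valency (hand-picked examples, assumed values).
import java.util.Map;

public class VisualValency {

    // Number of visual roles (humans/objects) the animation of the verb requires.
    static final Map<String, Integer> VALENCY = Map.of(
            "wave", 1, "walk", 1, "laugh", 1,   // one valency: a single human's (partial) movement
            "push", 2, "eat", 2, "chase", 2,    // two valencies: human + object, or two humans
            "give", 3, "cut", 3);               // three or more: two humans + object, or implicit instrument

    public static void main(String[] args) {
        System.out.println("push -> " + VALENCY.get("push") + " visual valencies");
    }
}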

Page 15:

Level-of-Detail (LOD): basic-level verbs & troponyms

[Diagram: verb hierarchy under EVENT. Event-level verbs (go, run, cause, ...); manner-level verbs (walk, climb, jump); troponym-level verbs (limp, stride, swagger, trot, skip, bounce, hop, jog, romp).]

Page 16:

Current status of implementation

Collision detection example (contact verbs: hit, collide, scratch, touch): "The car collided with a wall."
  using ParallelGraphics' VRML extension for object-to-object collision
  non-speech sound effects

H-Anim examples:
  3 visual valency verbs: "John put a cup of coffee on the table."
    H-Anim Site nodes and locative tags on objects (an on_table tag for the table object)
  2 visual valency verbs: "John pushed the door." "John ate the bread." "Nancy sat on the chair."
  1 visual valency verbs: "The waiter came to me: 'Can I help you, Sir?'"
    speech modality & lip synchronization
    camera direction (avatar's point-of-view)

Page 17:

Relation to other work

Domain-independent, general-purpose humanoid character animation

CONFUCIUS' character animation focuses on the language-to-humanoid-animation process rather than on human modelling & motion alone

An implementable semantic representation: LVSR connects linguistic semantics to visual semantics and is suitable for action execution (animation)

Categorization and visualisation of eventive verbs based on visual valency

A reusable common-sense knowledge base to elicit implied actions, instruments, goals and themes underspecified in the language input

Page 18:

Conclusion & Future work

Prospective applications
  Children's education
  Multimedia presentation
  Movie/drama production
  Computer games
  Virtual Reality

CONFUCIUS' humanoid animation explores problems in language visualisation & automatic animation production
  Formalises the meaning of action verbs and spatial prepositions
  Maps language primitives to visual primitives
  Provides a reusable common-sense knowledge base for other systems

Further work
  Discourse-level interpretation
  Action composition for simultaneous activities
  Verbs concerning multiple characters' synchronization & coordination (e.g. introduce)