integration of machine learning, quantum networks and software-hardware methodology in humanoid...

Integration of Machine Learning, Quantum Networks and Software-Hardware Methodology in Humanoid Robots

Interactive Robot Theatre as a future toy

Talk presented at Department of Electronics, Technical University of Warsaw, December 2004

Marek Perkowski, Dept. Electrical Engineering PSU, and

Department of Electronics and Computer Science,

Korea Advanced Institute of Science and Technology

Toys are a very serious business

Talking Robots

• Many talking toys exist, but they are still very primitive.

• Actors for robot theatre, agents for advertisement, education and entertainment.

• Designing inexpensive natural-size humanoid caricature and realistic robot heads

We concentrate on Machine Learning techniques used to teach robots behaviors, natural language dialogs and facial gestures.

Dog.com from Japan

Work in progress

Robot with a Personality?

• Future robots will interact closely with non-sophisticated users, children and the elderly, so the question arises: what should they look like?

• If a robot has a human face, what kind of face should it be?

• Handsome or average, realistic or simplified, normal size or enlarged?

• The famous example of a robot head is Kismet from MIT.

• Why is Kismet so successful?

• We believe that a robot that will interact with humans should have some kind of “personality”, and Kismet so far is the only robot with “personality”.

Robot face should be friendly and funny

The Muppets of Jim Henson are hard-to-match examples of puppet artistry and animation perfection.

We are interested in the robot’s personality as expressed by its:

– behavior,

– facial gestures,

– emotions,

– learned speech patterns.

Behavior, Dialog and Learning

• Robot activity as a mapping of the sensed environment and internal states to behaviors and new internal states (emotions, energy levels, etc.).

• Our goal is to uniformly integrate verbal and non-verbal robot behaviors.

Words communicate only about 35% of the information transmitted from a sender to a receiver in human-to-human communication.

The remaining information is included in para-language.

Emotions, thoughts, decisions and intentions of a speaker can be recognized earlier than they are verbalized. (NASA)

Morita’s Theory

Fig. 1. Learning Behaviors as Mappings from environment’s features to interaction procedures

[Figure: inputs (speech from microphones, image features from cameras, sonars and other sensors, emotions and knowledge memory) feed a block for automatic software construction from examples (decision tree, bi-decomposition, Ashenhurst, DNF), which drives the outputs: verbal response generation (text response and TTS), stored sounds, head movements and facial emotions generation, and neck and shoulders movement generation. One arrow is labeled “probability”.]

Robot Head Construction, 1999

Furby head with new control

Jonas

We animate various kinds of humanoid heads with 4 to 20 DOF, aiming for comic and entertainment value.

Mister Butcher

4-degree-of-freedom neck

Latex skin from Hollywood

Robot Head Construction, 2000

Skeleton

Alien

We use inexpensive servos from Hitec and Futaba, plastic, plywood and aluminum.

The robots are either PC-interfaced, use simple micro-controllers such as Basic Stamp, or are radio controlled from a PC or by the user.

Adam

Marvin the Crazy Robot

Technical Construction Details, 2001

Virginia Woolf

Heads equipped with microphones, USB cameras, sonars and CdS light sensors

2001

Max

Image processing and pattern recognition use software developed at PSU, CMU and Intel (public domain software available on the WWW). Software is in Visual C++, Visual Basic, Lisp and Prolog.

BUG (Big Ugly Robot)

2002

Visual Feedback and Learning based on Constructive Induction

2002

Professor Perky

1-dollar latex skin from China

• We compared several commercial speech systems from Microsoft, Sensory and Fonix.

• Based on experiences in highly noisy environments and with a variety of speakers, we selected Fonix for both ASR and TTS for Professor Perky and Maria robots.

• We use a microphone array from Andrea Electronics.

Professor Perky with automated speech recognition (ASR) and text-to-speech (TTS) capabilities

2002, Japan

Maria, 2002/2003

20 DOF

Construction details of Maria

[Figure labels: location of controlling rods; location of head servos; location of remote servos; custom-designed skin; skull]

Animation of eyes and eyelids

Software/Hardware Architecture

• Network: 10 processors, ultimately 100 processors.

• Robotics processors: ACS 16.

• Speech cards on Intel grant.

• More cameras.

• Tracking in all robots.

• Robotic languages: Alice and Cyc-like technologies.

Cynthia, June 2004

Currently the hands are not movable.

We have a separate hand design project.

HAHOE KAIST ROBOT THEATRE, KOREA, SUMMER 2004

Sonbi, the Confucian Scholar

Paekchong, the bad butcher

Yangban the Aristocrat and Pune, his concubine

The Narrator

We base all our robots on inexpensive radio-controlled servo technology.

We are familiar with latex and polyester technologies for faces

New Silicone Skins

Probabilistic State Machines to describe emotions

States: Happy state, Ironic state, Unhappy state.

Transitions (input / output, probability):

• “you are beautiful” / “Thanks for a compliment”, P=1

• “you are blonde!” / “I am not an idiot”, P=0.3

• “you are blonde!” / “Do you suggest I am an idiot?”, P=0.7
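A minimal sketch of such a probabilistic state machine in Python, treating it as a Mealy machine whose arcs carry an input phrase, an output phrase and a probability. The phrases and probabilities come from the slide; which state each arc leaves from and enters is an assumption here (the figure’s arrows did not survive), as are the names `TRANSITIONS` and `step`.

```python
import random

# (state, heard phrase) -> list of (probability, next state, reply).
# Phrases, replies and probabilities are from the slide; the source and
# target states of each arc are ASSUMED for illustration.
TRANSITIONS = {
    ("Happy", "you are beautiful"): [(1.0, "Happy", "Thanks for a compliment")],
    ("Happy", "you are blonde!"): [
        (0.3, "Ironic", "I am not an idiot"),
        (0.7, "Unhappy", "Do you suggest I am an idiot?"),
    ],
}


def step(state, phrase, rng=random):
    """Sample one transition; unknown phrases keep the state and produce no reply."""
    arcs = TRANSITIONS.get((state, phrase))
    if arcs is None:
        return state, None
    # Pick one arc with probability proportional to its P value.
    _, next_state, reply = rng.choices(arcs, weights=[p for p, _, _ in arcs])[0]
    return next_state, reply


rng = random.Random(0)
print(step("Happy", "you are beautiful", rng))  # ('Happy', 'Thanks for a compliment')
print(step("Happy", "you are blonde!", rng))    # Ironic or Unhappy, at random
```

Sampling with a seeded `random.Random` keeps a performance reproducible while still letting the same compliment or insult draw different emotional reactions.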

Facial Behaviors of Maria

Maria asks: “Do I look younger than twenty-three?”

[Figure: the answers “yes” and “no” (the “no” branches are labeled with probabilities 0.3 and 0.7) lead to the responses “Maria smiles” and “Maria frowns”.]

Probabilistic Grammars for performances

Nonterminals: Who?, What?, Where?

• Who?: Speak “Professor Perky”, blinks eyes twice | Speak “Professor Perky” | Speak “Doctor Lee” | …

• What?: Speak “Was drinking wine” | Speak “Was singing and dancing”

• Where?: Speak “In the classroom”, shakes head | Speak “in some location”, smiles broadly

[Branch probabilities in the figure range from P=0.1 to P=0.5.]
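One way to read the figure is as a stochastic grammar whose terminals are (speech, gesture) actions. The sketch below expands such a grammar by recursive weighted choice. The alternatives are taken from the slide, but the weights are hypothetical, since the extraction does not preserve which P value labels which branch; `GRAMMAR` and `expand` are illustrative names.

```python
import random

# Nonterminal -> list of (weight, body). A terminal is a (speech, gesture) pair.
# Alternatives are from the slide; the WEIGHTS ARE HYPOTHETICAL.
GRAMMAR = {
    "PERFORMANCE": [(1.0, ["WHO", "WHAT", "WHERE"])],
    "WHO": [
        (0.5, [("Professor Perky", "blinks eyes twice")]),
        (0.3, [("Professor Perky", None)]),
        (0.2, [("Doctor Lee", None)]),
    ],
    "WHAT": [
        (0.5, [("Was drinking wine", None)]),
        (0.5, [("Was singing and dancing", None)]),
    ],
    "WHERE": [
        (0.5, [("In the classroom", "shakes head")]),
        (0.5, [("in some location", "smiles broadly")]),
    ],
}


def expand(symbol, rng=random):
    """Expand a symbol into a flat list of (speech, gesture) actions."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal action
    alternatives = GRAMMAR[symbol]
    _, body = rng.choices(alternatives, weights=[w for w, _ in alternatives])[0]
    actions = []
    for part in body:
        actions.extend(expand(part, rng))
    return actions


for speech, gesture in expand("PERFORMANCE", random.Random(7)):
    print(f"Speak {speech!r}" + (f", {gesture}" if gesture else ""))
```

Keeping speech and gesture in one terminal is a simple way to realize the deck’s goal of generating verbal and non-verbal behavior from a single description.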

Human-controlled modes of dialog/interaction

Modes: Robot asks, Human teaches, Human commands, Human asks, Robot performs.

Phrases that switch between modes: “Hello Maria”, “Question”, “Thanks, I have a question”, “Thanks, I have a lesson”, “Thanks, I have a command”, “Questioning finished”, “Lesson finished”, “Command finished”, “Stop performance”.
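The mode switching can be sketched as a lookup table from trigger phrases to modes. The target mode of each phrase is inferred from the dialog examples in this deck (for instance, “Questioning finished” returns the robot to asking mode); mapping “Stop performance” to “Robot asks” and the start state “Idle” are assumptions, and the slide does not say which phrase enters “Robot performs”, so it is left out.

```python
# Hypothetical mode-switching table built from the phrases on the slide.
SWITCH = {
    "Hello Maria": "Robot asks",
    "Question": "Human asks",
    "Thanks, I have a question": "Human asks",
    "Thanks, I have a lesson": "Human teaches",
    "Thanks, I have a command": "Human commands",
    "Questioning finished": "Robot asks",
    "Lesson finished": "Robot asks",
    "Command finished": "Robot asks",
    "Stop performance": "Robot asks",  # assumption
}


def run(utterances, mode="Idle"):
    """Return the dialog mode after each human utterance.

    Phrases not in SWITCH are handled inside the current mode, so the mode
    is kept. "Idle" is a hypothetical start state before "Hello Maria".
    """
    modes = []
    for text in utterances:
        mode = SWITCH.get(text.strip(), mode)
        modes.append(mode)
    return modes


print(run(["Hello Maria", "Thanks, I have a lesson",
           "Question pattern: What book Smith wrote?", "Lesson finished"]))
# ['Robot asks', 'Human teaches', 'Human teaches', 'Robot asks']
```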

Robot-Receptionist Initiated Conversation

(Mode: Robot asks. The mode label on each slide represents the operation mode.)

Robot: What can I do for you?

H: I would like to order a table for two

Robot: Smoking or non-smoking?

H: I do not understand

Robot: Do you want a table in a smoking or non-smoking section of the restaurant? Non-smoking section is near the terrace.

H: A table near the terrace, please

Human-Initiated Conversation

H: Hello Maria (initialization)

Robot: What can I do for you?

(Mode: Robot asks.)

Human-Asking

H: Question

Robot: Yes, you ask a question.

(The mode changes from Robot asks to Human asks.)

H: What book wrote Lee?

Robot: I have no sure information.

H: Try to guess.

Robot: Lee wrote book “Flowers”.

H: This is not true.

Human-Teaching

H: Questioning finished (human ends questioning; robot enters asking mode)

H: Thanks, I have a lesson (human starts teaching; mode: Human teaches)

Robot: Yes

H: I give you question-answer pattern

Robot: Yes

H: Question pattern: What book Smith wrote?

Robot: Yes

H: Answer pattern: Smith wrote book “Automata Theory”

Robot: Yes

H: Checking question: What book wrote Smith?

Robot: Smith wrote book “Automata Theory”

H: I give you question-answer pattern

Robot: Yes

H: Question pattern: Where is room of Lee?

Robot: Yes

H: Answer pattern: Lee is in room 332

Robot: Yes

Human-Checking what robot learned

H: Lesson finished

Robot: What can I do for you?

H: Question (mode: Human asks)

…

H: Try to guess.

Robot: Lee wrote book “Automata Theory”

Observe that the robot found a similarity between Smith and Lee and generalized (incorrectly).

Behavior, Dialog and Learning

• The dialog/behavior has the following components:

– (1) Eliza-like natural language dialogs based on pattern matching and limited parsing.

• Commercial products like Memoni, Dog.Com, Heart, Alice and Doctor all use this technology very successfully; for instance, the Alice program won the 2001 Turing competition.

– This is the “conversational” part of the robot brain, based on pattern-matching, parsing and blackboard principles.

– It is also a kind of “operating system” of the robot, which supervises other subroutines.

• (2) Subroutines with a logical database and natural language parsing (CHAT).

– This is the logical part of the brain, used to find connections between places and timings and for all kinds of logical and relational reasoning, such as answering questions about Japanese geography.

• (3) Use of generalization and analogy in dialog on many levels.

– Random and intentional linking of spoken language, sound effects and facial gestures.

– Use of the Constructive Induction approach to help generalization, analogical reasoning and probabilistic generation in verbal and non-verbal dialog, like learning when to smile or to turn the head away from the partner.

• (4) Model of the robot, model of the user, scenario of the situation and history of the dialog, all used in the conversation.

• (5) Use of word spotting in speech recognition rather than single-word or continuous speech recognition.

• (6) Continuous speech recognition (Microsoft).

• (7) Avoidance of “I do not know” and “I do not understand” answers from the robot.

– Our robot will always have something to say; in the worst case, it is over-generalized, based on invalid analogies, or even nonsensical and random.
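Components (1) and (7) can be illustrated together: an Eliza-style rule list of regular expressions with reply templates, plus a fallback pool so that an unmatched input never produces “I do not know”. The rules, replies and names (`RULES`, `FALLBACKS`, `reply`) are hypothetical examples, not the rule base of the robots described here.

```python
import random
import re

# Hypothetical Eliza-style rules: pattern -> reply template using captured groups.
RULES = [
    (re.compile(r"^hello (\w+)", re.I), "Hello! What can I do for you?"),
    (re.compile(r"^what book wrote (\w+)\??$", re.I),
     "I think {0} wrote a book, but I am not sure which."),
    (re.compile(r"^you are (.+?)[.!?]*$", re.I), "Why do you think I am {0}?"),
]

# Component (7): an unmatched input still gets something to say.
FALLBACKS = ["Tell me more about it.", "That reminds me of a story.", "Why do you say that?"]


def reply(text, rng=random):
    """Return the reply of the first matching rule, else a random fallback."""
    text = text.strip()
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return rng.choice(FALLBACKS)


print(reply("you are blonde!"))  # Why do you think I am blonde?
```

First-match-wins ordering keeps the rule base easy to extend: more specific patterns go earlier in the list.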


Fig. 3. Question Answering by induction of answer parameters.

Input Variables

A: 0=what, 1=where; B: 0=wrote, 1=is; C: 0=book, 1=room; D: 0=Smith, 1=Lee

Output Variables (X,Y,Z,V)

X: 0=Smith, 1=Lee, 2=Perkowski; Y: 0=wrote, 1=is; Z: 0=book, 1=room, 2=building; V: 0=332, 1=73, 2=245, 3=“Automata Theory”, 4=“Logic Design”

Examples in the map (rows AB, columns CD; all other cells are don’t cares “-”):

0000 = what wrote book Smith? → X,Y,Z,V = 0,0,0,3 (Example Answer = Smith wrote book “Automata Theory”)

0111 = what is room Lee? → X,Y,Z,V = 1,1,1,0 (Example Answer = Lee is room 332)

1111 = where is room Lee? → X,Y,Z,V = 1,1,1,0 (Example Answer = Lee is room 332)

New Question:

0001: What wrote book Lee?
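As a toy illustration of inducing answer parameters from Fig. 3 (not the Ashenhurst/bi-decomposition machinery the deck uses), the sketch below assumes that each output variable depends on a single input variable. Among the inputs consistent with the examples, it prefers one whose value labels coincide with the output’s labels (D and X both range over Smith/Lee), otherwise the first consistent input. This preference rule and the names `learn`, `answer`, `EXAMPLES` are hypothetical. Under these assumptions the sketch reproduces the incorrect generalization from the Human-Checking dialog.

```python
IN_VARS = {"A": ["what", "where"], "B": ["wrote", "is"],
           "C": ["book", "room"], "D": ["Smith", "Lee"]}
OUT_VARS = {"X": ["Smith", "Lee", "Perkowski"], "Y": ["wrote", "is"],
            "Z": ["book", "room", "building"],
            "V": ["332", "73", "245", '"Automata Theory"', '"Logic Design"']}

# Care minterms of Fig. 3: input code ABCD -> output code (X, Y, Z, V).
EXAMPLES = {"0000": (0, 0, 0, 3), "0111": (1, 1, 1, 0), "1111": (1, 1, 1, 0)}


def consistent_map(in_idx, out_idx):
    """{input value: output value} if the output is a function of this input, else None."""
    mapping = {}
    for code, outs in EXAMPLES.items():
        if mapping.setdefault(int(code[in_idx]), outs[out_idx]) != outs[out_idx]:
            return None
    return mapping


def learn():
    """Pick one supporting input per output variable (hypothetical preference rule)."""
    model = {}
    for j, out in enumerate(OUT_VARS):
        candidates = []
        for i, inp in enumerate(IN_VARS):
            m = consistent_map(i, j)
            if m is not None:
                is_copy = all(IN_VARS[inp][a] == OUT_VARS[out][b] for a, b in m.items())
                candidates.append((not is_copy, i, m))  # copies first, then lowest index
        _, i, m = min(candidates)
        model[out] = (i, m)
    return model


def answer(model, code):
    words = []
    for out, (i, m) in model.items():
        value = m.get(int(code[i]))
        words.append(OUT_VARS[out][value] if value is not None else "?")
    return " ".join(words)


model = learn()
print(answer(model, "0001"))  # Lee wrote book "Automata Theory"
```

X, Y and Z end up copying D, B and C, while V is tied to B (“wrote” implies the book title), which is exactly why the question about Lee’s book is answered with Smith’s title.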

Recent Works

• Multi-brain: sub-brains communicate through natural language:

– Devil, angel and myself.

– Egoist and moralist.

• CAM (Content-Addressable Memory): a Cypress-funded project in 2005.

Fig. 2. Seven examples (4-input, 2-output minterms) are given by the teacher as correct robot behaviors.

[Figure: Karnaugh map with rows AB and columns CD, where A = right microphone, B = left light sensor, C = right light sensor, D = left microphone. Each specified cell gives the outputs Head_Horiz, Eye_Blink; the seven specified cells hold 1,0; 2,0; 1,0; 1,1; 0,0; 0,0; 0,0, and all other cells are don’t cares.]

Annotated behaviors:

• Robot turns head right, away from light in left.

• Robot turns head left with equal front lighting and no sound. It blinks eyes.

• Robot does nothing.

• Robot turns head left, away from light in right, towards sound in left.

Generalization of the Ashenhurst-Curtis decomposition model

This kind of table is known from Rough Sets, Decision Trees, etc. (Data Mining)

Decomposition is hierarchical. At every step, many decompositions exist.

Constructive Induction: Technical Details

• U. Wong and M. Perkowski, A New Approach to Robot’s Imitation of Behaviors by Decomposition of Multiple-Valued Relations, Proc. 5th Intern. Workshop on Boolean Problems, Freiberg, Germany, Sept. 19-20, 2002, pp. 265-270.

• A. Mishchenko, B. Steinbach and M. Perkowski, An Algorithm for Bi-Decomposition of Logic Functions, Proc. DAC 2001, June 18-22, Las Vegas, pp. 103-108.

• A. Mishchenko, B. Steinbach and M. Perkowski, Bi-Decomposition of Multi-Valued Relations, Proc. 10th IWLS, pp. 35-40, Granlibakken, CA, June 12-15, 2001. IEEE Computer Society and ACM SIGDA.

• Decision Trees, Ashenhurst/Curtis hierarchical decomposition and Bi-Decomposition algorithms are used in our software

• These methods form our subset of the MVSIS system developed under Prof. Robert Brayton at the University of California at Berkeley [2].

– The entire MVSIS system can also be used.

• The system generates the robot’s behaviors (C program code) from examples given by the users.

• This method is used for embedded system design, but we use it specifically for robot interaction.

Constructive Induction

Braitenberg Vehicles

Example 1: Simulation
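The slide gives only the title of the simulation example. As a hedged sketch, here is one Euler step of the classic Braitenberg vehicles 2a and 2b (two light sensors, two motors on a differential drive). The light model, sensor geometry, parameter values and function names are all assumptions. With uncrossed wiring the vehicle turns away from the light (the “afraid of the light” behavior listed later in the deck); crossed wiring turns it toward the light.

```python
import math

LIGHT = (0.0, 0.0)  # hypothetical light position


def intensity(x, y):
    """Light intensity falling off with squared distance (assumed model)."""
    return 1.0 / (1.0 + (x - LIGHT[0]) ** 2 + (y - LIGHT[1]) ** 2)


def step(x, y, theta, crossed=False, dt=0.1, axle=0.2, offset=0.1):
    """One Euler step of a two-sensor, two-motor vehicle; returns (x, y, theta)."""
    # Sensors sit on the front-left (theta + 0.5 rad) and front-right of the body.
    s_left = intensity(x + offset * math.cos(theta + 0.5), y + offset * math.sin(theta + 0.5))
    s_right = intensity(x + offset * math.cos(theta - 0.5), y + offset * math.sin(theta - 0.5))
    # Vehicle 2a ("fear"): each sensor excites the motor on its own side.
    # Vehicle 2b ("aggression"): each sensor excites the opposite motor.
    v_left, v_right = (s_right, s_left) if crossed else (s_left, s_right)
    v = (v_left + v_right) / 2.0
    omega = (v_right - v_left) / axle  # positive = counter-clockwise (left) turn
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)


pose = (1.0, 0.3, math.pi)  # facing -x, light ahead and to the left
for _ in range(5):
    pose = step(*pose)
    print(f"x={pose[0]:.3f} y={pose[1]:.3f} theta={pose[2]:.3f}")
```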

Quantum Circuits

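[Figure: quantum circuits on basis states |0⟩, |1⟩ and |x⟩ built from controlled V, V† and V gates, shown equal to a gate U; one output line is labeled V|x⟩ and one is marked “?”.]

Toffoli gate: universal; it can be built from controlled square-root-of-NOT gates (V, with V·V = NOT).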
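The identity behind this slide can be checked numerically with NumPy. The sketch assumes the standard five-gate construction (controlled-V from b to c, CNOT a→b, controlled-V† from b to c, CNOT a→b, controlled-V from a to c), which may differ in layout from the lost figure. It verifies that V·V = NOT and that the circuit’s matrix equals the Toffoli matrix; the helper names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)                # NOT
V = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])     # square root of NOT
P0 = np.diag([1, 0]).astype(complex)                         # |0><0|
P1 = np.diag([0, 1]).astype(complex)                         # |1><1|


def on(ops):
    """Kronecker product of single-qubit operators for qubits (a, b, c)."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out


def controlled(u, control, target):
    """Controlled-u on 3 qubits (0=a, 1=b, 2=c), built from projectors."""
    idle, active = [I2, I2, I2], [I2, I2, I2]
    idle[control], active[control], active[target] = P0, P1, u
    return on(idle) + on(active)


gates = [  # in time order
    controlled(V, 1, 2),           # CV:   b controls c
    controlled(X, 0, 1),           # CNOT: a controls b
    controlled(V.conj().T, 1, 2),  # CV†:  b controls c
    controlled(X, 0, 1),           # CNOT: a controls b
    controlled(V, 0, 2),           # CV:   a controls c
]
U = np.eye(8, dtype=complex)
for g in gates:
    U = g @ U  # later gates multiply from the left

toffoli = np.eye(8, dtype=complex)
toffoli[[6, 7]] = toffoli[[7, 6]]  # swap |110> and |111>

print(np.allclose(V @ V, X), np.allclose(U, toffoli))  # True True
```

The check works because the target sees V·V = NOT only when a = b = 1; in every other case the V and V† applied to it cancel.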

Quantum Portland Faces

Conclusion. What did we learn?

• (1) The more degrees of freedom, the more realistic the animation.

• (2) Synchronization of spoken text and head (especially jaw) movements is important but difficult.

• (3) Gestures and speech intonation of the head should be slightly exaggerated.

Conclusion. What did we learn? (cont.)

• (4) The sound should be loud, to cover noises coming from motors and gears and for a better theatrical effect.

• (5) Noise of servos can also be reduced by appropriate animation and synchronization.

• (6) The best available ASR and TTS packages should be applied.

• (7) OpenCV from Intel is excellent.

• (8) Use puppet theatre experience.

• (9) Because learning is too slow, improved parameterized learning methods will be developed, also based on constructive induction.

• (10) Open question: funny versus beautiful.

• (11) Either high-quality voice recognition from a headset or low-quality recognition in a noisy room. YOU CANNOT HAVE BOTH WITH CURRENT ASR TOOLS.

• The bi-decomposer of relations and other useful software used in this project can be downloaded from http://www-cad.eecs.berkeley.edu/mvsis/.

Conclusion. What did we learn? (cont.)

• This is the most advanced humanoid robot theatre project outside of Japan.

• Open to international collaboration.

What to emphasize in future cooperation?

• We want to develop a general methodology for prototyping software/hardware systems for interactive robots that work in human environment.

• Image processing, voice recognition, speech synthesis, expressing emotions, recognizing human emotions.

• Machine Learning technologies.

• Safety: not hitting humans.

Can we do this in Poland?

Yes: engineers from the Technical University of Gliwice already produce a commercially available hexapod.

International Intel Science Talent Competition and PDXBOT 2004

Additional Slides with Background

Robot Toy Market - Robosapiens


Globalization

• Globalization implies that images, technologies and messages are everywhere, but at the same time disconnected from a particular social structure or context. (Alain Touraine)

• The need of a constantly expanding market for its products chases the bourgeoisie over the whole surface of the globe. It must nestle everywhere, settle everywhere, establish connections everywhere. (Marx & Engels, 1848)

India and China - what’s different?

• They started at the same level of wealth and exports in 1980

• China today exports $ 184 Bn vs $ 34 Bn for India

• China’s export industry today employs over 50 million people (vs 2 million in software in 2008, and 20 million in the entire organized sector in India today!)

• China’s export industry consists of toys (> 60% of the world market), bicycles (10 million to the US alone last year), and textiles (a vision of having a share of > 50% of the world market by 2008)

Learning from Korea and Singapore

• The importance of Learning

– To manufacture efficiently

– To open the door to foreign technology and investment

– To have sufficient pride in one’s own ability to open the door and go out and build one’s own proprietary identity

• To invest in fundamentals like Education

• To have the right cultural prerequisites for catching up

• To have pragmatism rule, not ideology

Samsung

1979 Started making microwaves

1980 First export order (foreign brand)

1983 OEM contracts with General Electric

1985 All GE microwaves made by Samsung

1987 All GE microwaves designed by Samsung

1990 The world’s largest microwave manufacturer - without its own brand

1990 Launch own brand outside Korea

2000 Samsung microwaves # 1 worldwide, twelve factories in twelve countries (including India, China and the US)

2003 – the largest electronics company in the world

How did Samsung do it?

• By learning from GE and other buyers

• By working very hard: 70-hour weeks, 10 days holiday

• By being very productive: 9 microwaves per person per day vs 4 at GE

• By meeting every delivery on time, even if it meant working 7-day weeks for six months

• By developing new models so well that it got GE to stop developing their own

Ashenhurst Functional Decomposition

Evaluates the data function and attempts to decompose it into simpler functions:

$F(X) = H(G(B), A)$, where $X = A \cup B$

B - bound set

A - free set

If $A \cap B = \emptyset$, it is a disjoint decomposition.

If $A \cap B \neq \emptyset$, it is a non-disjoint decomposition.

A Standard Map of function ‘z’

[Figure: map of z with rows indexed by a b (free set) and columns indexed by c (bound set).]

Columns 0 and 1 and columns 0 and 2 are compatible; column compatibility = 2.
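Column compatibility is the test behind the decomposition chart: two bound-set columns can be given the same value of G when they never disagree on a specified entry. The slide’s map of z did not survive extraction, so `Z_MAP` below is a hypothetical three-column map (c = 0, 1, 2; don’t cares as `None`) chosen only to match the stated facts that columns 0/1 and 0/2 are compatible.

```python
from itertools import combinations

DC = None  # don't care

# Hypothetical map of z: keys are free-set values (a b), list positions are
# bound-set values c = 0, 1, 2.
Z_MAP = {
    "00": [0, 0, DC],
    "01": [DC, 1, 0],
    "10": [DC, DC, 1],
    "11": [1, DC, 1],
}


def column(cmap, j):
    """Column j of the map (one entry per free-set row)."""
    return [row[j] for row in cmap.values()]


def compatible(col_x, col_y):
    """Columns are compatible if no row holds two different specified values."""
    return all(x is DC or y is DC or x == y for x, y in zip(col_x, col_y))


def compatible_pairs(cmap):
    n_cols = len(next(iter(cmap.values())))
    return [(i, j) for i, j in combinations(range(n_cols), 2)
            if compatible(column(cmap, i), column(cmap, j))]


print(compatible_pairs(Z_MAP))  # [(0, 1), (0, 2)]
```

Note that compatibility with don’t cares is not transitive: column 0 is compatible with both 1 and 2, yet 1 and 2 conflict in row 01.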

Explain the concept of generalized don’t cares

NEW Decomposition of Multi-Valued Relations

If $A \cap B = \emptyset$, it is a disjoint decomposition.

If $A \cap B \neq \emptyset$, it is a non-disjoint decomposition.

$F(X) = H(G(B), A)$, where $X = A \cup B$

[Figure: relation F over X decomposed into relation G over B and relation H over G(B) and A.]

Forming a CCG from a K-Map

[Figure: map of z (rows a b, free set; columns c, bound set) and its Column Compatibility Graph with nodes C0, C1, C2.]

Columns 0 and 1 and columns 0 and 2 are compatible; column compatibility index = 2.

Forming a CIG from a K-Map

[Figure: map of z (rows a b, columns c) and its Column Incompatibility Graph with nodes C0, C1, C2.]

Columns 1 and 2 are incompatible; chromatic number = 2.
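Coloring the column incompatibility graph gives the column multiplicity: each color becomes one value of G(c), and merging the columns of each color class yields the table of H(G, a b). The sketch below reuses a hypothetical map (the slide’s map is not recoverable) chosen so that columns 1 and 2 are incompatible, and colors the CIG exactly by brute force, which is only practical for a handful of columns.

```python
from itertools import combinations, product

DC = None  # don't care

# Hypothetical map of z: keys are free-set values (a b), positions are c = 0, 1, 2.
Z_MAP = {"00": [0, 0, DC], "01": [DC, 1, 0], "10": [DC, DC, 1], "11": [1, DC, 1]}


def columns(cmap):
    rows = list(cmap.values())
    return [[row[j] for row in rows] for j in range(len(rows[0]))]


def incompatible(cx, cy):
    """Two columns conflict if some row has two different specified values."""
    return any(x is not DC and y is not DC and x != y for x, y in zip(cx, cy))


def cig_edges(cmap):
    cols = columns(cmap)
    return [(i, j) for i, j in combinations(range(len(cols)), 2)
            if incompatible(cols[i], cols[j])]


def chromatic_number(n, edges):
    """Smallest k with a proper coloring, found by exhaustive search."""
    for k in range(1, n + 1):
        for colors in product(range(k), repeat=n):
            if all(colors[u] != colors[v] for u, v in edges):
                return k, colors
    return 0, ()  # only reached when n == 0


def merge_columns(cmap, coloring, k):
    """Table of H(g, ab): one merged column per color class g = G(c)."""
    merged = {}
    for key, row in cmap.items():
        out = [DC] * k
        for j, value in enumerate(row):
            if value is not DC:
                out[coloring[j]] = value
        merged[key] = out
    return merged


edges = cig_edges(Z_MAP)
k, coloring = chromatic_number(3, edges)
print(edges, k, coloring)                 # [(1, 2)] 2 (0, 0, 1)
print(merge_columns(Z_MAP, coloring, k))
```

With two colors, G needs a single binary output, so z decomposes as H(G(c), a, b) with G(0) = G(1) = 0 and G(2) = 1.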

• A unified internal language is used to describe behaviors in which text generation and facial gestures are unified.

• This language is for learned behaviors.

• Expressions (programs) in this language are either created by humans or induced automatically from examples given by trainers.

Constructive Induction

Is it worth building humanoid robots?

• Man’s design versus robot’s design

• The humanoid robot is versatile and adaptive; it takes its form from a human, a design well verified by Nature.

• Complete isomorphism of a humanoid robot with a human is very difficult to achieve (walking) and not even entirely desirable.

• All we need is to adapt the robot maximally to the needs of humans: elderly, disabled, children, entertainment.

• Replicating human motor or sensor functionality is based on mechanistic methodologies, but adaptations and upgrades are possible, for instance brain wave control or wheels.

• Is it cheating?

Is it worth building humanoid robots?

• Can building a mechanistic digital synthetic version of man be anything less than a cheat when man is not mechanistic, digital nor synthetic?

• If the reference for the “ultimate” robot is man, then there is little confusion about one’s aim to replace man with a machine.

Man & Machine

• The main reason to build machines in our likeness is to facilitate their integration in our social space:

– SOCIAL ROBOTICS

• Robot should do many things that we do, like climbing stairs, but not necessarily in the way we do it – airplane and bird analogy.

• Humanoid robots/social robots should make our life easier.

The Social Robot

• “developing a brain”:

– Cognitive abilities as developed from classical AI to modern cognitive ideas (neural networks, multi-agent systems, genetic algorithms…)

• “giving the brain a body”:

– Physical embodiment, as indicated by Brooks [Bro86], Steels [Ste94], etc.

• “a world of bodies”:

– Social embodiment

• A Social Robot is:

– A physical entity embodied in a complex, dynamic, and social environment sufficiently empowered to behave in a manner conducive to its own goals and those of its community.

Anthropomorphism

• Social interaction involves an adaptation on both sides to rationalise each other’s actions, and the interpretation of the other’s actions based on one’s own references

• Projective Intelligence: the observer ascribes a degree of “intelligence” to the system through their rationalisation of its actions

Anthropomorphism & The Social Robot

• Objectives

– Augment human-robot sociality

– Understand and rationalize robot behavior

• Embrace anthropomorphism

• BUT - How does the robot not become trapped by behavioral expectations?

• REQUIRED: A balance between anthropomorphic features and behaviors leading to the robot’s own identity

Finding the Balance

• Movement

– Behavior (afraid of the light)

– Facial Action Coding System

• Form

– Physical construction

– Degrees of freedom

• Interaction

– Communication (robot-like vs. human voice)

– Social cues/timing

• Autonomy

• Function & role

– Machine vs. human capabilities

Emotion Robots Experiments

• Autonomous mobile robots

• Emotion through motion

• “Projective emotion”

• Anthropomorphism

• Social behaviors

• Qualitative and quantitative analysis to a wide audience through online web-based experiments

The perception learning tasks

• Robot Vision:

1. Where is a face? (Face detection)

2. Who is this person? (Face recognition; learning with a supervisor, the person’s name is given in the process.)

3. Age and gender of the person.

4. Hand gestures.

5. Emotions expressed as facial gestures (smile, eye movements, etc.)

6. Objects held by the person.

7. Lip reading for speech recognition.

8. Body language.

The perception learning tasks

• Speech recognition:

1. Who is this person? (Voice-based speaker recognition; learning with a supervisor, the person’s name is given in the process.)

2. Isolated word recognition for word spotting.

3. Sentence recognition.

• Sensors:

1. Temperature

2. Touch

3. Movement

The behavior learning tasks

• Facial and upper body gestures:

1. Face/neck gesticulation for interactive dialog.

2. Face/neck gesticulation for theatre plays.

3. Face/neck gesticulation for singing/dancing.

• Hand gestures and manipulation:

1. Hand gesticulation for interactive dialog.

2. Hand gesticulation for theatre plays.

3. Hand gesticulation for singing/dancing.

Learning the perception/behavior mappings

1. Tracking the human.

2. Full gesticulation as a response to human behavior in dialogs and dancing/singing.

3. Modification of semi-autonomous behaviors such as breathing, eye blinking, mechanical hand withdrawals and speech acts as a response to the person’s behaviors.

4. Playing games with humans.

5. Body contact with humans, such as safe gesticulation close to a human and handshaking.
