experiments in videoconferencing · 6/14/2005 · why is videoconferencing not ubiquitous...
TRANSCRIPT
Experiments in Videoconferencing
Milton Chen, CTO
http://vseelab.com
The VSee Auditorium
desktop interface
15’ x 5’ video wall
VSee
2nd place Stanford-Berkeley Innovator’s award
3rd place Stanford business plan competition
Intel CEO Paul Otellini keynote
Oracle Executive VP Chuck Rozwat keynote
“the breakthrough that collaboration gurus have been hunting for” -
Jack Hirsch, VP of Technology, Shell
“the world’s best videoconferencing system” - Cdr. Eric Rasmussen
Iraq Humanitarian Operations Center, Department of Defense
“uniquely suited for planetwalk” -
John Francis, Goodwill Ambassador, United Nations
What if there is no network infrastructure?
Office of Secretary of Defense, State Department, NATO, United Nations …
Strong Angel
Kona, Hawaii
17-22 July 2004
VSee was selected as the real-time communication system
VSee at Strong Angel
Provide global communication from a temporary shelter
VIP presentation between Kona and DC
Ad-hoc peer-to-peer WiFi
~ 0.5 mile ~1 - 10 mile
Experiment 1: convoy protection
VSee hops from car to car
Can also airdrop arbitrary data
setup
screen shot
Experiment 2: air-to-surface
Experiment 3: ocean search and rescue
The bottom video was from the live underwater camera held by the swimmer. The map with GPS annotation was shared using VSee
setup
screen shot
no pre-existing infrastructure
VSee leverages what you have
– Internet
– Internet2
– Satellite
– WiMax
– Cell phone
VSee ad-hoc peer-to-peer WiFi
– Laptop + wireless card is all you need
Afghanistan
Visual fidelity comparable to high-end hardware
Secure (FIPS 140-2 and triple 256-bit AES)
Never crashes (59-day challenge)
Trivial to use (less than 60 seconds for first-time users)
Kabul, Nov 2004
March 2005
From VSee deployment team
VSee for tsunami relief
UN headquarters in Jakarta
VSee in Darfur for refugee management
CARE International field office
Sudan, Africa
but
Why is videoconferencing not ubiquitous
World’s first videoconferencing system
April 7, 1927 - Bell Labs
3x2 inch black & white display
1 msec end-to-end latency
75 years later
– Technology limitations
– Inadequate visual communication science
VSee
Peer-to-peer wireless
How well can we judge eye contact
“The heart is stirred more slowly by the ear than by the eye.” – Horace
Eye contact stirs us to action
[Sharbat Gula, photographed by McCurry ’83]
Eye contact fires up our brain
[Kampe et al. ’01 Nature]
Eye contact sensitivity is high
Spatial perception task
As good as Snellen acuity
[Gibson and Pick ’63]
[Figure: eye contact (%) vs. angle (deg); x-axis from -8.5 to 8.5, y-axis from 0 to 100; stdev = 2.8°. Diagram labels: looker, observer, 2 m]
* 6 observers judged 1 looker
Sensitivity is symmetric
[Cline ’67]
[Kruger and Huckstedt ’69]
[Anstis, et al. ’69]
[Stokes ’69]
[Ellgring ’70]
PicturePhone: camera above display
Hydra: camera below display
Eye contact is difficult
Looking into the camera
Attempting eye contact
Solutions to eye contact
Half-silvered mirror [Rosenthal ’47] MAJIC [Okada, et al. ’94]
ClearBoard [Ishii, et al. ’92] GazeMaster [Gemmell, et al. ’00]
Methodology
Observers watch videos of looker
Large display with camera at the center
Eye contact?
Sensitivity is asymmetric
* 16 observers judged recorded videos of 1 looker
An anatomical explanation
looking at you looking sideways
looking up
looking down eye closing
Illustrations from The Artist’s Guide to Facial Expression [Faigin ’90]
VSee
Eye contact
How well can we judge lip sync
“We shape our tools, and thereafter our tools shape us” - Marshall McLuhan
Why read lips
Improves comprehension
– Background noise [Sumby and Pollack ’54]
– Hearing loss [Binnie, Montgomery, Jackson ’86]
[Yarbus ’67]
Audio ahead of the video
Videoconferencing
– 1 msec to encode audio
– Up to 250 msec to encode MPEG-4
Detectable skew
– 130 msec [Dixon and Spitz ’80]
– 80 msec [Steinmetz ’96]
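The encoding times and detection thresholds above can be put side by side with a little arithmetic. The sketch below is only an illustration: it assumes that network and decode delays are the same for audio and video, so the skew comes from encoding alone. The function and variable names are hypothetical.

```python
# Encoding times quoted on the slide (msec).
AUDIO_ENCODE_MS = 1
VIDEO_ENCODE_MS_MAX = 250  # "up to 250 msec" for MPEG-4

# Detectable-skew thresholds quoted on the slide (msec).
THRESHOLDS_MS = {"Dixon and Spitz '80": 130, "Steinmetz '96": 80}


def worst_case_skew_ms(audio_ms, video_ms):
    """Audio lead over video, assuming every other stage delays both equally."""
    return video_ms - audio_ms


skew = worst_case_skew_ms(AUDIO_ENCODE_MS, VIDEO_ENCODE_MS_MAX)

# True where the worst-case skew exceeds the published detection threshold.
detectable = {name: skew > limit for name, limit in THRESHOLDS_MS.items()}

print(skew)        # 249
print(detectable)
```

Under that assumption the worst-case audio lead is 249 msec, above both quoted thresholds, which is why the audio cannot simply be played as it arrives.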
Conventional lip synchronization
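[Diagram: audio (a) and video (v) pass through encode, network, decode along a time axis. Unsynchronized: a plays before v; the gap is the skew. Synchronized (encode, network, decode, sync): an audio delay line holds a until v is ready, so a and v play together at the cost of added delay.]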
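A conventional audio delay line can be sketched as a fixed-length FIFO: each audio frame is held for as many frame periods as the skew covers, then released. This is a minimal illustration, not VSee's implementation. The class name, the helper, and the 20 msec frame period are assumptions made for the example.

```python
import math
from collections import deque


def frames_for_skew(skew_ms, frame_ms):
    """Number of whole audio frames needed to cover the skew."""
    return math.ceil(skew_ms / frame_ms)


class AudioDelayLine:
    """Hypothetical fixed delay line: a frame pushed in comes out N pushes later."""

    def __init__(self, delay_frames):
        # Pre-fill with None so the first outputs are silence placeholders.
        self._buffer = deque([None] * delay_frames)

    def push(self, frame):
        """Store the newest frame and return the one that is now due for playback."""
        self._buffer.append(frame)
        return self._buffer.popleft()


# Example: a 200 msec skew with assumed 20 msec audio frames needs 10 frames of delay.
delay = frames_for_skew(200, 20)
line = AudioDelayLine(delay)
played = [line.push(f"a{i}") for i in range(12)]
print(delay)        # 10
print(played[10:])  # ['a0', 'a1']
```

The cost is visible in the example: every audio frame is played 10 frame periods late, so the lips match but the whole conversation gains that much delay.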
Attribute delay and skew to remote person
=> person is not believable?
=> person is slow?
[Reeves and Nass ’96]
A new lip sync method
synchronized and low perceived latency
[Diagram: the new method (encode, network, decode, sync) compared with the unsynchronized pipeline and the audio delay line; the time axis marks the round trip delay]
Methodology
Recorded 3 speakers
– 44.1 kHz x 16 bits per sample uncompressed audio
– 320x240 x 30 fps uncompressed video
– Sentences consist of easy-to-lipread words
Speaker 1: female native speaker
Speaker 2: male native speaker
Speaker 3: male non-native speaker
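For a sense of scale, the raw data rates of the recordings can be estimated from the stated formats. This is a rough sketch: the audio figure is read as 16 bits per sample, and the channel count and the video bit depth are not given in the slides, so the values below are assumptions.

```python
# Audio format from the methodology slide.
AUDIO_SAMPLE_RATE_HZ = 44_100
AUDIO_BITS_PER_SAMPLE = 16
AUDIO_CHANNELS = 1           # assumption: channel count is not stated

# Video format from the methodology slide.
VIDEO_WIDTH, VIDEO_HEIGHT, VIDEO_FPS = 320, 240, 30
VIDEO_BITS_PER_PIXEL = 24    # assumption: 8 bits each for R, G, B

audio_bits_per_sec = AUDIO_SAMPLE_RATE_HZ * AUDIO_BITS_PER_SAMPLE * AUDIO_CHANNELS
video_bits_per_sec = VIDEO_WIDTH * VIDEO_HEIGHT * VIDEO_FPS * VIDEO_BITS_PER_PIXEL

print(audio_bits_per_sec)  # 705600
print(video_bits_per_sec)  # 55296000
```

With these assumptions the uncompressed video is roughly 78 times the audio rate.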
Perception of variable AV skew
* 16 subjects judging recorded videos
[Figure: lip synchronization (%), 0 to 100, vs. initial skew (msec), stretch period; conditions shown include "200, unsync" and "200, new sync"]
VSee
Eye contact
Lip sync
What frame rate is necessary
“We express ourselves into existence.” – Iris Murdoch
Minimum required frame rate
Full motion 10-30 fps
Tolerable 5 fps [Tang and Isaac ’93]
Lip synchronization 5 fps [Watson and Sasse ’96]
Content understanding 5 fps [Ghinea and Thomas ’98]
Sign language recognition 1 fps [Johnson and Caird ’96]
Gesture Detection Algorithm
input image
frame difference
after erosion
Visualization of algorithm
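The slides name only the stages of the gesture detector: an input image, a frame difference, and an erosion step. The sketch below shows one way those stages could fit together. It is not the VSee algorithm: the threshold, the 3x3 erosion neighbourhood, the `min_pixels` cutoff, and all function names are assumptions chosen for illustration.

```python
def frame_difference(prev, curr, threshold=30):
    """Binary mask: 1 where a pixel changed by more than the threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def erode(mask):
    """3x3 binary erosion: keep a pixel only if it and its 8 neighbours are set.

    Border pixels are cleared, so isolated noise pixels and thin edges vanish.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def gesture_detected(prev, curr, threshold=30, min_pixels=1):
    """Report a gesture when enough changed pixels survive erosion."""
    eroded = erode(frame_difference(prev, curr, threshold))
    return sum(map(sum, eroded)) >= min_pixels


# Tiny grayscale frames (5x5) for illustration.
still = [[0] * 5 for _ in range(5)]
blob = [[200 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
noise = [[200 if (y, x) == (2, 2) else 0 for x in range(5)] for y in range(5)]

print(gesture_detected(still, blob))   # True: a 3x3 region of motion survives erosion
print(gesture_detected(still, noise))  # False: a single changed pixel is eroded away
```

Erosion is what separates a moving hand or arm from sensor noise here: a large changed region keeps its interior, while scattered single-pixel changes disappear.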
Gesture sensitive transmission allows dynamic discussion
[Figure: speaker changes per minute (0 to 5) for three conditions: full motion (15 fps), gesture sensitive (~0.2 fps), low update (0.2 fps)]
* 8 groups of 4 people during a discussion
* requires 10% of full motion bandwidth
Other studies
Smile recognition time
[Figure: time (msec), 0 to 700, vs. video size (deg of visual angle), 0 to 30]
Importance of f2f interaction
[Figure: share of responses (0% to 100%) from students, TAs, and faculty; rating levels: extremely, very, moderately, somewhat, not]
[Conveying Conversational Cues Through Video, PhD Dissertation, 2003]
When is a smile not a smile
Value of f2f for discussion
Visualizing the pulse of Classroom
VSee
Eye contact
Lip sync
Gesture
Telework
“Laughter is the shortest distance between two people” – Victor Borge
VSee customers
telework => less money and influence
Reasons to telework
– Business continuity
– Manage by results vs. time
– …
– No commute
– Life style
– …
but
no tool is able to bridge the physical distance
VSee Lab experiment
Everybody works from home,
– hotels, cafes, libraries, airports, … since June 2003
– California, Michigan, Scotland, Taiwan, Malaysia
Almost all customer interaction via VSee
Product support via desktop sharing
Product development via application sharing
Availability via presence indicator
Initial results
What doesn’t work
– Still a sense of isolation
• Company meals and outings are critical!
• Office of future will be social clubs?
– Remote whiteboard
A surprising bonus
– Uninterrupted time to think
– Building personal relationships