Dublin City University Centre for Digital Video Processing
SenseCam Work at Dublin City University
Alan F. Smeaton, Gareth J.F. Jones and Noel E. O’Connor (PIs)
Georgina Gaughan, Cathal Gurrin, Hyowon Lee, Hervé Le Borgne (PostDocs)
Aiden Doherty, Michael Blighe, Ciarán Ó’Conaire, Michael McHugh, Saman Cooray (PhD students)
Barry Lavelle, Paul Reynolds (Masters students)
Sandrine Áime (Summer student)
… 15 people working on SenseCams in some way at DCU
Centre for Digital Video Processing, Dublin City University, Ireland
Overview
• Our contribution to developing SenseCam work;
• Automatic event segmentation - 3 approaches;
• Application: generation of rolling weekly summary based on Addenbrooke’s
• Face detection and body patch matching
– Arizona data
• Using BT and other sensors for context
• Alternative way of presenting SenseCam images
Our (DCU) Contribution
• We do image/video analysis, indexing, summarisation, etc. and we apply this to SenseCam data;
• We have no particular SenseCam application, we will develop underlying technology;
• We’re keen to hear about the real problems of SenseCams in practice, and to offer …
• We consider the typical full-day SenseCam images, do event segmentation and summarisation;
A day’s SenseCam images (3,000 – 4,000)
Multiple Events
Finishing work in the lab
At the bus stop
Chatting at Skylon Hotel lobby
Moving to a room
Tea time
On the way back home
Event Segmentation
Summarisation
Automatic Event Segmentation
• Task: automatically determine events from a collection of SenseCam image data;
• Based around image-image similarity using MPEG-7 features where differences may indicate events;
• Similar problem to shot boundary detection in video, but more challenging given the fish-eye view and the lower similarity within an event vs. within a shot;
• Several approaches can be taken:
Similarity Calculation between 2 Images
For each of the 2 images, extract MPEG-7 descriptors:
• Scalable Colour
• Colour Structure
• Colour Layout
• Colour Moments
• Edge Histogram
• Homogeneous Texture
Compare the descriptors of the 2 images => Similarity Score
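A minimal sketch of the comparison above, assuming each image has already been reduced to one numeric vector per descriptor. The MPEG-7 extraction itself is not shown. The descriptor key names, the L1-based score and the equal-weight averaging are illustrative assumptions, not necessarily what was used in this work.

```python
from typing import Dict, List

Descriptors = Dict[str, List[float]]  # descriptor name -> feature vector (hypothetical layout)


def descriptor_similarity(a: List[float], b: List[float]) -> float:
    """Map the L1 distance between two equal-length vectors to a score in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    distance = sum(abs(x - y) for x, y in zip(a, b))
    scale = sum(abs(x) + abs(y) for x, y in zip(a, b))
    # Identical vectors give 1.0; two all-zero vectors are treated as identical.
    return 1.0 if scale == 0 else 1.0 - distance / scale


def image_similarity(img_a: Descriptors, img_b: Descriptors) -> float:
    """Average the per-descriptor scores over the descriptors both images share."""
    shared = sorted(set(img_a) & set(img_b))
    if not shared:
        raise ValueError("images share no descriptors")
    scores = [descriptor_similarity(img_a[name], img_b[name]) for name in shared]
    return sum(scores) / len(scores)
```

A weighted average could replace the plain mean if some descriptors prove more reliable than others.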
Event Segmentation: Approach I
One Day’s Images
For each image... Extract MPEG-7 descriptors (Scalable Colour, Colour Structure, Colour Moments, Edge Histogram)...
... to compare Similarity between...
... adjacent images (example scores in figure: 0.8, 0.65, 0.7, 0.15)
... pairwise (example scores in figure: 0.91, 0.15, 0.74, 0.7)
... adjacent blocks of 10 images (example scores in figure: 0.65, 0.82, 0.92)
=> Event-segmented images of a day
• Stage 1:
– Comparison of adjacent images
• Stage 2:
– Comparison of every 2nd image
• Stage 3:
– Comparison of blocks of images
– Incorporation of a face detector
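The adjacent-image and adjacent-block comparisons can be sketched as follows. This assumes one feature vector per image, in time order; the L1-based similarity, the threshold value and the function names are hypothetical choices for illustration, and the face detector of Stage 3 is not included.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def l1_similarity(a: Vector, b: Vector) -> float:
    """Similarity in [0, 1] derived from the L1 distance (an illustrative choice)."""
    distance = sum(abs(x - y) for x, y in zip(a, b))
    scale = sum(abs(x) + abs(y) for x, y in zip(a, b))
    return 1.0 if scale == 0 else 1.0 - distance / scale


def mean_vector(block: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty block of feature vectors."""
    return [sum(column) / len(block) for column in zip(*block)]


def find_boundaries(features: Sequence[Vector], block_size: int = 1,
                    threshold: float = 0.5,
                    similarity: Callable[[Vector, Vector], float] = l1_similarity) -> List[int]:
    """Return indices i where a new event is assumed to start at image i.

    The block of images before i is compared with the block starting at i.
    block_size=1 compares adjacent images; larger values compare adjacent blocks.
    """
    boundaries = []
    for i in range(block_size, len(features) - block_size + 1):
        before = mean_vector(features[i - block_size:i])
        after = mean_vector(features[i:i + block_size])
        if similarity(before, after) < threshold:
            boundaries.append(i)
    return boundaries
```

With larger blocks, several neighbouring indices may fall below the threshold around one true boundary, so some merging of nearby detections would likely be needed in practice.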
Preliminary Results: Images from 1 day
Number of pictures: 2685
Manually detected events: 27
Descriptor        Correct events automatically identified    Precision
Color Moment      14                                         0.07
Edge Histogram    15                                         0.11
Color Structure   17                                         0.07
Scalable Color    18                                         0.04
Lots more to do, including fusion of descriptors and optimising windowing
Event Segmentation II
• Use similarity clustering, and time
– Combine low-level content analysis and context information (i.e. metadata provided by the SenseCam and temporal data)
– Generate a similarity matrix by fusing low-level and metadata information
– Implement time constraints to constrain clustering
– Simple hierarchical clustering of images into events
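A minimal sketch of the steps above, under stated assumptions: the two similarity matrices are fused by a weighted sum, the time constraint is modelled by allowing only temporally adjacent clusters to merge, and average linkage is used for the hierarchy. The fusion rule, the linkage and all names are illustrative, not confirmed details of this approach.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def fuse(visual: Matrix, metadata: Matrix, weight: float = 0.5) -> List[List[float]]:
    """Weighted sum of two same-sized similarity matrices (weight is an assumed parameter)."""
    return [[weight * v + (1.0 - weight) * m for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(visual, metadata)]


def average_link(sim: Matrix, left: Sequence[int], right: Sequence[int]) -> float:
    """Mean similarity over all image pairs drawn from two clusters."""
    total = sum(sim[i][j] for i in left for j in right)
    return total / (len(left) * len(right))


def segment_events(sim: Matrix, n_events: int) -> List[List[int]]:
    """Merge time-ordered images into n_events contiguous events.

    Time constraint: only clusters that are neighbours in time may merge,
    so every event is an unbroken run of image indices.
    """
    if not 1 <= n_events <= len(sim):
        raise ValueError("n_events must be between 1 and the number of images")
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_events:
        # Pick the adjacent pair with the highest average-link similarity.
        best = max(range(len(clusters) - 1),
                   key=lambda k: average_link(sim, clusters[k], clusters[k + 1]))
        clusters[best:best + 2] = [clusters[best] + clusters[best + 1]]
    return clusters
```

Calling segment_events with n_events set to 1, 2, 4 or 8 gives progressively finer segmentations of the same day.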
Event Segmentation: Approach II
One Day’s Images
For each image... Extract MPEG-7 descriptors (Scalable Colour, Colour Layout, Edge Histogram, Homogeneous Texture) + GPS, meta-data (Light, Temperature, Accelerometer, ...)
... to calculate Similarity among images => Similarity matrix
Then apply Temporal constraints...
... to vary the number of Events:
1 Event (whole set as 1 Event)
2 Events
4 Events
8 Events
...
Event-segmented images of a day (2 Events)
Event-segmented images of a day (4 Events)
Event-segmented images of a day (8 Events)
Approach II: Results
Approach III: Group Images into 3 Classes
• Static Person
– Person performing one activity
– E.g. at computer, meeting, eating etc.
• Moving Person
– Travelling between locations
• Static Camera
– SenseCam is put down
– User is not wearing it
Features Used
1. Block-based Cross-Correlation
2. Spatiogram image colour similarity
• Compares image colour spatial distribution
3. Accelerometer motion
• Feature-based training
• Using Bayesian approach to classification
• Viterbi algorithm used to smooth results
• Applied to 1 day of SenseCam images so far
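The smoothing step can be sketched as a standard Viterbi decode over the three classes. This assumes the Bayesian classifier outputs a probability per class for each image and uses those directly as observation scores; the transition model (a single stay_prob shared by all states) is a hypothetical simplification.

```python
import math
from typing import Dict, List, Sequence

STATES = ("SP", "MP", "SC")  # Static Person, Moving Person, Static Camera


def viterbi_smooth(posteriors: Sequence[Dict[str, float]],
                   stay_prob: float = 0.9) -> List[str]:
    """Return the most likely label sequence given per-image class scores.

    posteriors[t][s] is the classifier's probability of state s for image t.
    stay_prob (0 < stay_prob < 1) is an assumed probability of keeping the same
    state between consecutive images; the rest is split over the other states.
    """
    if not 0.0 < stay_prob < 1.0:
        raise ValueError("stay_prob must be strictly between 0 and 1")
    if not posteriors:
        return []
    switch_prob = (1.0 - stay_prob) / (len(STATES) - 1)
    tiny = 1e-12  # avoids log(0) for zero-probability entries

    def log_trans(prev: str, cur: str) -> float:
        return math.log(stay_prob if prev == cur else switch_prob)

    # score[s] = best log-probability of any path ending in state s.
    score = {s: math.log(1.0 / len(STATES)) + math.log(posteriors[0][s] + tiny)
             for s in STATES}
    back: List[Dict[str, str]] = []
    for obs in posteriors[1:]:
        pointers, new_score = {}, {}
        for cur in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur))
            pointers[cur] = prev
            new_score[cur] = score[prev] + log_trans(prev, cur) + math.log(obs[cur] + tiny)
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

A higher stay_prob suppresses more isolated misclassifications but also delays the detection of genuine changes.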
Event Segmentation: Approach III
One Day’s Images
For adjacent images, calculate... Block-based Cross-correlation + Spatiogram Similarity + Accelerometer (motion)
Classify each image into 3 groups (Bayesian classification)... Static Person (SP), Moving Person (MP), Static Camera (SC)
... then Smoothing (Viterbi algorithm)
Event-segmented (& classified) images of a day: SP MP SP MP SP SC
Accelerometer Data Example
Generation of Weekly Summaries
• Assume events already segmented;
• Calculate average values for events of low level features from all images;
• Generate similarity matrix using the average value from each event;
• Visually similar events can then be detected, and the time period (week) structured automatically into a short movie;
• Why a movie week … Addenbrooke’s Cambridge application;
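The averaging and matrix steps above can be sketched as follows, assuming each event is a list of per-image feature vectors. The L1-based similarity and the function names are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def event_average(images: Sequence[Vector]) -> List[float]:
    """Component-wise mean of the feature vectors of all images in one event."""
    if not images:
        raise ValueError("an event must contain at least one image")
    return [sum(column) / len(images) for column in zip(*images)]


def similarity(a: Vector, b: Vector) -> float:
    """Similarity in [0, 1] based on L1 distance (an illustrative choice)."""
    distance = sum(abs(x - y) for x, y in zip(a, b))
    scale = sum(abs(x) + abs(y) for x, y in zip(a, b))
    return 1.0 if scale == 0 else 1.0 - distance / scale


def event_similarity_matrix(events: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """Symmetric matrix of similarities between the average vectors of events."""
    averages = [event_average(images) for images in events]
    return [[similarity(a, b) for b in averages] for a in averages]


def most_similar_events(matrix: Sequence[Sequence[float]], event: int) -> List[int]:
    """Other events ranked from most to least similar to the given event."""
    others = [j for j in range(len(matrix)) if j != event]
    return sorted(others, key=lambda j: matrix[event][j], reverse=True)
```

Events whose best match in the matrix is still weak are candidates for "unique" events; the cut-off for that decision is left open here.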
Generation of Weekly Summary
Event-Segmented image sets: Mon, Tue, Wed, Thu, Fri, Sat, Sun
Compare Event-Event similarity within a week => Event-level Similarity matrix
Clustering of similar Events:
• Similar Events - Aiden working at the desk
• Similar Events - Aiden waiting for bus
• Similar Events - Aiden at the office corridor
• Unique Events (Unique Event 1 to Unique Event 6)
Select images => 1 Week summary
• Select images (on Sunday), (on Monday), (on Tuesday), (on Wednesday)
Preliminary Results
Number of similar images to a known event, from top 10 retrieved
EVENT                   COLOUR LAYOUT   SCALABLE COLOUR   HOMOGENEOUS TEXTURE   EDGE HISTOGRAM
Working in office       5 (50%)         5 (50%)           4 (40%)               10 (100%)
Walking                 5 (50%)         9 (90%)           4 (40%)               9 (90%)
Meeting colleague(s)    9 (90%)         5 (50%)           8 (80%)               5 (50%)
Shopping                1 (10%)         4 (40%)           0 (0%)                7 (70%)
Meal at home            4 (40%)         4 (40%)           5 (50%)               6 (60%)
At coffee machine       6 (60%)         6 (60%)           4 (40%)               3 (30%)
On bus                  3 (30%)         3 (30%)           3 (30%)               1 (10%)
Lunch at work           0 (0%)          2 (20%)           0 (0%)                1 (10%)
In bar                  2 (20%)         2 (20%)           1 (10%)               2 (20%)
Giving lecture          1 (10%)         1 (10%)           1 (10%)               2 (20%)
Average                 3.6 (36%)       4.1 (41%)         3.0 (30%)             4.6 (46%)
Face Detection & Body Patch Matching
• Apply face detection software to detect the presence of a face in the SenseCam image
• Body Patch Matching
– Identify similar body patches by color to detect subsequent appearances within an event;
• This works well for personal photos, but SenseCam images are lower quality;
Similarity Comparison by Person Detection
Image 1: 8:28am, 7 June 2006; Image 2: 5:03pm, 30 May 2006
For each image: Face Extraction, Body Patch Extraction
Face Similarity Score + Body Patch Similarity Score => Combined Similarity Score
Arizona State U. Data
• ASU gave us some SenseCam data 2 weeks ago
• Session rather than all-day images;
• Applied automatic event detection using 4x MPEG-7 low-level feature descriptors
– Both Color Structure and Color Moments outperform others
• Face Detection software performs badly on this data
– Blurred images cause “standard” face detection software to fail
Event detection using ASU data: 28-June-2006
Number of pictures: 357
Manually detected events: 28
Descriptor        Relevant events automatically identified    Precision
Color Moment      6                                           0.25
Edge Histogram    11                                          0.28
Color Structure   14                                          0.42
Scalable Color    18                                          0.28
Event detection using ASU data: 28-June-2006
Number of pictures: 434
Manually detected events: 11
Descriptor        Relevant landmarks automatically identified    Precision
Color Moment      6                                              0.17
Edge Histogram    7                                              0.15
Color Structure   6                                              0.12
Scalable Color    8                                              0.10
Using BT to provide context
• Achieved by logging Bluetooth devices in close proximity to the SenseCam wearer;
• May be useful in determining which individuals are present around each picture;
• Application created to poll and log Bluetooth devices on phone;
• Currently developing host application to interface with mobile device and retrieve log file
• Next step: synchronize time-stamps between SenseCam images and Bluetooth log file
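One possible way to approach the synchronisation step, sketched under assumptions: the Bluetooth log is a list of (time, device identifier) pairs, the clock difference between phone and SenseCam is a known constant, and a device counts as present if it was logged within a fixed window around the image time. The log format, the window length and all names are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Sequence, Set, Tuple

BluetoothEntry = Tuple[datetime, str]  # (time of sighting, device identifier)


def devices_near_images(image_times: Sequence[datetime],
                        bt_log: Sequence[BluetoothEntry],
                        clock_offset: timedelta = timedelta(0),
                        window: timedelta = timedelta(seconds=30)) -> List[Set[str]]:
    """For each image, collect devices logged within +/- window of its timestamp.

    clock_offset is added to every log time to correct a known difference
    between the phone clock and the SenseCam clock (assumed constant).
    """
    corrected = sorted(((t + clock_offset, dev) for t, dev in bt_log),
                       key=lambda entry: entry[0])
    times = [t for t, _ in corrected]
    result = []
    for image_time in image_times:
        # Binary search for the first log entry inside the window.
        start = bisect_left(times, image_time - window)
        nearby = set()
        for t, dev in corrected[start:]:
            if t > image_time + window:
                break
            nearby.add(dev)
        result.append(nearby)
    return result
```

Mapping device identifiers to named individuals would be a separate lookup and is not covered here.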
Use of Multi-Sensor Data
• Concept: To determine whether “events” can be identified based on multiple sensor data
• Data collected from:
– GPS Device
– BodyMedia Device
– Heart Rate Monitor
– SenseCam
• Development of a framework to extract the relevant data from the different data sources
– CSV files, XML files, text files, Excel files
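A rough sketch of what such an extraction framework might look like for two of the file types, using only the Python standard library. The record layout, the CSV column name "time" and the XML element and attribute names are all hypothetical; the real device formats are not described in the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List

Reading = Dict[str, str]  # keys: "source", "time", "name", "value"


def read_csv_readings(text: str, source: str) -> Iterator[Reading]:
    """Yield one reading per non-time column of each CSV row.

    Assumes a header row with a 'time' column; other columns are sensor values.
    """
    for row in csv.DictReader(io.StringIO(text)):
        for name, value in row.items():
            if name != "time":
                yield {"source": source, "time": row["time"], "name": name, "value": value}


def read_xml_readings(text: str, source: str) -> Iterator[Reading]:
    """Yield readings from <reading time=... name=...>value</reading> elements."""
    for element in ET.fromstring(text).iter("reading"):
        yield {"source": source, "time": element.get("time", ""),
               "name": element.get("name", ""), "value": (element.text or "").strip()}


def merge_by_time(*streams: Iterator[Reading]) -> List[Reading]:
    """Combine readings from all sources into one list ordered by timestamp string.

    Assumes ISO 8601 timestamps, which sort correctly as plain strings.
    """
    merged = [reading for stream in streams for reading in stream]
    return sorted(merged, key=lambda r: r["time"])
```

Converting every source into the same record shape first keeps the later event detection independent of the individual file formats.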
Presenting SenseCam Images?
E.g. intelligent summary of one day (playback for 1 minute)
... watching the fast playback of image sequences is not an ideal interaction:
• Intensive concentration required during playback
• Event boundaries cannot be clearly presented
• Sense of time is skewed (more #images of an ‘important’ event, even if it lasted only 1 minute; less #images of ‘unimportant’ regular events even if they last many hours during the day)
Turn sequential playback into an interactive, spatial browsing interaction (similar to the way we turn video playback into keyframe browsing) =>
31 May 2006
Approach:
• 1-page visual summary of a day
• Each image represents one event
• Size of each image represents the ‘importance’ or ‘uniqueness’ of the event
• Timeline on top orientates the user about the time when each event happened
• Mouse-Over activated
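One way the size rule could be realised, as a sketch only: uniqueness is taken here as one minus an event's mean similarity to all other events, and is mapped linearly to a thumbnail width. Both the uniqueness definition and the pixel range are assumptions for illustration.

```python
from typing import List, Sequence


def uniqueness_scores(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Uniqueness of each event = 1 - mean similarity to every other event."""
    n = len(matrix)
    if n < 2:
        return [1.0] * n
    return [1.0 - sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def thumbnail_sizes(scores: Sequence[float], min_px: int = 60, max_px: int = 240) -> List[int]:
    """Linearly map uniqueness scores onto thumbnail widths in pixels."""
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        return [max_px] * len(scores)
    return [round(min_px + (s - low) / (high - low) * (max_px - min_px)) for s in scores]
```

Repeating events then end up at the smallest size, while the most unusual event of the day gets the largest image.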
This is the most unique event of the day
Two unusual meetings that happened that day in the lab
Repeating Events are listed in small size at the bottom
Mouse-Over will start playback of that Event, while highlighting the time of that Event: this event (meeting a friend in Skylon hotel lobby) happened in the evening, for about 1.2 hours
Talking with Gareth lasted only 10 minutes, in the morning
Working in the main morning time: 1.2 hours
Then my last desk-work of the day (2 hours) just after lunch time
My lunch break
My dinner time
Conclusion:
• More relaxed, interactive, inviting summary of the day than fast-forwarding, while still taking advantage of playback synergy effect
• Playing each of the Events in its location might also be good (without having to Mouse-Over)
• ‘Importance’ is not by playing more images in that Event (this skews time), but by larger image size
Papers written
• “Exploiting context information to aid landmark detection in SenseCam images”, submitted to ECHISE - 2nd International Workshop on Exploiting Context Histories in Smart Environments: Infrastructures and Design to be held at 8th UbiComp, Sept. 2006, Irvine, CA, USA;
• “Structuring a Visual Lifelog Diary by Automatically Linking Events”, submitted to 3rd ACM Workshop on Capture, Archival and Retrieval of Personal Experiences (CARPE 2006), October 2006, Santa Barbara, California, USA;
• “Organising a daily visual diary using multi-feature clustering”, submitted to SPIE Electronic Imaging, San Jose, January 2007;
Future Work
EVERYTHING !