Sensible Visual Search
Shih-Fu Chang
Digital Video and Multimedia Lab
Columbia University
www.ee.columbia.edu/dvmm
June 2008
(Joint Work with Eric Zavesky and Lyndon Kennedy)
digital video | multimedia lab
User Expectation for Web Search
“…type in a few words at most, then expect the engine to bring back the perfect results. More than 95 percent of us never use the advanced search features most engines include, …”
– The Search, J. Battelle, 2003
Keyword search is still the primary search method
Straightforward extension to visual search
Keyword-based Visual Search Paradigm
Web Image Search
Text query: “Manhattan Cruise” on Google Image Search
What is in the results? Why are these images returned? How can one choose better search terms?
Minor changes in keywords make a big difference.
Text query: “Cruise around Manhattan”
When metadata are unavailable: Automatic Image Classification
Audio-visual features; geo, social features
Statistical models: SVM or graph models, context fusion, ...
Rich semantic description based on content analysis
Semantic indexes (+/− scores): Anchor, Snow, Soccer, Building, Outdoor
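The classification pipeline above can be sketched as follows. This is a minimal toy sketch, not the actual Columbia system: the two-dimensional features, the linear models, and the score mapping are invented for illustration; the real detectors are SVMs (or graph models) over high-dimensional audio-visual features.

```python
import numpy as np

# Hypothetical per-concept linear models (weight vector, bias); stand-ins
# for trained SVM concept detectors.
MODELS = {
    "snow":    (np.array([1.0, -0.5]), 0.0),
    "outdoor": (np.array([0.2,  1.0]), -0.3),
}

def concept_scores(feature):
    """Score one image's feature vector against every concept detector."""
    scores = {}
    for concept, (w, b) in MODELS.items():
        margin = float(w @ feature + b)                  # SVM decision value
        scores[concept] = 1.0 / (1.0 + np.exp(-margin))  # squash to (0, 1)
    return scores

def build_index(features):
    """Semantic index: concept -> image ids ranked by detector score."""
    per_image = {img_id: concept_scores(f) for img_id, f in features.items()}
    return {
        c: sorted(per_image, key=lambda i: per_image[i][c], reverse=True)
        for c in MODELS
    }

index = build_index({"a": np.array([2.0, 0.0]), "b": np.array([0.0, 2.0])})
print(index["snow"][0])   # image "a" scores highest on "snow"
```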
A few good detectors for LSCOM concepts: waterfront, bridge, crowd, explosion, fire, US flag, military personnel
Remember: there are many not-so-good detectors.
Keyword Search over Statistical Detector Scores
www.ee.columbia.edu/cuvidsearch
Columbia374: objects, people, locations, scenes, events, etc.
Concepts defined by expert analysts over news video
Query “car crash snow” over TRECVID video using LSCOM concepts
How are keywords mapped to concepts?
What classifiers work? What don’t? How to improve the search terms?
Frustration of Uninformed Users of Keyword Search
Difficult to choose meaningful words/concepts without in-depth knowledge of entire vocabulary
Pains of Uninformed Users
Forced to take “one shot” searches, iterating queries with a trial-and-error approach...
Challenge: user frustration in visual search
A lot of work on content analytics; research still needed to address user frustration
Proposal: Sensible Search
Make the search experience more sensible
Help users stay “informed”:
• in selecting effective keywords/concepts
• in understanding the search results
• in manipulating the search criteria rapidly and flexibly
Keep users engaged:
• instant feedback with minimal disruption, as opposed to “trial-and-error”
A prototype CuZero: Zero-Latency Informed Search & Navigation
Informed User: Instant Informed Query Formulation
Informed User for Visual Search: Instant visual concept suggestion
Query-time concept mining
Instant Concept Suggestion
Lexical mapping
Mapping keywords to concept definitions, synonyms, sense context, etc.
LSCOM
Co-occurring concepts: road, car
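The lexical mapping above can be sketched by matching query words against each concept's synonym and definition terms, then ranking the hits. The toy lexicon below is hypothetical; the real system draws on the full LSCOM concept definitions, synonyms, and sense context.

```python
# Toy lexicon mapping LSCOM-style concepts to synonym/definition terms
# (invented entries for illustration only).
LEXICON = {
    "car":  {"car", "automobile", "vehicle", "sedan"},
    "road": {"road", "street", "highway"},
    "snow": {"snow", "snowfall", "winter"},
}

def suggest_concepts(query):
    """Rank concepts by how many query words hit their synonym set."""
    words = set(query.lower().split())
    hits = {c: len(words & syns) for c, syns in LEXICON.items()}
    return [c for c, n in sorted(hits.items(), key=lambda kv: -kv[1]) if n > 0]

print(suggest_concepts("automobile on the highway in snow"))
# ['car', 'road', 'snow']
```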
Example machine-translated broadcast-news transcripts (the noisy text to be mined):
“Basketball courts and the American won the Saint Denis on the Phoenix Suns because of the 50 point for 19 in their role within the National Association of Basketball”
“George Rizq led Hertha for the local basketball game the wisdom and sports championship of the president”
“Baghdad to attend the game I see more goals and the players did not offer great that Beijing Games as the beginning of his brilliance Nayyouf 10 this atmosphere the culture of sports championship”
images / text → results → visual mining → dominant concepts: “person”, “suits”
Query-Time Concept Mining
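The query-time mining step above can be sketched as: run concept detectors over the top text-search results and surface the concepts that dominate them. The per-image detections below are fabricated for illustration.

```python
from collections import Counter

def mine_dominant_concepts(top_results, k=2):
    """
    Query-time concept mining: given the sets of concepts detected on the
    top text-search results, return the k concepts that dominate them.
    """
    counts = Counter(c for detected in top_results for c in detected)
    return [c for c, _ in counts.most_common(k)]

# Hypothetical detections on the top 4 results of a basketball-news query
results = [
    {"person", "suits", "indoor"},
    {"person", "basketball"},
    {"person", "suits"},
    {"crowd", "person"},
]
print(mine_dominant_concepts(results))  # ['person', 'suits']
```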
CuZero Real-Time Query Interface (demo)
Instant Concept Suggestion
Auto-complete speech transcripts
A prototype CuZero: Zero-Latency Informed Search & Navigation
(Zavesky and Chang, MIR2008)
Informed User: Intuitive Exploration of Results
only outdoor / only people
CMU Informedia Concept Filter
linear browser restricts inspection flexibility
Informed User: Rapid Exploration of Results
Media Mill Rotor Browser
Revisit the user struggle…
Car detector
Car crash detector
Snow detector
Query: {car, snow, car_crash}
How did each concept influence the results?
CuZero: Real-Time Multi-Concept Navigation Map
Create a multi-concept gradient map
Direct user control: “nearness” = “more influence”
Instant display for each location, without a new query
“boat”
“sky”
“water”
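A sketch of the gradient-map scoring. The slides only say “nearness = more influence”, so the inverse-distance weighting, the anchor positions, and the cached scores below are all assumptions for illustration.

```python
import math

# Concept anchors placed on the 2-D navigation map (assumed positions)
ANCHORS = {"boat": (0.0, 0.0), "sky": (1.0, 0.0), "water": (0.5, 1.0)}

def weights_at(x, y, eps=1e-6):
    """Nearness = influence: inverse-distance weights, normalized to sum 1."""
    inv = {c: 1.0 / (math.dist((x, y), p) + eps) for c, p in ANCHORS.items()}
    total = sum(inv.values())
    return {c: v / total for c, v in inv.items()}

def rank_at(x, y, concept_scores):
    """Re-rank cached images for a cursor position, with no new server query."""
    w = weights_at(x, y)
    blended = {img: sum(w[c] * s[c] for c in w)
               for img, s in concept_scores.items()}
    return sorted(blended, key=blended.get, reverse=True)

scores = {"img1": {"boat": 0.9, "sky": 0.1, "water": 0.4},
          "img2": {"boat": 0.2, "sky": 0.8, "water": 0.3}}
print(rank_at(0.0, 0.0, scores))  # near the "boat" anchor: img1 first
```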
Achieve Breadth-Depth Flexibility by Dual Space Navigation (demo)
Breadth: quick scan of many query permutations
Depth: instant exploration of results with fixed weights; deep exploration of a single permutation
Latency Analysis: Workflow Pipeline
1. Execute query and download ranked concept list
2. Package results with scores
3. Transmit to client
4. Unpackage results at interface
5. Score images by concept weights; guarantee unique positions
6. Download images to interface in cached mode
Time to execute is disproportional across stages! (chart: log(time) per stage)
Pipelined processing for low latency
Concept formulation (“car”)
concept formulation (“snow”)
• Overlap concept formulation with map rendering
• Hide rendering latency during user interaction
• Coarse-to-fine concept map planning/rendering
• Speed optimization ongoing…
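The overlap idea can be sketched with a two-worker thread pool that fetches the next concept while the current one renders. The stage functions are stand-ins for the real (server-side formulation, client-side rendering) stages.

```python
from concurrent.futures import ThreadPoolExecutor

def formulate(concept):
    """Stand-in for server-side concept formulation (the slow stage)."""
    return f"results:{concept}"

def render(prepared):
    """Stand-in for client-side map rendering of already-fetched results."""
    return f"rendered:{prepared}"

def pipelined(concepts):
    """Overlap formulation of concept k+1 with rendering of concept k."""
    rendered = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        future = pool.submit(formulate, concepts[0])
        for nxt in concepts[1:]:
            prepared = future.result()
            future = pool.submit(formulate, nxt)   # fetch next concept...
            rendered.append(render(prepared))      # ...while rendering this one
        rendered.append(render(future.result()))
    return rendered

print(pipelined(["car", "snow"]))
# ['rendered:results:car', 'rendered:results:snow']
```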
Challenge: user frustration in visual search
Research needed to address user frustration
Sensible search: (1) query + (2) visualize + (3) analyze
DVMM Lab, Columbia University
Help Users Make Sense of Image Trend
• Much re-used content found
• How did it occur?
• What manipulations?
• What distribution path?
• Correlation with perspective change?
Query: “John Kennedy”
Manipulation correlated with Perspective
Raising the Flag on Iwo Jima Joe Rosenthal, 1945
Anti-Vietnam War, Ronald and Karen Bowen, 1969
Reused Images Over Time
Question for Sensible Search: Insights from Plain Search Results?
Issue a text query; get top 1000 results from a web search engine
Find duplicate images, merge into clusters; rank clusters (by size? original rank?)
Explore history/trend
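The duplicate-clustering step can be sketched with union-find over detected near-duplicate pairs, ranking clusters by size (the slide leaves the ranking criterion open, so size ranking here is one of the suggested options):

```python
def cluster_duplicates(images, duplicate_pairs):
    """Union-find: merge images connected by near-duplicate links."""
    parent = {i: i for i in images}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for a, b in duplicate_pairs:
        parent[find(a)] = find(b)
    clusters = {}
    for i in images:
        clusters.setdefault(find(i), []).append(i)
    # Rank clusters by size: big clusters = iconic, heavily re-used images
    return sorted(clusters.values(), key=len, reverse=True)

imgs = ["a", "b", "c", "d", "e"]
pairs = [("a", "b"), ("b", "c")]            # a, b, c are near-duplicates
print([len(c) for c in cluster_duplicates(imgs, pairs)])  # [3, 1, 1]
```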
Duplicate Clusters Reveal Image Provenance
Biggest Clusters Contain Iconic Images
Smallest Clusters Contain Marginal Images
Deeper Analysis of Search Results: Visual Migration Map (VMM)
Duplicate Cluster Visual Migration Map
(Kennedy and Chang, ACM Multimedia 2008)
Visual Migration Map (VMM)
“Most Original” at the root
“Most Divergent” at the leaves
Images Derived through Series of Manipulations
VMM uncovers the history of image manipulation and plausible dissemination paths among content owners and users.
Ground truth VMM is hard to get
• Hypothesis
• Approximation of history is feasible by visual analysis.
• Detect manipulation types between two images
• Derive a large-scale history over a large image set
Basic Image Manipulation Operators
• Each is observable by inspecting the pair
• Each implies direction (one image derived from other)
• Other possible manipulations: color correction, multiple compression, sharpening, blurring
Original | Scaled | Cropped | Gray | Overlay | Insertion
Detecting Near-Duplicates
Duplicate detection is very useful and relatively reliable
Remaining challenges: scalability/speed; video duplicates; object (sub-image) (TRECVID08)
Graph Matching [Zhang & Chang, 2004]
Matching SIFT points [Lowe, 1999]
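SIFT-style point matching can be sketched with brute-force nearest neighbors and Lowe's ratio test: accept a match only if the nearest descriptor is clearly closer than the second nearest. The 2-D descriptors below are toys; real SIFT descriptors are 128-dimensional.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Brute-force matching of descriptor sets with Lowe's ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # distance to every B descriptor
        order = np.argsort(dists)
        # Keep only unambiguous matches (nearest << second nearest)
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches

a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.1], [5.0, 5.0], [0.1, 1.0]])
print(match_descriptors(a, b))  # [(0, 0), (1, 2)]
```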
Scale Detection
• Draw bounding box around matching points in each image
• Compare heights/widths of each box
• Relative difference in box size can be used to normalize scales
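The scale-detection rule above can be sketched directly: box the matched points in each image and compare box sizes. The point coordinates are invented for illustration.

```python
import numpy as np

def relative_scale(points_a, points_b):
    """
    Draw bounding boxes around corresponding matched points in each image;
    the ratio of box sizes estimates the scaling between the duplicates.
    """
    a, b = np.asarray(points_a), np.asarray(points_b)
    size_a = a.max(axis=0) - a.min(axis=0)   # (width, height) of box A
    size_b = b.max(axis=0) - b.min(axis=0)
    return float(np.mean(size_b / size_a))   # >1 means B is an enlarged copy

pts_a = [(10, 10), (110, 10), (10, 60)]
pts_b = [(5, 5), (55, 5), (5, 30)]           # same layout at half size
print(relative_scale(pts_a, pts_b))  # 0.5
```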
Color Removal
• Simple case: image stored in single channel file
• Other cases: image is grayscale, but stored in 3-channel file
• Expect little difference across channels within each pixel
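A sketch of the color-removal check: even when stored as a 3-channel file, a grayscale image has (near-)identical values across channels at every pixel. The tolerance value is an assumption.

```python
import numpy as np

def is_grayscale(img, tol=2):
    """Detect 'color removed' images via per-pixel channel spread."""
    img = np.asarray(img, dtype=np.int16)
    if img.ndim == 2:                            # single-channel file: trivially gray
        return True
    spread = img.max(axis=2) - img.min(axis=2)   # per-pixel spread across channels
    return bool((spread <= tol).all())

gray = np.full((4, 4, 3), 128)                   # identical channels everywhere
color = gray.copy()
color[0, 0] = (255, 0, 0)                        # one saturated red pixel
print(is_grayscale(gray), is_grayscale(color))   # True False
```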
More Challenging: Overlay Detection?
• Given two images, we can observe that a region is different between the two
• But how do we know which is the original?
Cropping or Insertion?
• Can find differences in image area
• But is the smaller-area due to a crop or is the larger area due to an insertion?
Cropping | Original | Insertion
Use Context from Many Duplicates
Normalize Scales and Positions
Get average value for each pixel
“Composite” image
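A sketch of building the composite image, assuming the duplicates have already been normalized in scale and position; the per-pixel average captures the typical content:

```python
import numpy as np

def composite(aligned_images):
    """Average value for each pixel across scale/position-normalized duplicates."""
    stack = np.stack([np.asarray(im, dtype=float) for im in aligned_images])
    return stack.mean(axis=0)

# Three aligned 2x2 toy duplicates; one has divergent content in a corner
dups = [np.array([[10, 10], [10, 10]]),
        np.array([[10, 10], [10, 10]]),
        np.array([[100, 10], [10, 10]])]   # divergent top-left pixel
comp = composite(dups)
print(comp[0, 0], comp[1, 1])  # 40.0 10.0
```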
Cropping Detection w/ Context
• In cropping, we expect the content outside the crop area to be consistent with the composite image
Image A Composite A Residue A
Image B Composite B Residue B
Overlay Detection w/ Context
• Comparing images against composite image reveals portions that differ from typical content
• Image with divergent content may have overlay
Image A Composite A Residue A
Image B Composite B Residue B
Insertion Detection w/ Context
• In insertion, we expect the area outside the crop region to be different from the typical content
Image A Composite A Residue A
Image B Composite B Residue B
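All three context-dependent checks above rest on the residue of an image against the composite: for cropping, content outside the crop should match the composite; for overlay and insertion, a coherent divergent region is the telltale. A minimal sketch (the threshold is an assumption):

```python
import numpy as np

def residue(img, comp, tol=20):
    """
    Boolean mask of pixels that diverge from the composite's typical
    content; a coherent True region suggests an overlay or insertion.
    """
    return np.abs(np.asarray(img, float) - comp) > tol

comp = np.full((4, 4), 10.0)          # composite of many aligned duplicates
overlaid = comp.copy()
overlaid[0:2, 0:2] = 200              # pasted-on block in one duplicate
mask = residue(overlaid, comp)
print(int(mask.sum()))  # 4 divergent pixels
```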
Evaluation: Manipulation Detection
• Context-Free detectors have near-perfect performance
• Context-Dependent detectors still have errors
• Consistency checking can further improve the accuracy
• Are these error-prone results sufficient to build manipulation histories?
Context-Free | Context-Dependent
Inferring Direction from Consistency
Not Plausible
Manipulation Direction from Consistency
Plausible
Derive Manipulation among Multiple Images
Emerging Migration Map
• Individual parent-child relationships give rise to a manipulation history
• Relationships are only plausible (we don’t know for sure)
• The absence of a relationship is more concrete (we can be more certain)
• Redundancy: plausible derivations from parents and ancestors of parents
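Assembling the migration map from pairwise plausible derivations can be sketched as building a directed graph and reading off its sources and sinks; the edge list below is hypothetical:

```python
def migration_map(edges):
    """
    Build a Visual Migration Map from pairwise plausible derivations
    (parent -> child). Source nodes (no parents) are the most-original
    images; sink nodes (no children) are the most-manipulated.
    """
    nodes = {n for e in edges for n in e}
    parents = {n: set() for n in nodes}
    children = {n: set() for n in nodes}
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    sources = sorted(n for n in nodes if not parents[n])
    sinks = sorted(n for n in nodes if not children[n])
    return sources, sinks

# Hypothetical detected manipulation chains: scale->crop, gray->overlay
edges = [("orig", "scaled"), ("scaled", "cropped"),
         ("orig", "gray"), ("gray", "overlaid")]
print(migration_map(edges))  # (['orig'], ['cropped', 'overlaid'])
```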
Experiments
• Select 22 iconic images
• Mostly political figures, culled from Google Zeitgeist and TRECVID queries
• Generate manipulation histories:
• through manual annotation
• and through fully-automatic mechanisms
Automatic Visual Migration Map
“Originals” at source nodes
“Manipulated” at sink nodes
Evaluation: Automatic Histories
• High agreement with manually-constructed histories
• Detect edits with Precision of 91% and Recall of 71%
Manually-Constructed vs. Automatically-Constructed (legend: Deleted, Inserted edges)
Application: Summarizing Changes
• Analyze manipulation history graph structure to extract most-original and most highly-manipulated images
Application: Finding Perspective
• Survey image type and corresponding perspective across many examples
• Find correlation between high manipulation and negative/critical opinion
Joke Website: “Every time I get stoned, I go and do something stupid!” “Osama Bashed Laden”
http://www.almostaproverb.com/captions2.html
Democratic National Committee Site: “Capture Osama Bin Laden!”
http://www.democrats.org/page/petition/osama
Myspace Profile from Malaysia: “Osama Bin Laden - My Idol of All Time!”
http://www.myspace.com/mamu_potnoi
Daily Excelsior Newspaper: “Further Details of Bin Laden Plot Unearthed: ABC Report.”
http://www.dailyexcelsior.com/00jan31/inter.htm
Application: Finding Perspective
Geographic/Cultural Dispersion
Reverse Profiling
Conclusions
• Advocate focus on Sensible Visual Search
• Address user frustration in interactive keyword search
• In addition to work on content analytics
• Develop utilities for Informed Users
• Demo: CuZero prototype
• Instant query suggestion
• Rapid multi-concept result navigation
Conclusions
• Explore Deeper Insight: Visual Migration Map
• Explore image reuse patterns to reveal image provenance
• Approximate image manipulation history from visual content alone
• Find “interesting” images at source and sink nodes within the image history
• Strong correlation with viewpoint change
• Useful role in socio-cultural information dissemination (Web 2.0)
References
• CuZero: Eric Zavesky and Shih-Fu Chang, “CuZero: Low-Latency Query Formulation and Result Exploration for Concept-Based Visual Search,” ACM Multimedia Information Retrieval Conference, Oct. 2008, Vancouver, Canada.
• Internet Image Manipulation History: Lyndon Kennedy and Shih-Fu Chang, “Internet Image Archaeology: Automatically Tracing the Manipulation History of Photographs on the Web,” ACM Multimedia Conference, Oct. 2008, Vancouver, Canada.