Understanding The Semantics of Media
Chapter 8
Camilo A. Celis
Questions
1. What kinds of applications does SVD have? How is it used in this paper?
2. What does MPESAR stand for? What does this system do?
3. How does MPESAR generally work?
Contents
Understanding the problem
Analysis Tools
Segmenting Video
Semantic Retrieval
Contents
Understanding the problem: Different Approaches, Segmentation Literature, Semantic Retrieval Literature
Analysis Tools
Segmenting Video
Semantic Retrieval
Understanding the problem
Semantics (n.): the study of meaning; the study of linguistic development by classifying and examining changes in meaning and form.
Rapid growth of media: personal media, social media... Low price. Social pressure.
We are not understanding media.
Different Approaches
Increasing number of methods to retrieve information from media
[Aner and Kender] Finds a background in a video shot, and then clusters shots into physical scenes by noting shots with a common background.
QBIC (IBM): Allows searching for images based on the colors and imagery in an example image. Known as query-by-example.
Where is the semantics of the media?
"The most important information is in the WORDS!"
Segmentation Literature
Extension of others' work. Latent semantic indexing (LSI): summarizes the semantic content of a document and measures similarities.
Visualization and segmentation algorithm based on wavelet analysis of text documents (time and frequency).
Scale-space ideas applied to the segmentation problem. Multi-dimensional signals.
Semantic Retrieval Literature
Multimedia retrieval systems. (audio and video)
Mixture of probability experts for semantic-audio retrieval (MPESAR) is a sophisticated model connecting words and media.
- Considers the acoustic and semantic similarity of sounds, allowing users to retrieve sounds without searching on an exact word.
"MPESAR algorithm is appropriate for mapping one type of media to another."
Contents
Understanding the problem
Analysis Tools: SVD Principles, Color Space, Word Space
Segmenting Video
Semantic Retrieval
Analysis Tools
Common tools and mathematics used to analyze multimedia signals.
Two types of transformations, which reduce raw text and video signals into meaningful spaces.
Preprocessing the data
* Mapping from a one-dimensional signal (speech) into a multidimensional signal (video).
Analysis Tools: SVD
SVD (singular value decomposition) principles: factorization of real or complex matrices.
Noise reduction.
Semantic and video data are expressed as vector-valued functions of time.
Collect data from an entire video and put the data into a matrix X. (Columns of X represent the signal at different times.)
Using SVD, rewrite the matrix X as the product of three matrices, X = U S V^T.
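As a rough illustration of this factorization, the NumPy sketch below builds a matrix X whose columns are the signal at different times, decomposes it, and projects it onto the top k components. The random data, the 512 x 40 size, the helper name reduce_with_svd, and the choice k = 10 are assumptions made only for the example.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Factor X = U @ diag(s) @ Vt and project the columns of X
    onto the k leading left singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    reduced = U[:, :k].T @ X  # shape (k, number of time steps)
    return U, s, Vt, reduced

# Hypothetical data: 512 features observed at 40 points in time.
rng = np.random.default_rng(0)
X = rng.random((512, 40))
U, s, Vt, reduced = reduce_with_svd(X, k=10)
print(reduced.shape)  # (10, 40)
```

Keeping only the leading components is what gives the noise-reduction effect: small singular values, which mostly carry noise, are discarded.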
Analysis Tools: Color Space
Color changes are useful metrics for finding the boundary between shots.
Collect a histogram of the colors in each frame (512 histogram bins).
Convert the three RGB intensities (0-255) to a single histogram bin by finding the log base 2 of each intensity value.
Pack the three colors into a 9-bit number, using floor() to convert to an integer.
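The binning steps above can be sketched as follows. This is one possible reading, not necessarily the exact formula used in the chapter: it assumes 3 bits per channel, red in the high bits, and an intensity of 0 clamped to 1 so that the logarithm is defined. The function names are hypothetical.

```python
import numpy as np

def color_bins(frame):
    """Map an (H, W, 3) RGB frame with values 0-255 to 9-bit bin indices."""
    values = np.maximum(frame.astype(np.float64), 1.0)  # assumption: clamp 0 to 1
    levels = np.floor(np.log2(values)).astype(np.int64)  # each level is 0..7
    r, g, b = levels[..., 0], levels[..., 1], levels[..., 2]
    return (r << 6) | (g << 3) | b  # assumption: red occupies the high bits

def color_histogram(frame):
    """Count how many pixels of the frame fall into each of the 512 bins."""
    return np.bincount(color_bins(frame).ravel(), minlength=512)

# A single pixel (255, 0, 100) has levels (7, 0, 6).
pixel = np.array([[[255, 0, 100]]], dtype=np.uint8)
print(color_bins(pixel)[0, 0])  # 454
```

Stacking one such histogram per frame as a column gives the matrix X that the SVD step operates on.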
Analysis Tools: Word Space
Latent semantic indexing (LSI) uses an SVD in direct analogy to the color analysis.
Analyse the audio data by collecting a histogram of the words in a transcript of the video. Only one document to study.
Consider sentences of the document, which define a semantic space.
Issues? Synonymy and polysemy.
SVD captures both relationships.
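A minimal sketch of the word-space analysis, assuming whitespace tokenization and a tiny made-up transcript: each sentence becomes a column of word counts, and the SVD projects the sentences into a low-dimensional semantic space. The sentences, the function names, and k = 2 are illustrative assumptions.

```python
import numpy as np

def term_sentence_matrix(sentences):
    """Build a (words x sentences) matrix of word counts."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(sentences)))
    for j, sentence in enumerate(sentences):
        for w in sentence.lower().split():
            X[index[w], j] += 1
    return vocab, X

def lsi_project(X, k):
    """Project each sentence (column of X) onto the k leading singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k].T @ X

# Hypothetical transcript sentences, chosen to include a synonym pair
# (car / automobile) and a polysemous word (bank).
sentences = [
    "the car drove down the road",
    "the automobile drove down the street",
    "the bank approved the loan",
    "the river bank flooded the road",
]
vocab, X = term_sentence_matrix(sentences)
Z = lsi_project(X, k=2)
print(Z.shape)  # (2, 4)
```

Words that occur in similar contexts end up contributing to the same singular vectors, which is how the reduced space can relate synonyms that never co-occur.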
Contents
Understanding the problem
Analysis Tools
Segmenting Video: Temporal Properties, Video Segmentation Overview, Scale Space, Combined Image and Audio Data, Hierarchical Segmentation Results
Semantic Retrieval
Segmenting Video
Indexing by combining two major sources of data: images and words.
Describe the semantic path of a video's transcript as a signal, from the initial sentence to the conclusion.
Instead of trying to find similarities (segments), treat the audio-visual content as a signal and look for large changes in this signal.
Scale Space
Used to find boundaries in a signal.
Analyse a signal with many different kernels that vary in the size of the temporal neighborhood included in the analysis at each point in time.
Look for changes in the signal over time (by calculating the derivative of the signal with respect to time).
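The idea can be sketched with Gaussian kernels of increasing width. The kernel shape, the edge padding, the 3-sigma truncation, and the function names are assumptions for this example; the source only specifies kernels of varying temporal size followed by a time derivative.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized Gaussian kernel truncated at about 3 standard deviations."""
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def scale_space_change(signal, sigmas):
    """signal: (D, T) vector-valued function of time.
    Returns a (len(sigmas), T - 1) array holding, for each scale, the
    magnitude of the time derivative of the smoothed signal."""
    rows = []
    for sigma in sigmas:
        k = gaussian_kernel(sigma)
        radius = len(k) // 2
        # Repeat the edge values so the filter does not invent changes at the ends.
        padded = np.pad(signal, ((0, 0), (radius, radius)), mode="edge")
        smooth = np.array([np.convolve(row, k, mode="valid") for row in padded])
        deriv = np.diff(smooth, axis=1)  # finite-difference time derivative
        rows.append(np.linalg.norm(deriv, axis=0))
    return np.array(rows)

# Hypothetical 2-D signal with one abrupt change at t = 100.
signal = np.zeros((2, 200))
signal[0, 100:] = 1.0
change = scale_space_change(signal, sigmas=[1, 2, 4, 8])
print(change.argmax(axis=1))  # [99 99 99 99]
```

A boundary that survives at large scales reflects a long-lived change in the signal, while peaks that appear only at small scales reflect brief fluctuations.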
Overall
Form a hierarchical segmentation and compare it with other forms of segmentation.
A simple description of a video is possible by unifying the representations.
Combine two well-known techniques to find boundaries in a video. Reduce dimensionality (SVD) to put everything in the same format, and apply it to color and word data.
Combining color, words and scale space
The result is a 20-dimensional vector function of time and scale.
Scale Space representations:
Scale Space
Results: Autocorrelation
Results: Grouping correlation
Results (cont.)
Representations of the semantic information in the HeadlineNews video in scale space.
The top image shows the cosine of the angular change of the semantic trajectory with different amounts of low-pass filtering.
The middle plot shows the peaks of the scale-space derivative.
The bottom plot shows the peaks traced back to their original starting point. These peaks represent topic boundaries.
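The quantity in the top image can be approximated as below. This sketch assumes that "angular change" means the angle between the trajectory's vectors at consecutive time steps; the source does not spell out the exact definition, and the function name is hypothetical.

```python
import numpy as np

def cosine_of_angular_change(Z, eps=1e-12):
    """Z: (k, T) semantic trajectory, one column per time step.
    Returns the T - 1 cosines of the angle between consecutive columns."""
    a, b = Z[:, :-1], Z[:, 1:]
    dot = np.sum(a * b, axis=0)
    norms = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return dot / np.maximum(norms, eps)  # eps guards against zero vectors

# Hypothetical trajectory: no change of direction, then a 90-degree turn.
Z = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(cosine_of_angular_change(Z))  # [1. 0.]
```

Values near 1 mean the topic direction is stable, and dips toward 0 mark candidate topic boundaries.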
Results: Shot Boundary Segmentation
Results:
Segmentation in Perspective
New framework for combining multiple types of information from a video into a unified representation and for segmenting the video.
Described hierarchical segmentation.
(Unexpectedly) good amount of information in the color.
This method is also applicable to other types of information (musical key, audio emotion, etc.).
Contents
Understanding the problem
Analysis Tools
Segmenting Video
Semantic Retrieval: The Algorithm, Testing, Conclusions
Semantic Retrieval: MPESAR
Connecting sounds to words and vice versa. Queries with sounds and words.
Learn about the connections between semantic space and acoustic space.
Algorithm
Semantic Features
Uses the Porter stemmer to remove common suffixes from words, and deletes common words before further processing.
Partition the space into overlapping clusters of regions.
Acoustic Features
Signal processing and machine learning calculations endeavor to capture the sound.
MFCC (mel-frequency cepstral coefficients): used to analyse speech sounds and to reduce the audio signal.
GMM (Gaussian mixture model) captures the long-term characteristics of each sound.
Semantic Retrieval
Acoustic signal processing chain
Building MPESAR models
Testing
Audio to semantic testing procedure.
Retrieval Results
Histogram of true label ranks based on likelihood from audio-to-semantic test.
Histogram of true label ranks based on likelihood from semantic-to-audio test.
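A histogram of this kind could be computed as sketched below, assuming each test query yields one likelihood per candidate label and that rank 1 means the true label scored highest. The likelihood values and the function name are made up for illustration and are not results from the chapter.

```python
import numpy as np

def true_label_ranks(likelihoods, true_labels):
    """likelihoods: (queries x labels) scores; true_labels: correct column
    index for each query. Rank 1 means the true label had the top score."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    true_labels = np.asarray(true_labels)
    true_scores = likelihoods[np.arange(len(true_labels)), true_labels]
    # Rank = 1 + number of labels that scored strictly higher than the true one.
    return 1 + np.sum(likelihoods > true_scores[:, None], axis=1)

# Hypothetical likelihoods for 3 queries over 3 candidate labels.
likelihoods = [[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6],
               [0.2, 0.5, 0.3]]
ranks = true_label_ranks(likelihoods, true_labels=[0, 1, 2])
histogram = np.bincount(ranks, minlength=4)[1:]  # counts for ranks 1, 2, 3
print(ranks, histogram)  # [1 2 2] [1 2 0]
```

A histogram concentrated at low ranks indicates that the true label is usually among the most likely candidates.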
Questions
1. What kinds of applications does SVD have? How is it used?
SVD also has applications in digital signal processing, e.g., as a method for noise reduction. It allows summarizing different kinds of video data and combining the results into a common representation.
2. What does MPESAR stand for? What does this system do?
(Mixture of Probability Experts for Semantic-Audio Retrieval) Learns the connections between a semantic space and an acoustic space.
- Ex) Given a description of a word, the system finds the audio signal that best fits the word.
3. How does MPESAR generally work?
Semantic space maps words into a high-dimensional probabilistic space. Acoustic space describes sounds by a multidimensional vector. A many-to-many connection.
Thank you
Q&A