Chinese Academy of Sciences, Beijing, China
Speech and Language Processing Techniques
Report Document
Overview of MPEG-7
Dr Zhang Sen
Speech Group, INRIA-LORIA, Villers les Nancy, France
Chinese Academy of Sciences, Beijing, China
04/18/23
Outline of contents
• Introduction
• Basic Components
• Content Description
• Audiovisual (AV) Descriptions
• Multimedia Description Schemes
• XM and Applications
• More Information
Ozone WP2 architecture
[Figure: Ozone WP2 architecture. Recoverable labels: Ozone application; Software Environment layer; Ozone Services; Situation Sensitivity; User Context; Ozone Context; Multi-modal widgets; Dialog management; smart agent; User Interface management; Perception; QoS; Security; Authentication; User-interaction module; speech recognition; gesture recognition; video browser; animated agent; ...]
From MPEG-1 to MPEG-7
[Figure: timeline with year marks 90, 92, 94, 98, 99, 01, ? and labels MPEG-1, MPEG-2, MPEG-4 (v1, v2), MPEG-7, MPEG-21.]
• MPEG-3 was once defined, but abandoned
• MPEG-5 and MPEG-6 were not defined
MPEG Family
MPEG-1 – Coding of moving pictures and audio for digital storage media (CD-ROM, MP3), 11/92
MPEG-2 – Generic coding of moving pictures and audio information (DVD, Digital TV), 11/94
MPEG-4 – Coding of audiovisual objects for multimedia applications, Ver1 09/98, Ver2 11/99
MPEG-7 – Multimedia content description for AV material, 08/01
MPEG-21 – Digital AV framework: integration of multimedia technologies, 11/01
Why is MPEG-7 needed
• Digital audiovisual information increasing
  – more and more available contents
  – all kinds of sources of information
• Use of the digital audiovisual information
  – description of the contents
  – fast search of the contents
Objective of MPEG-7
• Standardize content-based description for various types of audiovisual information
  – Enable fast and efficient content searching, filtering and identification
  – Describe several aspects of the content (low-level features, structure, semantics, models, collections, creation, etc.)
  – Address a large range of applications
• Types of audiovisual information:
  – Audio, speech
  – Moving video, still pictures, graphics, 3D models
  – Information on how objects are combined in scenes
Scope of MPEG-7
• Description generation (feature extraction, indexing, annotation and authoring tools, ...) and description consumption (search engines, filtering tools, retrieval processes, browsing devices, ...) are non-normative parts of MPEG-7.
• The goal is to define the minimum that enables interoperability.
[Figure: Description generation → Description → Description consumption. Only the Description falls within the scope of MPEG-7; generation and consumption are left to research and future competition.]
Scope of MPEG-7
[Figure: Feature Extraction → MPEG-7 Description (standardization) → Search Engine]
• Feature Extraction:
  – Content analysis (D, DS)
  – Feature extraction (D, DS)
  – Annotation tools (DS)
  – Authoring (DS)
• MPEG-7 Scope:
  – Description Schemes (DSs)
  – Descriptors (Ds)
  – Language (DDL)
  – Ref: MPEG-7 Concepts
• Search Engine:
  – Searching & filtering
  – Classification
  – Manipulation
  – Summarization
  – Indexing
Audio in MPEG-7
• Audio content description (yes)
• Sound retrieval and classification (yes)
• Speech synthesis (no)
• Speech recognition (no)
• Probability Models (yes)
Parts of the MPEG-7 Standard
• ISO/IEC 15938-1: Systems
• ISO/IEC 15938-2: Description Definition Language
• ISO/IEC 15938-3: Visual
• ISO/IEC 15938-4: Audio
• ISO/IEC 15938-5: Multimedia Description Schemes
• ISO/IEC 15938-6: Reference Software
Main elements of MPEG-7
• Descriptors (D): representations of features that define the syntax and the semantics of each feature representation (low-level).
• Description Schemes (DS): specify the structure and semantics of the relationships between their components, which may be both Ds and DSs (high-level).
• A Description Definition Language (DDL): based on XML Schema, allows the creation of new DSs and Ds and the extension and modification of existing DSs.
• System tools: support multiplexing of descriptions, synchronization, transmission mechanisms, coded representations, and management and protection of intellectual property.
Relations of main elements
[Figure: the DDL is used to define Description Schemes (DS) and Descriptors (D); a DS may contain Ds and other DSs.]
Description Definition Language
• The Description Definition Language (DDL) is a language that defines which descriptions are valid and allows the creation of new Description Schemes and Descriptors. It also allows the extension and modification of existing Description Schemes.
• The DDL is used to define a set of formal rules:
  – ordering of the elements
  – occurrences of elements
  – …
• XML + MPEG-7 extensions
XML: Base for DDL
• Why choose XML as the base for the DDL?
  – The popularity of XML
  – The interoperability with other standards in the future
• Why should XML be extended for MPEG-7?
  – SGML > XML
  – Structural extensions
  – Datatype extensions
DDL parser
A DDL parser is software that checks whether a description is valid.
[Figure: a Description and a Schema are fed to the Parser, which answers Yes or No.]
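The yes-or-no check performed by a parser can be illustrated with a toy validator. This is only a sketch, not an MPEG-7 DDL parser: the rule table `RULES` and the functions `is_valid` and `parse_and_check` are hypothetical, and the sketch covers only the two kinds of rules named for the DDL, the ordering and the occurrences of elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: for each element, the ordered list of allowed
# children as (tag, min_occurs, max_occurs). Not taken from MPEG-7.
RULES = {
    "letter": [("header", 1, 1), ("text", 1, 1)],
    "header": [("name", 1, 1), ("address", 0, 1)],
    "address": [("street", 1, 1), ("city", 1, 1)],
}

def is_valid(element):
    """Return True if the element and all descendants follow RULES."""
    rules = RULES.get(element.tag, [])
    children = list(element)
    pos = 0
    for tag, min_occurs, max_occurs in rules:
        count = 0
        # Consume consecutive children that carry the expected tag.
        while pos < len(children) and children[pos].tag == tag and count < max_occurs:
            count += 1
            pos += 1
        if count < min_occurs:
            return False
    # Any child left over is either unknown or out of order.
    if pos != len(children):
        return False
    return all(is_valid(child) for child in children)

def parse_and_check(description):
    """Description in, yes or no out."""
    try:
        root = ET.fromstring(description)
    except ET.ParseError:
        return False
    return is_valid(root)

GOOD = "<letter><header><name>Mr Sen</name></header><text>Dear Mr White</text></letter>"
BAD = "<letter><text>Dear Mr White</text><header><name>Mr Sen</name></header></letter>"
print(parse_and_check(GOOD))
print(parse_and_check(BAD))
```

The second description contains the same elements as the first but in the wrong order, so it is rejected. A real DDL parser would read its rules from a schema document rather than from a hard-coded table.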
Types of descriptions
• Low-level description (features, etc.)
  – Generic and flexible
  – Intelligent / efficient search engine
• High-level description (structures, concepts, etc.)
  – Efficient and powerful
  – Lack of flexibility
Low-level Description
• Information in the creation and production processes
  – director, title, short feature movie
• Information related to the usage of the content
  – copyright pointers, usage history, broadcast schedule
• Information on the storage features of the content
  – storage format, encoding
• Information about low-level features in the content
  – colors, textures, sound timbres, melody
High-level Description
• Structural description
  – video segments, frames, still and moving regions, audio segments
  – Segment DS (representing the spatial, temporal or spatio-temporal structure)
• Conceptual (semantic) description
  – objects, events, and notions
  – links of the two descriptions
Illustration of descriptions
Basic description
• Elements
  – Information containers
  – containing data and other elements
  – <city> …… </city>
• Attributes
  – Attribute-value pairs used to characterize elements
  – <city population="10000"> …… </city>
Structured descriptions
• Structured descriptions are trees
• Trees are suitable for retrieval and search
[Figure: a tree with a DS at the root, DSs and Ds as inner nodes, and Ds as leaves.]
Description trees
<letter>
  <header>
    <name> Mr Sen </name>
    <address>
      <street> 16 rue Laplace </street>
      <city> Nancy </city>
    </address>
  </header>
  <text> Dear Mr White, …</text>
</letter>
[Figure: the corresponding tree. letter has children header and text; header has children name and address; address has children street and city.]
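As a small illustration of why a tree-shaped description is convenient to search, the sketch below parses the letter example with Python's standard `xml.etree.ElementTree` module, prints the tree, and looks up one element by its path. The helper `outline` is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

LETTER = """
<letter>
  <header>
    <name>Mr Sen</name>
    <address>
      <street>16 rue Laplace</street>
      <city>Nancy</city>
    </address>
  </header>
  <text>Dear Mr White, …</text>
</letter>
"""

def outline(element, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(LETTER.strip())
print("\n".join(outline(root)))

# Search by path: the tree structure turns retrieval into a simple lookup.
city = root.find("./header/address/city")
print(city.text)
```

The path `./header/address/city` follows the same parent-child links as the figure.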
Example: Audio description
<Mpeg7Main>
  <DescriptionMetadata>
    <Version>1.0</Version>
  </DescriptionMetadata>
  <ContentDescription>
    <AudioContent xsi:type="AudioType">
      <Audio>
        <CreationInformation>
          <Creation>
            <Title> The daily news </Title>
          </Creation>
        </CreationInformation>
      </Audio>
    </AudioContent>
  </ContentDescription>
</Mpeg7Main>
Audio description
• Low-level Description
  – spectrum, parametric, and temporal features
• High-level Description
  – Audio signature Description Scheme
  – Instrument timbre Description Schemes
  – The melody Description Tools
  – Sound recognition and indexing Description Tools
  – Spoken Content Description Tools
Audio low-level descriptors
• Waveform
• Loudness
• Spectral basis
• Spectral envelope
• Spectral centroid
• Spectral spread
• Fundamental frequency
• Harmonicity
• Attack time
Audio descriptor: Basic
• Two basic audio Descriptors
  – AudioWaveform Descriptor
    • describes the audio waveform envelope (minimum and maximum)
  – AudioPower Descriptor
    • describes the temporally-smoothed instantaneous power
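The two basic descriptors can be approximated in a few lines. The sketch below is not the normative MPEG-7 extraction: the frame size, the non-overlapping framing, and the function names are assumptions made for illustration. It computes a per-frame minimum and maximum (in the spirit of AudioWaveform) and a per-frame mean of squared samples (in the spirit of AudioPower).

```python
import math

def frames(samples, size):
    """Split samples into consecutive non-overlapping frames."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def waveform_envelope(samples, size):
    """Per-frame (minimum, maximum) of the waveform."""
    return [(min(f), max(f)) for f in frames(samples, size)]

def frame_power(samples, size):
    """Per-frame mean of the squared samples."""
    return [sum(x * x for x in f) / len(f) for f in frames(samples, size)]

# Hypothetical test signal: one second of a 100 Hz sine at 8 kHz, amplitude 0.5.
RATE = 8000
signal = [0.5 * math.sin(2 * math.pi * 100 * n / RATE) for n in range(RATE)]
envelope = waveform_envelope(signal, 80)  # 10 ms frames, one sine period each
power = frame_power(signal, 80)
print(envelope[0], power[0])
```

For a sine of amplitude 0.5 the envelope of each frame is close to (-0.5, 0.5), and the mean power over a whole period is half the squared amplitude, 0.125.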
Audio descriptor: Basic Spectral
• AudioSpectrumEnvelope Descriptor
  – describes the short-term power spectrum
• AudioSpectrumCentroid Descriptor
  – describes the center of gravity of the log-frequency power spectrum
• AudioSpectrumSpread Descriptor
  – describes the second moment of the log-frequency power spectrum
• AudioSpectrumFlatness Descriptor
  – describes the flatness properties of the spectrum
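The centroid, spread, and flatness can be sketched on a power spectrum that is already available as frequency and power pairs. Everything below is a simplification under stated assumptions: the log-frequency axis is taken as octaves relative to 1 kHz, the spread is returned as the square root of the second central moment, flatness is the ratio of geometric to arithmetic mean over all bins, and the standard's windowing and band handling are left out. The function names are hypothetical.

```python
import math

def log_frequency_moments(freqs_hz, powers, ref_hz=1000.0):
    """Return (centroid, spread) in octaves relative to ref_hz.

    centroid: power-weighted mean of log2(f / ref_hz)
    spread:   power-weighted RMS deviation from that centroid
    """
    total = sum(powers)
    octaves = [math.log2(f / ref_hz) for f in freqs_hz]
    centroid = sum(o * p for o, p in zip(octaves, powers)) / total
    variance = sum((o - centroid) ** 2 * p for o, p in zip(octaves, powers)) / total
    return centroid, math.sqrt(variance)

def flatness(powers):
    """Geometric mean divided by arithmetic mean of the power values."""
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    return geometric / (sum(powers) / len(powers))

# Hypothetical spectrum: equal power one octave below and one octave above 1 kHz.
centroid, spread = log_frequency_moments([500.0, 2000.0], [1.0, 1.0])
print(centroid, spread)
print(flatness([1.0, 1.0, 1.0, 1.0]))
```

With equal power at 500 Hz and 2 kHz the centroid sits at 0 octaves (1 kHz) and the spread is 1 octave; a perfectly flat spectrum has flatness 1.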
Audio Signature Description
• The AudioSignature Description Scheme provides a unique content identifier for the purpose of robust automatic identification of audio signals
• Applications include
  – audio fingerprinting
  – identification of audio
  – locating metadata for legacy audio content
Instrument Timbre Description
• Timbre is defined as the perceptual features that make two sounds with the same pitch and loudness sound different.
• The Timbre Description describes these perceptual features with a reduced set of Descriptors
  – HarmonicInstrumentTimbre Descriptor
  – LogAttackTime Descriptor
  – PercussiveInstrumentTimbre Descriptor
  – Combination with Basic Spectral Descriptors
Melody Description Tools
The Melody Description Tools facilitate efficient, robust, and expressive melodic similarity matching
• MelodyContour Description Scheme
  – 5-step contour representation
  – basic rhythmic information representation
• MelodySequence Description Scheme
  – supporting an expanded descriptor set and high precision of interval encoding
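A 5-step contour can be sketched as a quantization of the interval between adjacent notes into five values from -2 to +2. The thresholds used below (repeated note, step of one or two semitones, leap of three semitones or more) are an assumption made for illustration, as are the function names and the use of MIDI note numbers as input.

```python
def contour_step(interval_semitones):
    """Map a pitch interval to one of five contour values, -2..+2.

    Assumed thresholds: 0 for a repeated note, 1 for a step of one or
    two semitones, 2 for a leap of three semitones or more.
    """
    size = abs(interval_semitones)
    if size == 0:
        level = 0
    elif size <= 2:
        level = 1
    else:
        level = 2
    return level if interval_semitones >= 0 else -level

def melody_contour(midi_notes):
    """Contour of a note sequence: one value per pair of adjacent notes."""
    return [contour_step(b - a) for a, b in zip(midi_notes, midi_notes[1:])]

# Hypothetical melody as MIDI note numbers (60 = middle C).
print(melody_contour([60, 62, 65, 65, 64, 60]))  # [1, 2, 0, -1, -2]
```

Because the contour discards exact interval sizes, two renditions of a tune that differ slightly in pitch can still produce the same contour, which is what makes it robust for similarity matching.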
General Sound Recognition and Indexing Description Tools
• SoundModel (SM) DS
  – statistical model, such as HMM or GMM
  – SoundModelStatePath Descriptor
    • consists of a state sequence generated by a SM
  – SoundModelStateHistogram Descriptor
    • consists of a normalized histogram of the state sequence generated by a SM given an audio segment
• SoundClassificationModel DS
  – a trainable multi-way classifier based on SMs
    • speech vs music, male vs female, trumpet vs violin
    • genre classification, voice recognition
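The relation between a state path and its normalized histogram is simple to show. In this sketch the state path is hypothetical data; in practice it would come from decoding an audio segment with a sound model, which is not shown here.

```python
from collections import Counter

def state_histogram(state_path, num_states):
    """Normalized histogram of a state path: fraction of frames per state."""
    counts = Counter(state_path)
    total = len(state_path)
    return [counts.get(state, 0) / total for state in range(num_states)]

# Hypothetical state path for a 3-state model over five frames.
path = [0, 1, 1, 2, 1]
histogram = state_histogram(path, 3)
print(histogram)
```

The histogram drops the ordering of the states, so two segments of different length can be compared through vectors of the same size.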
Spoken content retrieval
• Output of ASR
  – phone lattice or word lattice
  – spoken content DS stores these lattices instead of plain text
  – lattices are good for retrieval
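Why a lattice helps retrieval can be seen on a toy example. The lattice below is hypothetical, as are its words, its scores, and the function names: it keeps competing recognition hypotheses, so a query can match a word that the single best transcript would have dropped.

```python
# Hypothetical word lattice: node -> list of (next_node, word, probability).
# Node 0 is the start and node 3 the end; the words and scores are made up.
LATTICE = {
    0: [(1, "taj", 0.7), (1, "touch", 0.3)],
    1: [(2, "mahal", 0.6), (2, "hall", 0.4)],
    2: [(3, "drawing", 1.0)],
    3: [],
}

def paths(lattice, node=0, end=3):
    """Yield (words, probability) for every path from node to end."""
    if node == end:
        yield [], 1.0
        return
    for next_node, word, prob in lattice[node]:
        for words, rest in paths(lattice, next_node, end):
            yield [word] + words, prob * rest

def best_match(lattice, query):
    """Probability of the most likely path containing the query word."""
    scores = [p for words, p in paths(lattice) if query in words]
    return max(scores, default=0.0)

print(best_match(LATTICE, "mahal"))
print(best_match(LATTICE, "touch"))
print(best_match(LATTICE, "nancy"))
```

The best path here reads "taj mahal drawing", yet a query for "touch" still finds a match with a lower score, while a word absent from the lattice scores zero. Enumerating all paths is only workable for a toy lattice of this size.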
Spoken Content Description Tools
• SpokenContentLattice
  – representing the actual decoding produced by an ASR engine
• SpokenContentHeader
  – contains information about the speakers being recognized and the recognizer itself
  – WordLexicon Descriptor
  – PhoneLexicon Descriptor
  – SpeakerInfo Descriptor
  – ConfusionInfo Descriptor
Gaussian DS
<Gaussian>
<Mean>
4087.18 7173.73 1.36364 94.2727 1834.36 2359.55 2645.27 2577.09
………………………………
</Mean>
<Variance>
1.6982e+007 5.21621e+007 14.3636 9749.09 3.65743e+006
………………………………
</Variance>
</Gaussian>
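A description of this shape can be read back and used for scoring. The sketch below assumes that `Variance` holds one variance per dimension (a diagonal covariance), which the slide does not state, and it uses a hypothetical two-dimensional example because the vectors on the slide are elided. The function names are introduced here for illustration.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical two-dimensional example; the slide's vectors are elided.
GAUSSIAN_XML = """
<Gaussian>
  <Mean>0.0 0.0</Mean>
  <Variance>1.0 1.0</Variance>
</Gaussian>
"""

def read_vector(element):
    """Parse whitespace-separated numbers from an element's text."""
    return [float(token) for token in element.text.split()]

def log_density(gaussian, x):
    """Log density of x under a diagonal-covariance Gaussian."""
    mean = read_vector(gaussian.find("Mean"))
    variance = read_vector(gaussian.find("Variance"))
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, variance)
    )

gaussian = ET.fromstring(GAUSSIAN_XML.strip())
print(log_density(gaussian, [0.0, 0.0]))
```

At the mean of a two-dimensional unit-variance Gaussian the log density equals -log(2π).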
State-transition model DS
<StateTransitionModel>
<Transitions size1="20" size2="20">
0 0 0.210526 0.0526316 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
……………………………………
</Transitions>
<Initial size="20">
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
</Initial>
<State label="0 players" confidence="1">
……………………………………
<State label="19 players" confidence="0.223607">
</StateTransitionModel>
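The `Initial` element is the only vector shown in full, so it can serve for a small consistency check. The sketch below assumes that the values are initial-state probabilities and that `size` gives their count; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <Initial> element as shown in the example above.
INITIAL_XML = '<Initial size="20">0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0</Initial>'

def read_initial(xml_text):
    """Return the initial-state probabilities, checking size and sum."""
    element = ET.fromstring(xml_text)
    values = [float(token) for token in element.text.split()]
    if len(values) != int(element.get("size")):
        raise ValueError("size attribute does not match the number of values")
    if abs(sum(values) - 1.0) > 1e-6:
        raise ValueError("initial probabilities do not sum to 1")
    return values

initial = read_initial(INITIAL_XML)
start_state = initial.index(max(initial))
print(start_state)
```

Under these assumptions the model always starts in the state with index 12, counting from zero.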
ProbabilityModelClassifier DS
<ProbabilityModelClassifier confidence="0.9" length="2">
<ProbabilityModelClass SemanticLabel="fish" Confidence="0.5"
DescriptorName="ColorHistogram">
<Gaussian>
<Mean>
4087.18 7173.73 1.36364 94.2727 1834.36 2359.55
………………………….
</Mean>
<Variance>
1.6982e+007 5.21621e+007 14.3636 9749.09
………………………….
</Variance>
</Gaussian>
</ProbabilityModelClass>
SpokenContentLattice DS
A lattice structure for a hypothetical (combined phone and word) decoding of the expression “Taj Mahal drawing …”.
[Figure: block diagram. Recoverable labels: audio query; spectrum projection; AudioSpectrumBasis; HMM and bases; select; SoundRecognitionClassifier; HMM 1, HMM 2, ..., HMM N-1, HMM N; SoundRecognitionModel; SoundModelStatePath; model ref + state path; segmented audio description; MPEG-7 sound database.]
Extraction of sound indexes using a sound-recognition classifier. The model reference and state path are stored.
[Figure: block diagram. Recoverable labels: audio query; spectrum projection; AudioSpectrumBasis; ContinuousMarkovModel; HMM and basis; select; SoundRecognitionClassifier; HMM 1, HMM 2, ..., HMM N-1, HMM N; SoundRecognitionModel; SoundModelStatePath; model ref + state path; matching; indexed audio; MPEG-7 sound database; result list.]
Query-by-example application with a query in media source form. Features must be extracted and projected into the classification space for each model in order to match against the database.
[Figure: block diagram. Recoverable labels: DDL query; model ref + state path; matching; MPEG-7 sound database; result list.]
An example search application utilizing a query in DDL format
Extraction of hidden Markov model and basis functions and storage in a DDL representation
[Figure: block diagram. Recoverable labels: audio WAV files; basis extract; feature extract; HMM; HMM and basis; AudioSpectrumBasis; SoundRecognitionFeatures; ContinuousMarkovModel; SoundRecognitionModel.]
Scenarios for the Spoken Content Description Tools
• Recall of AV data by memorable spoken events
  – A film or video recording where a character or person spoke a particular word or sequence of words. The source media would be known, and the query would return a position in the media.
• Spoken Document Retrieval
  – There is a database consisting of separate spoken documents. The result of the query is the relevant documents, and optionally the position in those documents of the matched speech.
• Annotated Media Retrieval
  – Similar to spoken document retrieval. The result of the query is the media which is annotated with speech, and not the speech itself. An example is a photograph retrieved using a spoken annotation.
Multimedia DSs
• Basic Elements
• Content Management
• Content Description
• Content Organization
• Navigation and Access
• User Interaction
Multimedia Description Schemes are metadata structures for describing and annotating audio-visual (AV) content.
![Page 49: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/49.jpg)
Organization of Multimedia DSs
![Page 50: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/50.jpg)
Content Management
• Creation and production information
– Creation information: title, textual annotation, creators, and dates
– Classification information: genre, subject, purpose, language
• Media coding, storage and file formats
– format, compression, and coding
• Content usage
– usage rights, usage record
![Page 51: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/51.jpg)
Navigation and Access
• Summaries
– hierarchical summaries
– sequential summaries
• Partitions and Decompositions
– decompositions in space, time and frequency
– used in multi-resolution access and progressive retrieval
• Variations
– selection of the most suitable variation of an AV program
– adapt to the different capabilities of terminal devices, network conditions or user preferences
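As a rough sketch of how Variations might be used, the snippet below picks the richest variation of a program that still fits the terminal and the network. The `Variation` fields and the selection rule (highest bitrate that fits) are assumptions for illustration; the slide does not prescribe a selection algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Variation:
    name: str
    bitrate_kbps: int   # bandwidth needed to deliver this variation
    width: int          # horizontal resolution in pixels (0 for audio only)

def select_variation(variations: List[Variation], max_kbps: int,
                     max_width: int) -> Optional[Variation]:
    """Pick the highest-bitrate variation that fits the terminal and network."""
    fitting = [v for v in variations
               if v.bitrate_kbps <= max_kbps and v.width <= max_width]
    return max(fitting, key=lambda v: v.bitrate_kbps, default=None)

variations = [
    Variation("full video", 4000, 1280),
    Variation("low-res video", 500, 320),
    Variation("audio only", 64, 0),
]
# A small-screen terminal on a slow link.
chosen = select_variation(variations, max_kbps=600, max_width=480)
```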
![Page 52: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/52.jpg)
Hierarchical summary
![Page 53: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/53.jpg)
Illustration of variations
![Page 54: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/54.jpg)
Content Organization
• Collections
– group the contents into clusters
– describe statistics and models of the attribute values
– describe relationships among collection clusters
• Models
– model the attributes and features of AV content
– Probability Model: specify statistical functions and structures
– Analytic Model: specify semantic labels, specify the confidence, build classifiers
![Page 55: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/55.jpg)
Collection Structure
![Page 56: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/56.jpg)
User Interaction
• User Preference
– context dependency in terms of time and place
– relative importance of different preferences
– privacy characteristics of the preferences
– preferences updated by agent or user
• Usage History
– history of actions
– used to determine the user's preferences
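One way a usage history could be turned into preferences is sketched below. The `(action, genre)` record format and the rule of counting only "play" actions are assumptions for this example, not part of the User Interaction tools.

```python
from collections import Counter
from typing import Dict, List, Tuple

def infer_genre_preferences(history: List[Tuple[str, str]]) -> Dict[str, float]:
    """Turn (action, genre) records into relative genre weights that sum to 1."""
    # Only "play" actions count as evidence of interest in this sketch.
    counts = Counter(genre for action, genre in history if action == "play")
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()} if total else {}

history = [("play", "news"), ("play", "sports"), ("skip", "drama"), ("play", "news")]
prefs = infer_genre_preferences(history)
```

The resulting weights play the role of the "relative importance of different preferences" listed above.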
![Page 57: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/57.jpg)
Outline of contents
• Introduction
• Basic Components
• Content Description
• Audiovisual (AV) Descriptions
• Multimedia Description Schemes
• XM and Applications
• More Information
![Page 58: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/58.jpg)
eXperimentation Model (XM)
• Simulation platform for:
– Ds, DSs, CSs, DDL
• XM applications:
– the server (extraction) applications
– the client (search, filtering and/or transcoding) applications
CS: Coding Schemes
![Page 59: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/59.jpg)
The XM applications
• Extraction from Media
– all low-level Ds or DSs should have an application class of this type
• Search & Retrieval Application
– a client application
• Media Transcoding Application
– a client application
• Description Filtering Application
– a client application
![Page 60: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/60.jpg)
Extraction from Media
![Page 61: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/61.jpg)
Search and retrieval application
![Page 62: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/62.jpg)
Media transcoding application
![Page 63: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/63.jpg)
Description Filtering Application
![Page 64: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/64.jpg)
Interface model for XM app
![Page 65: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/65.jpg)
Real world application
MDB = media database, DDB = description database. First, two features are extracted from a media database. Then, based on the first feature, relevant media files are selected from the media database. Finally, the relevant media files are transcoded based on the second extracted feature.
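The three steps above (extract, select, transcode) can be sketched as a chain of small functions. Everything here is a toy stand-in: the in-memory `mdb` dictionary, the `genre` and `width` features, and the scale-factor "transcoding" are assumptions for illustration and are not the XM software's interfaces.

```python
from typing import Dict, List

# Hypothetical media database (MDB): file name -> raw attributes.
mdb: Dict[str, Dict[str, object]] = {
    "clip_a": {"genre": "sports", "width": 1280},
    "clip_b": {"genre": "news", "width": 640},
    "clip_c": {"genre": "sports", "width": 320},
}

def extract(media_db: Dict[str, Dict[str, object]], feature: str) -> Dict[str, object]:
    """Extraction step: build a description database (DDB) for one feature."""
    return {name: attrs[feature] for name, attrs in media_db.items()}

def search(ddb: Dict[str, object], wanted: object) -> List[str]:
    """Search step: select the media whose description matches the query."""
    return sorted(name for name, value in ddb.items() if value == wanted)

def transcode(names: List[str], ddb: Dict[str, object], target_width: int) -> Dict[str, float]:
    """Transcoding step: plan a downscale factor from the second feature."""
    return {name: min(1.0, target_width / ddb[name]) for name in names}

genre_ddb = extract(mdb, "genre")        # first feature
width_ddb = extract(mdb, "width")        # second feature
relevant = search(genre_ddb, "sports")   # select on the first feature
plan = transcode(relevant, width_ddb, target_width=640)  # adapt using the second
```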
![Page 66: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/66.jpg)
MPEG-7 application areas
• Storage and retrieval of audiovisual databases (image, film, radio archives)
• Broadcast media selection (radio, TV programs)
• Surveillance (traffic control, surface transportation, production chains)
• E-commerce and tele-shopping (searching for clothes / patterns)
• Remote sensing (cartography, ecology, natural resources management)
• Entertainment (searching for a game, for karaoke)
• Cultural services (museums, art galleries)
• Journalism (searching for events, persons)
• Personalized news service on the Internet (push media filtering)
• Intelligent multimedia presentations
• Educational applications
• Bio-medical applications
![Page 67: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/67.jpg)
Illustration of applications
Users
![Page 68: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/68.jpg)
Information Flow
[Figure labels: Feature extraction (manual/automatic), AV Description, Encoding, Storage, Transmission, Decoding, Search/query, Browse, Filter, Pull, Push, Users]
![Page 69: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/69.jpg)
Push and Pull applications
• Pull applications
– Example: Search engines for the internet and DBs
– Advantage: Many search engines work on standardized descriptions
• Push applications
– Example: Broadcast of video, interactive TV
– Advantage: Intelligent agents filter standardized descriptions
![Page 70: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/70.jpg)
Example: Pull application
[Figure label: MPEG-7 Database]
![Page 71: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/71.jpg)
Example: Push application
![Page 72: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/72.jpg)
Example: queries
• Text (keywords):
– Find AV material with a subject corresponding to some keywords
• Semantic description:
– Find AV material corresponding to specified semantics
• Image as an example:
– Find an image with similar characteristics (global or local)
• A few notes of music:
– Find corresponding musical pieces or movies
• Low-level features (example: motion):
– Find video with specific object motion trajectories
![Page 73: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/73.jpg)
Integration of MPEG-7 into XML
    <seq begin="20s" dur="10s">
      <img id="Image1" dur="5s">
        <MP7:annotation>
          <Who>Fernando Morientes</Who>
          <WhatAction>Spain vs. Sweden soccer match</WhatAction>
        </MP7:annotation>
      </img>
      <img id="Image2" dur="2s"/>
    </seq>
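To show how an application might read such an embedded annotation, here is a small sketch using Python's standard `xml.etree.ElementTree`. The fragment is a well-formed variant of the slide's example; the namespace URI `urn:example:mp7` is a made-up placeholder, declared only so that the `MP7:` prefix parses, and is not the real MPEG-7 namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, declared only so the MP7 prefix is bound.
MP7_NS = "urn:example:mp7"

doc = f"""
<seq xmlns:MP7="{MP7_NS}" begin="20s" dur="10s">
  <img id="Image1" dur="5s">
    <MP7:annotation>
      <Who>Fernando Morientes</Who>
      <WhatAction>Spain vs. Sweden soccer match</WhatAction>
    </MP7:annotation>
  </img>
  <img id="Image2" dur="2s"/>
</seq>
""".strip()

root = ET.fromstring(doc)
annotations = {}
for img in root.findall("img"):
    # ElementTree addresses namespaced tags as "{uri}localname".
    ann = img.find(f"{{{MP7_NS}}}annotation")
    if ann is not None:
        annotations[img.get("id")] = {child.tag: child.text.strip() for child in ann}
```

Only `Image1` carries an annotation, so `annotations` ends up with a single entry keyed by that id.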
![Page 74: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/74.jpg)
Outline of contents
• Introduction
• Basic Components
• Content Description
• Audiovisual (AV) Descriptions
• Multimedia Description Schemes
• XM and Applications
• More Information
![Page 75: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/75.jpg)
MPEG-7 and other Standards
• MPEG-1, -2, and -4 are designed to represent the information itself, while MPEG-7 is meant to represent information about the information.
• MPEG-1, -2, and -4 make content available, while MPEG-7 allows you to find the content you need.
![Page 76: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/76.jpg)
Ultimate ambition of MPEG-7
• To make the web as searchable for multimedia content as it is searchable for text today
• To make the use of computer systems as easy as possible
![Page 77: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/77.jpg)
MPEG-7 beyond
• To mould computers around human requirements and not humans around computer requirements
• To enable content disclosure based on facts, rather than on human annotations
• To find information through rich spoken queries and hand-drawn images, addressing what most people expect computers to be able to do
![Page 78: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/78.jpg)
More Information on WWW
• Major MPEG-7 documents
http://www.cselt.it/mpeg/, semi-official website
http://www.mpeg-7.com, official website
• Others
http://www.elsevier.com/locate/image
![Page 79: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/79.jpg)
Conclusion
[Figure labels: AV contents, Structures, Features, Ds, DSs, DDL, User]
![Page 80: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/80.jpg)
Thanks
![Page 81: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/81.jpg)
![Page 82: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/82.jpg)
Low level AV descriptors
• Video segments
– Color, Camera motion, Motion activity, Mosaic
• Moving regions
– Color, Motion trajectory, Parametric motion, Spatio-temporal shape
• Still regions
– Color, Shape, Position, Texture
• Audio segments
– Spoken content, Spectral feature, Timbre
![Page 83: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/83.jpg)
Face Recognition Descriptor
• Projection of a face vector onto a set of basis vectors (face patterns)
• Feature set is extracted from a normalized face image
• Normalized face image
– 56 lines with 46 intensity values in each line
– The centers of the two eyes are located on the 24th row, at the 16th and 31st columns for the right and left eye respectively
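The projection step can be sketched as follows: the 56 x 46 normalized image is flattened into a 2576-element face vector, and each feature is the dot product of that vector with one basis vector. The basis vectors below are toy values invented for this example (the actual basis is defined by the standard and is not reproduced here), and any preprocessing such as mean subtraction is omitted.

```python
from typing import List, Sequence

ROWS, COLS = 56, 46  # normalized face image size given on the slide

def face_vector(image: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 56x46 intensity image row by row into a 2576-element vector."""
    assert len(image) == ROWS and all(len(row) == COLS for row in image)
    return [value for row in image for value in row]

def project(vector: Sequence[float], basis: Sequence[Sequence[float]]) -> List[float]:
    """Return one coefficient (dot product) per basis vector."""
    return [sum(v * b for v, b in zip(vector, base)) for base in basis]

# Toy data: a flat gray image and two made-up basis vectors.
image = [[0.5] * COLS for _ in range(ROWS)]
n = ROWS * COLS
basis = [
    [1.0 / n] * n,                             # averages all pixels
    [1.0 if i == 0 else 0.0 for i in range(n)],  # picks the first pixel
]
features = project(face_vector(image), basis)
```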
![Page 84: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/84.jpg)
Segment Decomposition
![Page 85: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/85.jpg)
MPEG-7 Normative Interfaces
![Page 86: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/86.jpg)
Example: Content description
[Figure labels: MPEG-7 Database; Indexing / Feature extraction; Search / retrieval; High-level process; Low-level process]
![Page 87: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/87.jpg)
Segment DS
The Segment DS describes the result of a spatial, temporal, or spatio-temporal partitioning of the AV content. It has nine major subclasses:
• Multimedia Segment DS
• AudioVisual Region DS
• AudioVisual Segment DS
• Audio Segment DS
• Still Region DS
• Still Region 3D DS
• Moving Region DS
• Video Segment DS
• Ink Segment DS
![Page 88: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/88.jpg)
Examples: T/S segments
![Page 89: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/89.jpg)
Example: Segment trees
![Page 90: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/90.jpg)
Illustration of conceptual description
Object DS
Event DS
Concept DS
Semantic state DS
Semantic place DS
Semantic time DS
AV content
Semantic DS
Semantic container DS
Semantic base DS
![Page 91: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/91.jpg)
Visual description
• Basic structures
– Grid layout, Time series, Multiple view, Spatial 2D coordinates, Temporal interpolation
• Descriptors
– Color, Texture, Shape, Motion, Localization
![Page 92: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/92.jpg)
Example: Color Descriptors
• Color space
• Color Quantization
• Dominant Colors
• Scalable Color
• Color Layout
• Color-Structure
• GoF/GoP Color
![Page 93: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/93.jpg)
Example: Color space
• R,G,B
• Y,Cr,Cb
• H,S,V
• HMMD
• Linear transformation matrix with reference to R, G, B
• Monochrome
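Two of the listed color spaces can be derived from R, G, B with a few lines of code. The sketch below assumes RGB components in the range 0 to 1. The Y, Cb, Cr coefficients are the usual ITU-R BT.601-style values (a linear transformation of R, G, B), and HMMD is taken here as hue plus the max, min, difference and mean of the RGB components; treat both as illustrative rather than as the normative definitions.

```python
import colorsys
from typing import Tuple

def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Linear luma/chroma transform of RGB in [0, 1] (BT.601-style coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b
    cr = 0.500 * r - 0.419 * g - 0.081 * b
    return y, cb, cr

def rgb_to_hmmd(r: float, g: float, b: float) -> Tuple[float, float, float, float, float]:
    """Return (hue in degrees, max, min, diff, sum) for RGB in [0, 1]."""
    mx, mn = max(r, g, b), min(r, g, b)
    hue = colorsys.rgb_to_hsv(r, g, b)[0] * 360.0  # same hue as in H,S,V
    return hue, mx, mn, mx - mn, (mx + mn) / 2.0

white = rgb_to_ycbcr(1.0, 1.0, 1.0)   # full luma, no chroma
red = rgb_to_hmmd(1.0, 0.0, 0.0)      # pure red
```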
![Page 94: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/94.jpg)
Audio Framework
![Page 95: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/95.jpg)
Descriptor
• Definition
– A Descriptor (D) is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.
• Notes
– A Descriptor allows an evaluation of the corresponding feature via the descriptor value. It is possible to have several descriptors representing a single feature.
• Examples
– For the color feature, possible descriptors include the color histogram and the average of the frequency components. Descriptors for other features include the motion field and the text of the title.
![Page 96: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/96.jpg)
Descriptor Value
• Definition A Descriptor Value is an instantiation of a Descriptor for a given data set (or subset thereof).
• Notes Descriptor Values are combined via the mechanism of a Description Scheme to form a Description.
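To make the distinction concrete: if the Descriptor is "normalized intensity histogram with a fixed number of bins", then a Descriptor Value is the list of numbers obtained for one particular image. The sketch below is an assumption-laden toy (4 bins, 8-bit gray values, a flat pixel list) and is not one of the MPEG-7 color descriptors.

```python
from typing import List, Sequence

def gray_histogram(pixels: Sequence[int], bins: int = 4) -> List[float]:
    """Normalized histogram of 8-bit intensities: one descriptor value."""
    counts = [0] * bins
    for p in pixels:
        # Map 0..255 onto bin indices 0..bins-1.
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * bins

# The "data set" here is a tiny hypothetical image given as a flat pixel list.
value = gray_histogram([0, 10, 70, 130, 200, 255, 255, 64])
```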
![Page 97: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/97.jpg)
Description Scheme
• Definition
– A Description Scheme (DS) specifies the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.
• Examples
– A movie, structured as scenes and shots, including some textual descriptors at the scene level, and color, motion and some audio descriptors at the shot level.
• Note
– A D contains only basic data types and does not refer to other Ds or DSs.
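The movie example can be mirrored in a small data structure: a Descriptor holds only a basic value, while a Description Scheme groups Descriptors and nested Description Schemes. The class and field names are invented for this sketch; real MPEG-7 descriptions are expressed in the DDL, not in Python.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Descriptor:
    """A D: a named feature representation holding only basic values."""
    feature: str
    value: object

@dataclass
class DescriptionScheme:
    """A DS: groups Descriptors and nested Description Schemes."""
    name: str
    descriptors: List[Descriptor] = field(default_factory=list)
    children: List["DescriptionScheme"] = field(default_factory=list)

    def count_descriptors(self) -> int:
        """Count descriptors in this DS and all nested DSs."""
        return len(self.descriptors) + sum(c.count_descriptors() for c in self.children)

# Shot level: color and motion descriptors; scene level: a textual descriptor.
shot = DescriptionScheme("shot 1", [Descriptor("color", [0.2, 0.5, 0.3]),
                                    Descriptor("motion", "pan left")])
scene = DescriptionScheme("scene 1", [Descriptor("text", "opening scene")], [shot])
movie = DescriptionScheme("movie", children=[scene])
```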
![Page 98: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/98.jpg)
DS: XML Schema & Extensions
• XML Schema
– Data types
– Simple and Complex types
– Elements
– Inheritance, Abstract types
• MPEG-7 extensions
– Array and Matrix datatypes
– Enumerated datatypes for MimeType, CountryCode, RegionCode, CurrencyCode and CharacterSetCode
– Typed references
![Page 99: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/99.jpg)
Basic elements of DS
• Constructs for linking media files
• Localizing pieces of content
• Describing
– time, places, persons, individuals, groups, organizations, textual annotation, etc.
– Who? What object? What action? Where? When? Why? and How?
![Page 100: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/100.jpg)
Content recognition tools
• No speech, face or gesture recognition engines are included in MPEG-7
• Building content recognition tools is a task for industry, not for the standard
– coding tools in MPEG-1, -2, -4 were for research purposes, not part of the standard
– no tools were part of the MPEG standard
![Page 101: Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649c9b5503460f94958cfc/html5/thumbnails/101.jpg)