Techniques for Information Searching and Retrieval of Web-based Multimedia Digital Library
Presented by: Vincent Cheung
Supervisors: Prof. Michael Lyu, Prof. K.W. Ng
Markers: Prof. K. H. Lee, Prof. Y. S. Moon
3 May 2000

Abstract
- Digital libraries are becoming more and more popular because of their strength in searching and retrieving information.
- The web-based environment provides a better medium for information sharing.
- There is a trend toward storing more multimedia information instead of pure text.
- This work researches techniques for multimedia information searching and retrieval in a web-based digital library.

Presentation Outline
- XML overview
- Data structures for multimedia news archives
  - for video clips
  - using graph structures of XML
  - giving annotation
- Architecture and agents of digital library
- Research plan and conclusion

Overview of XML
- XML - eXtensible Markup Language
- Proposed by the World Wide Web Consortium (W3C) in 1998
- Aims to define a complete, platform-independent and system-independent environment for the authoring and delivery of information resources across the web
- Semistructured

How XML differs from HTML
- Extensibility - new tags may be defined at will
- Structure - XML structures can be nested to arbitrary depth
- Validation - an XML document can contain an optional description of its grammar

XML Documents
- Use elements and attributes to describe your document

<database>
  <news>
    <date year="2000" month="4" day="15"/>
    <title>Press warning appropriate, says Beijing</title>
    <reporter>Kong Lai-fan</reporter>
    <reporter>Greg Torode</reporter>
    <content>Beijing yesterday defended remarks made by senior SAR-based official Wang Fengchao that local media should avoid reporting separatist views.</content>
  </news>
  <news>
    . . .
  </news>
</database>

[Figure: tree view of the document. database → news, news; each news → date, title, reporter, content]

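As a minimal sketch of how a program might read a document shaped like the one above, the following Python uses the standard library's `xml.etree.ElementTree`. The function name `read_news` and the shortened content text are assumptions made for illustration, not part of the presentation.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's sample document (content text abbreviated).
NEWS_XML = """
<database>
  <news>
    <date year="2000" month="4" day="15"/>
    <title>Press warning appropriate, says Beijing</title>
    <reporter>Kong Lai-fan</reporter>
    <reporter>Greg Torode</reporter>
    <content>Beijing yesterday defended remarks.</content>
  </news>
</database>
"""


def read_news(xml_text):
    """Return one dict per <news> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entries = []
    for news in root.findall("news"):
        date = news.find("date")
        entries.append({
            # Attributes arrive as strings, so convert them explicitly.
            "date": tuple(int(date.get(k)) for k in ("year", "month", "day")),
            "title": news.findtext("title"),
            # <reporter> may repeat, so collect every occurrence.
            "reporters": [r.text for r in news.findall("reporter")],
            "content": news.findtext("content"),
        })
    return entries


if __name__ == "__main__":
    for entry in read_news(NEWS_XML):
        print(entry["date"], entry["title"], entry["reporters"])
```

Attributes suit small atomic values such as the date parts, while repeated child elements suit lists such as reporters.
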
Document Type Definition
- Provides the definition of a document type for member documents to follow

<!DOCTYPE database [
  <!ELEMENT database (news*)>
  <!ELEMENT news (date,title,reporter*,content)>
  <!ELEMENT date EMPTY>
  <!ATTLIST date year  CDATA #REQUIRED
                 month CDATA #REQUIRED
                 day   CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT reporter (#PCDATA)>
  <!ELEMENT content (#PCDATA)>
]>

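Python's standard `xml.etree.ElementTree` parser does not validate against a DTD, so the sketch below mirrors the content models above by hand to show what validation checks. The tables `CONTENT_MODELS` and `REQUIRED_ATTRS` and the function `check` are hypothetical illustrations, not a real DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written mirror of the DTD: each child tag is followed by one space,
# so "(reporter )*" means zero or more <reporter> children.
CONTENT_MODELS = {
    "database": r"(news )*",
    "news": r"date title (reporter )*content ",
}
REQUIRED_ATTRS = {"date": ("year", "month", "day")}


def check(elem):
    """Return a list of violations found in elem and its descendants."""
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in elem)
        if not re.fullmatch(model, children):
            errors.append(f"<{elem.tag}> has children [{children.strip()}]")
    for attr in REQUIRED_ATTRS.get(elem.tag, ()):
        if attr not in elem.attrib:
            errors.append(f"<{elem.tag}> is missing attribute '{attr}'")
    for child in elem:
        errors.extend(check(child))
    return errors


if __name__ == "__main__":
    bad = ET.fromstring(
        '<database><news><date year="2000" month="4"/>'
        "<content>x</content></news></database>"
    )
    for problem in check(bad):
        print(problem)
```

A validating parser would perform the same kind of check directly from the DTD, which is why the grammar can travel with the document.
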
Data Structure for News Videos
- Multimedia presentation
- Graph structure property
  - keyword directory
  - thesaurus / classification directory
  - person / place directory
  - Chinese-English dictionary
- Semistructure property
  - annotation

Indexing a Video
- Segment the video hierarchically into scenes. (A video is composed of one or more related scenes.)
- Describe the complete news video using bibliographic information (title, source, reporters, abstract, etc.) plus format, duration, etc.
- Describe each scene: id, start frame (time), end frame (time), keyframe, and scripts.
- An OCR tool was implemented last semester for indexing the videos.

Indexing a Video
For a news clip:
id = 1234
title = N. T. swamped after torrential downpour
date = 1999-9-9
source = Hong Kong ATV
reporter = Chan Tai Man
abstract = Large areas of the northwest New Territories were under water yesterday as torrential rain swept across the SAR.
duration = 2:34:56
has_scene = 1234.1, 1234.2, 1234.3
format = MPEG
language = Cantonese
identifier = http://www.cse.cuhk.edu.hk/1.mpg

Indexing a Video
For a scene:
id = 1234.1
belong_to = 1234
next_scene = 1234.2
prev_scene = null
start_time = 0:0:00
end_time = 0:30:45
keyframe = 1238
transcript = . . .

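The clip and scene records above can be sketched as linked Python data classes. The class names, the `add_scene` and `scene_at` helpers, and the choice to store times as seconds are assumptions for illustration; only the field names come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scene:
    id: str
    belong_to: str            # id of the owning news clip
    start_time: int           # seconds from the start of the clip (assumed unit)
    end_time: int
    keyframe: int
    transcript: str = ""
    prev_scene: Optional[str] = None
    next_scene: Optional[str] = None


@dataclass
class NewsClip:
    id: str
    title: str
    scenes: List[Scene] = field(default_factory=list)

    def add_scene(self, start, end, keyframe, transcript=""):
        """Append a scene and maintain the prev/next links."""
        scene = Scene(f"{self.id}.{len(self.scenes) + 1}", self.id,
                      start, end, keyframe, transcript)
        if self.scenes:
            self.scenes[-1].next_scene = scene.id
            scene.prev_scene = self.scenes[-1].id
        self.scenes.append(scene)
        return scene

    def scene_at(self, seconds):
        """Return the scene covering the given time, or None."""
        for scene in self.scenes:
            if scene.start_time <= seconds < scene.end_time:
                return scene
        return None


if __name__ == "__main__":
    clip = NewsClip("1234", "N. T. swamped after torrential downpour")
    clip.add_scene(0, 30, keyframe=1238)      # scene boundaries are made up
    clip.add_scene(30, 75, keyframe=1300)
    print(clip.scene_at(45).id)
```

Keeping `prev_scene` and `next_scene` as ids rather than object references matches the flat key = value records on the slides and maps directly onto XML attributes.
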
Sample News Entry
In NewsDatabase.XML:

<database>
  <news>
    <date><year>2000</year><month>4</month><day>15</day></date>
    <title>N.T. swamped after torrential downpour</title>
    <content>Large areas of the northwest New Territories were under water yesterday as torrential rain swept across the SAR.</content>
  </news>
  . . .
</database>

Keyword Directory
- Each news entry has its own keyword elements
- Build a keyword directory containing all keywords
- Every keyword points to the news entries that have that keyword

Keyword Directory
[Figure: the news database is a tree structure (database → news → title, date, keyword, reporter, …). Example entry ID = 0010: title "N. T. swamped after torrential downpour", date 15 April, 2000, keyword "flood", reporter Clifford Lo. The keyword directory holds keywords (France, fuel, gun, flood, …), each listing news IDs; "flood" lists 0010, 0017 and 0137.]
- The keyword directory is pointed to by news entries, and also points to news entries.
- Keywords point back into the news database to form a graph structure.

Keyword Directory
In NewsDatabase.XML:

<database>
  <news ID="0010">
    <date><year>2000</year><month>4</month><day>15</day></date>
    <title>N.T. swamped after torrential downpour</title>
    <keyword>flood</keyword>
    <keyword>storm</keyword>
    <content>Large areas of the northwest New Territories were under water yesterday as torrential rain swept across the SAR.</content>
  </news>
  . . .
</database>

Keyword Directory
In KeywordDirectory.XML:

<keyworddirectory>
  . . .
  <keyword word="flood">
    <newsid>0010</newsid>
    <newsid>0017</newsid>
    <newsid>0137</newsid>
    . . .
  </keyword>
  . . .
</keyworddirectory>

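One way the keyword directory could be derived from the news database is as an inverted index. The sketch below is an assumption about how that step might look, not the presenter's implementation; `build_keyword_directory` is a hypothetical name, and the second news entry in the usage example is reduced to the fields the function needs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_keyword_directory(news_root):
    """Build a <keyworddirectory> element from a <database> element."""
    index = defaultdict(list)
    for news in news_root.findall("news"):
        for keyword in news.findall("keyword"):
            # Map each keyword to the IDs of the news entries that carry it.
            index[keyword.text.strip()].append(news.get("ID"))

    directory = ET.Element("keyworddirectory")
    for word in sorted(index):
        entry = ET.SubElement(directory, "keyword", word=word)
        for news_id in index[word]:
            ET.SubElement(entry, "newsid").text = news_id
    return directory


if __name__ == "__main__":
    database = ET.fromstring(
        '<database>'
        '<news ID="0010"><keyword>flood</keyword><keyword>storm</keyword></news>'
        '<news ID="0017"><keyword>flood</keyword></news>'
        '</database>'
    )
    print(ET.tostring(build_keyword_directory(database), encoding="unicode"))
```

Searching for a keyword then becomes a lookup in the directory followed by fetching the listed news entries, instead of a scan over every news element.
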
Thesaurus/Classification Directory
To search for terms with similar meaning to the keyword:

<thesaurus>
  <item term="organisation">
    <spelling>organization</spelling>
    <similar>association</similar>
  </item>
  <item term="World Trade Organization">
    <spelling>World Trade Organisation</spelling>
    <abbreviation>WTO</abbreviation>
  </item>
  . . .
</thesaurus>

Thesaurus/Classification Directory
To search for subset terms of the given keyword:

<thesaurus>
  <item term="organisation">
    <spelling>organization</spelling>
    <similar>association</similar>
  </item>
  <item term="disaster">
    <contains>flood</contains>
    <contains>earthquake</contains>
    <contains>fire</contains>
    <contains>storm</contains>
  </item>
  <item term="flood">
    <belongs>disaster</belongs>
  </item>
  . . .
</thesaurus>

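A search front end could use the thesaurus to expand a query term before consulting the keyword directory. The following is a sketch under that assumption; `expand_query` and the `RELATIONS` tuple are hypothetical, and only the element names come from the slides. The `belongs` relation is deliberately not followed, so a search for "flood" does not widen to every disaster.

```python
import xml.etree.ElementTree as ET

THESAURUS_XML = """
<thesaurus>
  <item term="organisation">
    <spelling>organization</spelling>
    <similar>association</similar>
  </item>
  <item term="disaster">
    <contains>flood</contains>
    <contains>earthquake</contains>
    <contains>fire</contains>
    <contains>storm</contains>
  </item>
  <item term="flood">
    <belongs>disaster</belongs>
  </item>
</thesaurus>
"""

# Relations that widen a query; "belongs" points to a broader term and is skipped.
RELATIONS = ("spelling", "similar", "abbreviation", "contains")


def expand_query(term, thesaurus):
    """Return the term plus every term reachable through RELATIONS."""
    items = {item.get("term"): item for item in thesaurus.findall("item")}
    expanded, queue = [], [term]
    while queue:
        current = queue.pop(0)
        if current in expanded:
            continue
        expanded.append(current)
        item = items.get(current)
        if item is None:
            continue
        for related in item:
            if related.tag in RELATIONS:
                queue.append(related.text.strip())
    return expanded


if __name__ == "__main__":
    thesaurus = ET.fromstring(THESAURUS_XML)
    print(expand_query("disaster", thesaurus))
```
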
Web Search Engine

Person / Place Directory
Person Directory (person ID, name, newsid, …)

<person_directory>
  <person id="wangfengchao">
    <name><first>Fengchao</first><last>Wang</last></name>
    <nationality>Chinese</nationality>
    <organization>The central Government's Liaison Office</organization>
    <position>deputy director</position>
    <newsid>0123</newsid>
    <newsid>0245</newsid>
    ...
  </person>
  . . .
</person_directory>

Person / Place Directory
In news database:

<newsdatabase>
  <news id="0123">
    <date year="2000" month="4" day="15"/>
    <title>Press warning appropriate, says Beijing</title>
    <reporter>Kong Lai-fan</reporter>
    <content>Beijing yesterday defended remarks made by senior SAR-based official <person id="wangfengchao">Wang Fengchao</person> that local media should avoid reporting separatist views.</content>
  </news>
  . . .
</newsdatabase>

Person / Place Directory
[Figure: example news entry ID = 0123 (title "Press warning appropriate, says Beijing", date 15 April, 2000, keyword "media", content mentioning person Wang Fengchao) in the news database tree. The person directory holds person entries (John, Tom, Robert, Wang Fengchao, …), each listing news IDs such as 0123, 0246 and 0369.]
- The person directory is pointed to by news entries, and also points to news entries.
- Person entries point back into the news database to form a graph structure.

Person / Place Directory
Place Directory: category structure

<place_directory>
  <place id="china" class="country">
    <name>China</name>
    <newsid>5839</newsid>
    . . .
    <have_places>
      <place id="hongkong" class="SAR">
        <name>Hong Kong</name>
        <have_places>
          <place id="NT" class="district">
            <name>New Territories</name>
          </place>
          . . .
        </have_places>
        <newsid>0010</newsid>
        . . .
      </place>
      . . .
    </have_places>
  </place>
</place_directory>

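Because places nest inside `have_places`, a query for a country can pick up news filed under any place it contains. The sketch below illustrates that idea with a hypothetical `news_for_place` function; it assumes place ids are unique across the directory.

```python
import xml.etree.ElementTree as ET

PLACE_XML = """
<place_directory>
  <place id="china" class="country">
    <name>China</name>
    <newsid>5839</newsid>
    <have_places>
      <place id="hongkong" class="SAR">
        <name>Hong Kong</name>
        <have_places>
          <place id="NT" class="district">
            <name>New Territories</name>
          </place>
        </have_places>
        <newsid>0010</newsid>
      </place>
    </have_places>
  </place>
</place_directory>
"""


def news_for_place(directory, place_id):
    """Return news IDs attached to the place or to any place nested in it."""
    for place in directory.iter("place"):
        if place.get("id") == place_id:
            # iter() walks all descendants, so nested places are included.
            return [newsid.text for newsid in place.iter("newsid")]
    return []


if __name__ == "__main__":
    directory = ET.fromstring(PLACE_XML)
    print(news_for_place(directory, "china"))
    print(news_for_place(directory, "hongkong"))
```
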
Person / Place Directory
In news database:

<newsdatabase>
  <news id="0010" place="hongkong">
    <date year="2000" month="4" day="15"/>
    <title>N.T. swamped after torrential downpour</title>
    <reporter>Clifford Lo</reporter>
    <content>Large areas of the northwest <place id="NT">New Territories</place> were under water yesterday as torrential rain swept across the <place id="hongkong">SAR</place>.</content>
  </news>
  . . .
</newsdatabase>

Chinese-English Dictionary
- Translate the keywords for searching
- We can have an English-to-Chinese dictionary:

<e2cdict>
  <english char="f">
    <english char="l">
      <english char="o">
        <english char="o">
          <english char="d">
            <chinese>氾濫</chinese>
            <chinese>水災</chinese>
            <chinese>洪水</chinese>
            . . .
          </english>
        </english>
        . . .
</e2cdict>

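The nested `english` elements form a character trie: each level consumes one letter of the word. A lookup could walk it as in the sketch below, where the `lookup` function is a hypothetical illustration and the sample XML closes every level explicitly so that it parses.

```python
import xml.etree.ElementTree as ET

E2C_XML = """
<e2cdict>
  <english char="f"><english char="l"><english char="o">
    <english char="o"><english char="d">
      <chinese>氾濫</chinese>
      <chinese>水災</chinese>
      <chinese>洪水</chinese>
    </english></english>
  </english></english></english>
</e2cdict>
"""


def lookup(dictionary, word):
    """Follow one <english char=...> child per letter, then read the translations."""
    node = dictionary
    for letter in word.lower():
        node = next(
            (child for child in node.findall("english") if child.get("char") == letter),
            None,
        )
        if node is None:
            return []          # no entry with this prefix
    return [c.text.strip() for c in node.findall("chinese")]


if __name__ == "__main__":
    dictionary = ET.fromstring(E2C_XML)
    print(lookup(dictionary, "flood"))
```

Words that share a prefix share the same path of elements, so the structure also supports prefix browsing at no extra cost.
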
Chinese-English Dictionary
- We can have a Chinese-to-English dictionary:

<c2edict>
  <chinese term="世">
    <chinese term="貿">
      <english>WTO</english>
      <english>World Trade Organization</english>
    </chinese>
    . . .
  </chinese>
  . . .
</c2edict>

Annotation
- XML is semistructured!
- This gives more flexibility in adding tags to content.
- We add our own tags to annotate strings and give them "meaning".
- Hence, more expressive queries can be supported.

Annotation: example

<content>
  Radioactive coolant water leaked at a nuclear reactor in western Japan yesterday, but the accident had no impact on the environment, the plant director said. "Today when the plant was operating with its usual output, a worker found a small leak of primary coolant water from a pipe of the No 2 reactor," said Katsuhiko Takahashi.
</content>

We understand… but the system doesn't…

Annotation: example

<content>
  <disaster nature="radioactive" death="0" injured="0">Radioactive coolant water leaked at a nuclear reactor</disaster> in western <place id="japan">Japan</place> yesterday, but the accident had no impact on the environment, the plant director said. "<speech speaker="Katsuhiko Takahashi">Today when the plant was operating with its usual output, a worker found a small leak of primary coolant water from a pipe of the No 2 reactor</speech>," said <person name="Katsuhiko Takahashi">Katsuhiko Takahashi</person>.
</content>

Usage of Annotation
- So, we can have queries like:
  - All the speeches from Zhu Rongji in the last month
  - All storms which killed more than 200 people
- We can also make links to give more details about people, places, etc.

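Queries of this kind reduce to filtering annotation elements by their attributes. The sketch below shows two such filters; the function names and the storm entry in the sample data are made up for illustration, and the date restriction ("in the last month") is left out because it would also need each entry's date.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated database: the second entry is invented test data.
ANNOTATED_XML = """
<database>
  <news id="a">
    <content><disaster nature="radioactive" death="0">Coolant water leaked</disaster>,
    "<speech speaker="Katsuhiko Takahashi">a worker found a small leak</speech>".</content>
  </news>
  <news id="b">
    <content><disaster nature="storm" death="250">A storm hit the coast</disaster>.</content>
  </news>
</database>
"""


def speeches_by(root, speaker):
    """Return the text of every <speech> attributed to the speaker."""
    return ["".join(s.itertext()).strip()
            for s in root.iter("speech") if s.get("speaker") == speaker]


def disasters(root, nature, more_than):
    """Return IDs of news entries reporting such a disaster with more deaths."""
    hits = []
    for news in root.findall("news"):
        for d in news.iter("disaster"):
            if d.get("nature") == nature and int(d.get("death", "0")) > more_than:
                hits.append(news.get("id"))
                break
    return hits


if __name__ == "__main__":
    root = ET.fromstring(ANNOTATED_XML)
    print(speeches_by(root, "Katsuhiko Takahashi"))
    print(disasters(root, "storm", 200))
```

Without the annotations, both queries would need free-text analysis of the content string; with them, they are simple attribute comparisons.
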
Architecture of Digital Library
- Designing stores and query processors for semistructured data.
- Traditional database systems use a client/server architecture.
- The distributed environment has given rise to two new architectures: data warehouses and mediators.
- Video servers will also be integrated into our system to provide video streaming.

Data Warehouse
[Figure: data warehouse architecture with clients, a warehouse holding its own data, and several servers each with their own data; arrows labelled query, answer, update and data.]

Mediator
[Figure: mediator architecture with clients, a mediator, and several servers each with their own data; arrows labelled query and answer on both sides of the mediator.]

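The difference between the two architectures can be sketched in a few lines: a mediator forwards each query to the servers at query time, while a warehouse answers from its own copy, which is refreshed by updates. All class names and the record layout below are assumptions for illustration only.

```python
class Server:
    """A data source that owns its records."""

    def __init__(self, name, records):
        self.name = name
        self.records = list(records)

    def query(self, keyword):
        return [r for r in self.records if keyword in r["keywords"]]


class Mediator:
    """Holds no data: forwards each query and merges the answers."""

    def __init__(self, servers):
        self.servers = servers

    def query(self, keyword):
        return [r for s in self.servers for r in s.query(keyword)]


class Warehouse:
    """Answers from a local copy that is refreshed by update()."""

    def __init__(self, servers):
        self.servers = servers
        self.data = []
        self.update()

    def update(self):
        self.data = [dict(r) for s in self.servers for r in s.records]

    def query(self, keyword):
        return [r for r in self.data if keyword in r["keywords"]]


if __name__ == "__main__":
    a = Server("a", [{"id": "0010", "keywords": ["flood"]}])
    b = Server("b", [{"id": "0017", "keywords": ["flood"]}])
    mediator, warehouse = Mediator([a, b]), Warehouse([a, b])
    a.records.append({"id": "0137", "keywords": ["flood"]})
    print(len(mediator.query("flood")), len(warehouse.query("flood")))
    warehouse.update()
    print(len(warehouse.query("flood")))
```

The trade-off shown here is freshness against query cost: the mediator always sees current data but contacts every server per query, while the warehouse answers locally and can be stale until the next update.
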
Agents Using Structured Data
- There is growing demand for data that is more structured than loosely structured HTML.
- Semistructured XML data can provide a very good environment for Web agents.
- Our main aim in implementing an agent is to illustrate that our semistructured XML data provides a better environment for an agent to work in.

Research Plan & Conclusion
- Design the structure in XML semistructured format to support multimedia data, multilingual data, and various kinds of retrieval.
- Design a system architecture that allows multiple sources of data.
- Implement an agent to illustrate that our semistructured data provides a better environment for an agent to work in.

Q & A Session