Serbia-Forum Cultural Heritage Digitization Project with Emphasis on Semantic Indexing


Page 1: Serbia-Forum Cultural Heritage Digitization Project with Emphasis on Semantic Indexing

Aleksandar Mihajlović, Vladisav Jelisavčić, Bojan Marinković, Zoran Ognjanović, Veljko Milutinović, Zoran Marković

Mathematical Institute of the Serbian Academy of Sciences and Arts
Knez Mihailova 36, 11000 Belgrade, Serbia

Page 2

What is Serbia-Forum?
• Two characters
  – Credible encyclopedic articles written by credible authors (similar to already present e-encyclopedias)
  – Digitization of cultural and national heritage
    • Books
    • Archive documents
    • Images of culturally significant Serbian works of art
    • 3D scanning of culturally significant places
      – Churches
      – Monasteries
      – Museums
      – Homes of famous Serbs
    • Etc.
• (quick digression on the next slide)

Page 3

Motivation: Why Serbia-Forum?
• Research
  – Semantic searching
    • Semantic text and image search
• Presence
  – Credible source of information about the cultural heritage of Serbia
  – Free, centralized and easily accessible
• Preservation
  – Prolong the life of items of cultural heritage for the generations to come
  – Backup of historically and culturally significant documents
    • Natural disasters
    • War
    • Examples on the next slide

Page 4

Motivation for Digitization
• Why is “Preservation” a top priority?

Page 5

What is Serbia-Forum?
• Axioms holding Serbia-Forum together
  – Primary axioms (fixed to the Serbia-Forum project):
    • Content is selected and controlled by government-funded and government-owned cultural and academic institutions
    • Each document is protected by its own license
    • Quality, NOT quantity
  – Secondary axioms (implementable by other projects):
    • Semantic search
    • Version tracking of every document
    • Information about the author of each document is supplied (biography)

Page 6

What already exists
• Wikipedia
• Europeana
• Austria-Forum . . . Europaea-Forum
• …
• Why Serbia-Forum?

Page 7

Wikipedia

Page 8

Wikipedia
• One product of Wikimedia

Page 9

Wikipedia
• A broad collection of written knowledge in the form of articles (knowledge concerning the whole world)

Page 10

Wikipedia
• A broad collection of written knowledge in the form of articles (knowledge concerning the whole world)
  – Free encyclopedia
  – Distributed organization
    • Product localized by “lingual” regions
      – Wikipedia Serbia, Wikipedia Poland, Wikipedia Italy, etc.

Page 11

Wikipedia

Page 12

Wikipedia
• Wikipedia is based on free authorship under the “CC license” (Creative Commons license)
  – Every article written for Wikipedia may be used, printed, sold and changed freely, under the terms of the license, without breaking the law
  – One of the main factors for the success and expansion of Wikipedia
  – Every user can write an article about any topic
  – Every article is subject to change

Page 13

Wikipedia
• Articles are available in many languages
  – Wikipedia in English contains about 3 500 000 articles
  – Wikipedia Germany about 1 385 000
  – Wikipedia Spain about 880 000, etc.
  – Wikipedia Serbia contains over 156 000 articles in Serbian

Page 14

Page 15

Wikipedia
• Although a significant number of articles exist in Serbian, far fewer of them cover concepts related to Serbian cultural heritage.
• Given this insufficient representation, a need arises for the systematic collection and presentation of concepts, significant historical figures and events in Serbian history and culture.
• One solution is: http://www.serbia-forum.org

Page 16

Wikipedia
• Strategic constraints of Wikipedia (1/2)
  – Encyclopedia of knowledge concerning the whole world, not only Serbia
    • Serbia and Wikipedia are connected by only one thread: the Serbian language
  – Tracking article changes is a hassle
  – There is only one valid license: “Creative Commons”
    • Which makes things a bit inflexible
  – The free authorship approach sometimes doesn’t yield satisfactory results (credibility)
    • No editorial board to determine the credibility of author and article
  – Articles can contain stereotypes, falsehoods and biased information

Page 17

Wikipedia
• Strategic constraints of Wikipedia (2/2)
  – Contains only encyclopedic articles
    • Weak assortment of books
    • Weak assortment of documents
    • No archive contents
  – Wikimedia Commons? Wikisource? Wikibooks?
    • Repeat: only one valid license: “Creative Commons”
    • Out of context

Page 18

Europeana
• Collection of information concerning European cultural heritage
  – Access to millions of books, pictures, museum pieces, movies and archive data
  – It’s not an encyclopedia
• Under the supervision of the Europeana Foundation
  – Over 2000 institutions all over Europe
  – Each institution is individually responsible for the selection and presentation of its contents
  – Contribution is exclusively reserved for institutions

Page 19

Page 20

Europeana
• Advantages?
  – Rights are protected
  – Credibility is ensured
  – Content is diverse
• Disadvantages?
  – It is just a portal
  – Collection of documents vs. encyclopedia

Page 21

Austria-Forum

Page 22

Austria-Forum
• Collection of knowledge about Austria

Page 23

Austria-Forum
• Over 20,000 units of content

Page 24

Austria-Forum
• Over 20,000 units of content
  – Indexing of content

Page 25

Austria-Forum
• Over 20,000 units of content
  – Indexing of content
  – Biographies of the most renowned Austrians

Page 26

Austria-Forum
• Biographies

Page 27

Austria-Forum
• Over 20,000 units of content
  – Indexing of content
  – Biographies
  – Homeland Lexicon “Heimatlexikon”
    • Popular themes are depicted through short films

Page 28

Austria-Forum
• Homeland Lexicon “Heimatlexikon”

Page 29

Austria-Forum
• Over 20,000 units of content
  – Indexing of content
  – Biographies
  – Homeland Lexicon “Heimatlexikon”
  – Web books Austria
    • Digitized books

Page 30

Austria-Forum
• Web books Austria

Page 31

Austria-Forum
• Over 20,000 units of content
  – Indexing of content
  – Biographies
  – Homeland Lexicon “Heimatlexikon”
  – Web books Austria
    • Digitized books
    • Web books: an internet application for reading digitized books

Page 32

Austria-Forum
• Web books Austria

Page 33

Austria-Forum
• Over 20,000 units of content
  – Indexing of content
  – Biographies
  – Homeland Lexicon “Heimatlexikon”
  – Web books Austria
  – Austria-Forum Society
    • Every member can make his/her own personal homepage and can personally contribute written articles to the Austria-Forum article collection
    • Every member can contribute to the development of a single article
      – Similar to Wikipedia
    • Every change made to an article is documented, so all changes can be tracked
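
The change tracking described above can be illustrated with a minimal Python sketch. It assumes an in-memory revision list and uses the standard-library `difflib` module to compare versions; the `Article` and `Revision` classes and the author names are hypothetical and are not taken from Austria-Forum or Serbia-Forum.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    """One saved version of an article (hypothetical structure)."""
    author: str
    text: str


@dataclass
class Article:
    """An article that keeps every revision instead of overwriting text."""
    title: str
    revisions: list = field(default_factory=list)

    def edit(self, author, text):
        # Append a new revision; earlier revisions are never modified.
        self.revisions.append(Revision(author, text))

    def current(self):
        # The latest revision is the visible text of the article.
        return self.revisions[-1].text if self.revisions else ""

    def diff(self, old, new):
        # Line-based unified diff between two revision indices.
        a = self.revisions[old].text.splitlines()
        b = self.revisions[new].text.splitlines()
        return list(difflib.unified_diff(
            a, b, fromfile=f"rev{old}", tofile=f"rev{new}", lineterm=""))


article = Article("Example article")
article.edit("first_author", "First line")
article.edit("second_author", "First line\nSecond line")
changes = article.diff(0, 1)
```

Because revisions are only appended, every earlier state of the article stays available and any two versions can be compared.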

Page 34

Serbia-Forum

http://www.serbia-forum.org

Page 35

Serbia-Forum
• The “first and unofficial” version of the “Serbia-Forum” website is already online.

Page 36

Serbia-Forum

Page 37

Serbia-Forum
• Current content (current state)
  – Digitized documents
  – Digitized books
  – Photo gallery
  – Articles underway
    • Selected authors
    • Authors from various trusted societies & organizations

Page 38

Serbia-Forum
• Digitized documents

Page 39

Serbia-Forum
• Digitized books

Page 40

Serbia-Forum
• Photo gallery

Page 41

Serbia-Forum
• Semantic indexing of content (1/2)
  – Smart text content searching
    • Broad search queries lead to specific results

[Diagram: the queries “1850 – 1860”, “Austro-Hungary” and “New York, 1943” lead to Nikola Tesla (born 1856 in Austro-Hungary, died in New York City in 1943)]
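
The Tesla example above can be read as a faceted lookup over structured metadata. The following is a minimal Python sketch under that assumption; it is not the project's actual indexing method. The `Person` record, the `search` function and the second (placeholder) record are hypothetical, and only the Tesla data comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Structured metadata extracted for one indexed biography (hypothetical)."""
    name: str
    born_year: int
    born_place: str
    died_year: int
    died_place: str


# The Tesla record comes from the slide; the second record is a made-up
# placeholder that exists only to show non-matching entries being filtered out.
INDEX = [
    Person("Nikola Tesla", 1856, "Austro-Hungary", 1943, "New York City"),
    Person("Placeholder Person", 1901, "Placeholder Town", 1980, "Placeholder City"),
]


def search(index, born_between=None, born_place=None,
           died_place=None, died_year=None):
    """Return the names of records that match every facet that was supplied."""
    results = []
    for p in index:
        if born_between and not (born_between[0] <= p.born_year <= born_between[1]):
            continue
        if born_place and born_place.lower() not in p.born_place.lower():
            continue
        if died_place and died_place.lower() not in p.died_place.lower():
            continue
        if died_year is not None and p.died_year != died_year:
            continue
        results.append(p.name)
    return results


by_decade = search(INDEX, born_between=(1850, 1860))
by_birthplace = search(INDEX, born_place="Austro-Hungary")
by_death = search(INDEX, died_place="New York", died_year=1943)
```

Each of the three broad queries from the slide narrows to the same record because the search runs over indexed fields (years, places) rather than over the raw article text.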

Page 42

Serbia-Forum
• Semantic indexing of content (2/2)
  – Smart image content searching
    • Correlating an image to similar images

[Images: General Pavle Jurišić Sturm; Marshal Josip Broz Tito]
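
The slides do not say how images are correlated. One common approach, shown here only as an assumption, is to represent each image by a numeric feature vector and rank other images by cosine similarity. In this Python sketch the image identifiers and the three-dimensional vectors are invented toy values, not output from any real feature extractor.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query_id, features, top_k=1):
    """Return the ids of the top_k images most similar to query_id."""
    query = features[query_id]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in features.items()
        if other != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_k]]


# Toy feature vectors (invented for illustration only).
FEATURES = {
    "portrait_a": [0.9, 0.1, 0.3],
    "portrait_b": [0.8, 0.2, 0.4],
    "landscape_c": [0.1, 0.9, 0.2],
}

nearest = most_similar("portrait_a", FEATURES)
```

In a real system the vectors would come from an image-analysis step, and the ranking would run over the whole photo gallery rather than three entries.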

Page 43

Serbia-Forum
• Articles

Page 44

Serbia-Forum
• Primary character
  – Content is under the supervision of credible government institutions whose purpose is to preserve all aspects of heritage in Serbia
  – Infrastructure which will dictate how each document will be protected and by which license (cultural heritage is rich in content that cannot be covered by the CC license)
  – Quality and not quantity (small number of pearls)
  – Translation is adapted to the region of the user
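
The per-document licensing idea above can be sketched as a record that carries its own license, so that reuse rules are decided document by document rather than site-wide. This Python sketch is illustrative only: the `License` values, the `Document` fields and the reuse rule are assumptions, not the project's actual license scheme.

```python
from dataclasses import dataclass
from enum import Enum


class License(Enum):
    """Hypothetical set of licenses a contributing institution might choose."""
    CC_BY_SA = "CC BY-SA"
    ALL_RIGHTS_RESERVED = "All rights reserved"
    INSTITUTION_SPECIFIC = "Institution-specific terms"


# Assumption for this sketch: only the CC license permits free reuse.
REUSE_ALLOWED = {License.CC_BY_SA}


@dataclass(frozen=True)
class Document:
    """A digitized item that carries its own license."""
    title: str
    institution: str
    license: License


def may_be_reused(doc):
    """True if the document's own license permits free reuse."""
    return doc.license in REUSE_ALLOWED


# Placeholder documents, not real holdings.
article = Document("Placeholder article", "Placeholder institution", License.CC_BY_SA)
manuscript = Document("Placeholder manuscript", "Placeholder archive",
                      License.ALL_RIGHTS_RESERVED)
```

Keeping the license on the document lets an archive publish a restricted manuscript next to a freely reusable article in the same collection.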

Page 45

Serbia-Forum
• Secondary character
  – Semantic search
  – Version tracking of each document
  – Information about each article author(s) is available
  – Editorial board

Page 46

Active Participating Institutions in Serbia
• Archives of Serbia
• SANU (Serbian Academy of Sciences and Arts)
• National Library of Serbia
• Historical Archive of Belgrade
• Faculty of Philology of the University of Belgrade, Serbia
• Faculty of Political Science of the University of Belgrade, Serbia
• And more…

Page 47

www.serbia-forum.org

Aleksandar Dedic, December 2010