term paper - devansh mathur

Author: parth-shukla

Post on 04-Apr-2018




0 download

Embed Size (px)


  • 7/31/2019 Term Paper - Devansh Mathur


  • 7/31/2019 Term Paper - Devansh Mathur




    This is to certify that Devansh Mathur student of B. Tech. in Computer Science

    & Technology has carried out the work presented in the project of the Term

    paper entitled Semantic Digital library as a part of Second Year programme

    of Bachelor of Technology in of B. Tech. in Computer Science & Technology

    from Amity School of Engineering and Technology, Amity University

    Rajasthan, under my supervision.

    DATE: Mr Sanjay Jain

    Faculty Guide


  • 7/31/2019 Term Paper - Devansh Mathur


  • 7/31/2019 Term Paper - Devansh Mathur




    Abstract.. 5

    List of Figures. 6

    1. Introduction.. 72. The Use of semantic web in digital libraries.. 10

    2.1Resource Development Framework 112.2Web Ontology Language. 13

    3. Digital Libraries: Earlier times.. 184. Digital Libraries Evolution: Content Sharing.. 235. The Digital Library Universe: .. .. 29

    5.1.A Three-tier Framework.. 295.2.Main concept... 315.3.The Main Roles of Actors. 38

    6. Advantages and Disadvantages of Libraries: From Past Till Now... 446.1.Of Past Digital Libraries.. 446.2. Of DL to SDL. 456.3.Of SDL to SSDL.. 476.4.SSDL and the future. 48

    7. Future Prospects 498. Existing Semantic Digital Library Systems. 52

    8.1. SMILE.. 528.2.JeromeDL. 528.3.BRICKS53

    9. Conclusion. 5410.References..55

  • 7/31/2019 Term Paper - Devansh Mathur


  • 7/31/2019 Term Paper - Devansh Mathur


  • 7/31/2019 Term Paper - Devansh Mathur




    Typical digital libraries usually focus on categorizing and cataloguing

    resources. Information retrieval in such libraries relies primarily on text search

    engines and free browsing. This approach proved to be useful, however it

    suffers from ambiguity of natural language, neglecting the importance of

    metadata; it also does not engage users in the process of sharing knowledge.Simple searching still returns too many results which have to be filtered

    somehow. Page ranking algorithm helps with websites but cannot be easily

    applied to books or e-learning objects. On the other hand, having a look on a

    friends bookshelf can give us much clearer view on what is worth reading in a

    particular domain than digging through a thousand books or websites published

    this month. The semantic digital library is an attempt to restore the collaborative

    approach to sharing knowledge.

    The term Digital Library is currently used to refer to systems that are

    heterogeneous in scope and yield very different functionality. These systems

    range from digital object and metadata repositories, reference-linking systems,

    archives, and content administration systems (mainly developed by industry) to

    complex systems that integrate advanced digital library services (mainly

    developed in research environments). This overloading of the term Digital

    Library is a consequence of the fact that as yet there is no agreement on what

    Digital Libraries are and what functionality is associated with them. This results

    in a lack of interoperability and reuse of both content and technologies. This

  • 7/31/2019 Term Paper - Devansh Mathur



    document attempts to put some order in the field for the benefit of its future


    Libraries, together with archives, have always been the primary institutions

    delegated to manage collect, preserve and diffuse human knowledge and

    culture. When advances in computer science allowed dealing with digital

    representation of documents dedicated to capture human knowledge and culture

    rather than printed ones, libraries were particularly involved in exploiting the

    potential of the digital revolution. Thus digital libraries soon became the term

    to indicate the digital counterpart of traditional libraries. However, digital

    library systems have greatly evolved since their early appearance. Today they

    have become complex networked systems able to support communication and

    collaboration among different worldwide distributed communities, dealing with

    digital objects comprising not only the digital counterpart of printed

    documents, but also images, video, programs and any other kind of multimedia

    objects a community may define as appropriate to its working andcommunication needs. The evolution of digital libraries (DLs) has not been

    linear, coming from the contribution of many disciplines. This has created

    several conceptions of what a DL is, each one influenced by the perspective of

    the primary discipline of the conceiver(s) or by the concrete needs it was

    designed to satisfy.

    As a natural consequence, the history of Digital Libraries, which is nowapproximately twenty years long, is the history of a variety of different types of

    information systems that have been called digital libraries. These systems are

    very heterogeneous in scope and functionality and their evolution does not

    follow a single path. In particular, when changes happened this has not only

    meant that a better quality system was been conceived superseding the

  • 7/31/2019 Term Paper - Devansh Mathur



    preceding ones but also meant that a new conception of digital libraries was

    born corresponding to new raised needs. As it will be seen, most of the systems

    dealt with in this history are still living in their original conception, even though

    not in their original technological solutions.

    The rest of this chapter goes back over this history, giving an account of past

    and present understanding of these kinds of systems and on-going work in the

    area. The chapter concludes with a vision of the impact that new DLs are

    expected to have in the near future.

  • 7/31/2019 Term Paper - Devansh Mathur



    The Use of Semantic Web in Digital Libraries

    The Semantic Web is a concept that was first introduced by Tim Berners-Lee.

    Berners-Lee, Hendler, & Lassilas (2001) vision of the Semantic Web is that it

    will bring structure to the meaningful content of Web pages, creating an

    environment where software agents roaming from page to page can readily

    carry out sophisticated tasks for users . This vision is supported by the

    combination of metadata schemes, the use of the Resource Description

    Framework (RDF), and the creation of ontologies. The ultimate goal of the

    Semantic Web is to allow machines to understand the meaning of digital

    objects, rather than just the key words used to describe them. This will

    revolutionize the search and retrieval of digital objects, a key function for

    digital libraries.

    One of the main benefits of the Semantic Web is that it operates within the

    current Web environment to add logic and meaning to digital objects. In their

    review of an experiment to enhance the semantic interoperability of two digital

    collections, Angjeli and Isaac (2009) state One of the key ideas of the semantic

    web approach is to open and interconnect the meaning capitalized within

    existing metadata. This idea is accomplished through the use of a combination

    of XML, the Resource Description Framework (RDF), and the creation of

    ontologies. XML or eXtensible Markup Language is a computer language that

    allows users to create their own tags to annotate Web pages.

    Many digital objects contain metadata, or information contained within the

    document that defines information about the document such as the author,

    publisher and date created. This metadata is often written using XML which

  • 7/31/2019 Term Paper - Devansh Mathur



    gives digital objects a similar framework, regardless of which metadata scheme

    is used within the object. Although XML can define structure within a digital

    object, it does not convey meaning to computers.

    Figure 1 semantic web stack

    Resource Description Framework (RDF)

    The next step in the Semantic Web process is to add meaning to the XML data

    found within digital objects. This is accomplished through the creation of RDF

    models and schemas.

    RDF models are composed of three parts with the basic form of subject-

    predicate-object, and are often referred to as triplets. The subject is a resource

    which can essentially be any digital object. The predicate is a property, such as

    has author which signifies that the resource has an author. The object can be

    another property or a class of that property, in our example this could be

    Jones. The complete RDF model would be Object A has author Jones. This

    designates to the computer that the object was authored by Jones.

  • 7/31/2019 Term Paper - Devansh Mathur



    Each part of the triplet can be assigned a unique universal resource identifier

    (URI), which directs the computer, to that specific digital object. As pieces of

    information are added to this data model it can be written in XML andtransmitted across applications. The RDF schema is the set of rules that define

    the way RDF document must be written.

    The benefits of using RDF are tremendous. As McCarthie Nevile and Mendez

    state, If you give an RDF processor two sets of statements about the same

    thing (a thing identified by the same URI), it just treats them as one collection

    of statements. This means that a web of meaning can be created about aspecific object, and that this web of meaning can have multiple contributors.


    Student Researcher




    hasSuperVisorDomain Range



    Has Supervisor

  • 7/31/2019 Term Paper - Devansh Mathur



    Web Ontology Language

    The final piece of the Semantic Web is the use of ontologies. Ontologies

    serve as the backbone of the Semantic Web by providing vocabularies and

    formal conceptualization of a given domain to facilitate information sharing and

    exchange . Ontologies are basically sets of logical rules that define the

    relationships between sets of concepts. Ontologies can also be used to define the

    concepts and the XML codes used to designate them within Web pages.

    Ontologies are usually domain specific, and attempt to exhaustively define

    concepts and map their relationships. The goal of ontology is to reinforce the

    formal logic and tighten the meaning to the point where it can no longer

    escape correct computer interpretation . The combination of XML, RDF, and

    ontologies makes it possible for computer programs and agents to interpret

    digital objects based on their meaning and their relationship to other concepts.

    The benefits of using these technologies in digital libraries are myriad.

    Furthermore, digital librarians already possess a skill set that enables them to

    work with Semantic Web technology. Digital librarians are already working

    with XML to create metadata, and the creation of ontologies is akin to the

    development of a thesaurus.

    There are many digital library projects already making use of these

    technologies. One of the greatest benefits of the Semantic Web is the ability to

    relate resources by their meaning rather than their form. This allows for the

    linking of concepts from different vocabularies, freeing users from the confines

    of any particular vocabulary.

  • 7/31/2019 Term Paper - Devansh Mathur



    The best example of the use of ontologies to achieve this goal is the Unified

    Medical Language System (UMLS) developed by the National Library of

    Medicine. The UMLS is comprised of three parts: a Meta thesaurus which

    contains over one million biomedical concepts from over 100 source

    vocabularies; a Semantic Network which defines 133 broad categories and

    fifty-four relationships between categories for labelling the biomedical

    domain; and a SPECIALIST Lexicon & Lexical Tools which provide lexical

    information and programs for language processing (National Library of

    Medicine, 2010).

    The UMLS is distributed with software and tools that enable a user or

    organization to use the UMLS to do research in the biomedical field. The

    UMLS is essentially a comprehensive ontology of the biomedical field, and can

    be used by a digital library who serves users with an interest in the biomedical

    field. The UMLS is the perfect example of the potential for interoperability

    created by using Semantic Web technologies. The National Library of Medicine

    is arguably the well-respected source of biomedical resources, and their UMLScan be used by any single user or digital library to enhance the quality of their

    searching or to build a search and retrieval interface.

    Another example of a collaborative digital library project using Semantic Web

    technology is Europeana , an interface for searching the digital resources of

    Europe's museums, libraries, archives and audio-visual collections

    (Europeana). Europeana allows users to explore these items and share thoseusing social media outlets. Europeana has also developed a Semantic Searching

    Prototype which currently contains items from the Rijksmuseum, the Louvre,

    and Netherlands Institute for Art History. The Semantic Search interface is a

    simple search bar that allows users to search for concepts represented by digital

    objects from these institutions.

  • 7/31/2019 Term Paper - Devansh Mathur



    Performing a search for love returns 504 items that are listed under the

    following headings: works showing concept, works titled, works showing a

    more specific concept, works showing a related concept, works from type a

    related concept, and other. This breakdown gives the user a unique perspective

    on how a Semantic Web search differs from a traditional search. When the user

    clicks on a resource, a window pops up that gives the user information about the

    resource including date, material, relation, subject, title, and type. These fields

    correspond to many metadata schemes. The user can also link to the page from

    the original institution that contains the item, and to a Europeana created page

    for that item. The Europeana page allows users to download RDF files relating

    to specific data about the item. The Europeana Semantic Search Prototype

    shows how Semantic Web technology can be used to integrate resources from

    multiple sources, to allow users to access these resources, and to facilitate the

    sharing of Semantic Web data for each resource.

    There are also many Semantic Web resources that are not directly connectedwith a traditional digital library that can potentially be used in digital library

    interfaces. One such resource is Friend of a Friend Project or FOAF. This

    project provides users with a way to create an RDF file that identifies the user

    as a unique individual. This project is similar to the authority files created in

    traditional libraries to differentiate between authors with the same name. FOAF

    files can designate a wealth of information about a person. They can begenerated using XML/RDF by users or can be generated through social

    networking sites. Digital libraries could allow users to create profiles, and then

    translate these profiles to FOAF files for use within the digital library interface.

    Digital libraries should provide users with services beyond simple search and

    retrieval of items, and allowing users to customize their interface and track

  • 7/31/2019 Term Paper - Devansh Mathur



    search history is one way to accomplish this goal. A 2009 study by Jiang and

    Tan attempted to use Semantic Web technology to create user ontologies to help

    provide personalized information services. Jiang and Tan (2009) explain that

    user ontology is a specialization of domain ontology by assigning each concept

    and relation of the domain ontology with a specific value for indicating a users


    This user ontology allows for ranking of retrieved search items by user

    preferences. The authors required participants to submit a search query to a

    system loaded with documents from the ACM digital library. The participants

    then were asked to browse the top 30 documents retrieved and rank them based

    on relevance.

    This data was loaded into that users ontology based on the concepts included in

    preferred and non-preferred documents. After the creation of the user ontology,

    the data was loaded into the search interface. The participants then searched for

    a new query, and the search system ranked the documents based on preferencesfrom the user ontology. These results were compared to a search query of the

    same system not loaded with the user ontology. The authors found that the

    system loaded with the user ontologies returned more precise results than the

    system that was not programmed for user preference. The authors state User

    ontology can be used in many ways to support Semantic Web applications,

    including document re-ranking, information filtering, and query expansion.

    The implications of this research are that digital libraries can employ Semantic

    Web technology to track user preferences in a new way, and can seamlessly

    integrate these preferences with current search capabilities.

  • 7/31/2019 Term Paper - Devansh Mathur



    Semantic Web technologies have been integrated with digital libraries to meet

    the needs of digital library users and to meet the goals of digital library

    organizations. The greatest asset of the Semantic Web is that it allows for

    interoperability between organizations and information systems, for even

    agents that were not expressly designed to work together can transfer data

    among themselves when the data comes with semantics. Another asset of the

    Semantic Web is that users can create information about digital objects using

    any language, classification scheme, or metadata scheme. Once this information

    has been linked to that objects DOI, the computer will integrate each piece of

    information scattered throughout the Web to create a more complete record of

    that object. 8 The Use of Semantic Web Technologies in Digital Libraries

    Digital librarians should be leading the charge for the creation of RDF schemas

    and ontologies for digital objects because they have a combination of traditional

    library skills such as content representation and cataloguing along with

    technological skills such as metadata creation. Digital libraries must use these

    technologies to improve all aspects of their organizations.

    The ideal digital library would use Semantic Web technology as the backbone

    of its entire organization. All digital objects contained within their collections

    would have RDF documents associated with them. The collections would all be

    pre-loaded with ontologies related to their specific domain. Digital libraries

    would contain resources from organizations around the world. Users would beable to log in and create profiles that would translate to FOAF files and user

    ontologies to keep track of semantic preferences. The search and retrieval

    functions would benefit from the ability to search semantically along with the

    traditional search methods the library currently employs. Semantic Web

    technology essentially teaches the computer the relationships between digital

  • 7/31/2019 Term Paper - Devansh Mathur



    objects, their meaning, and how users would like to interact with them. The

    technology is already in place and the ideal digital library is only a few

    keystrokes away.


    The digital library concept can be traced back to the famous papers of foreseer

    scientists like Vannevar Bush and J.C.R. Licklider identifying and pursuing the

    goal of innovative technologies and approaches toward knowledge sharing as

    fundamental instruments for progress. Bush (Bush, 1945) devised

    A device in which an individual stores all his books, records, and

    communications, and which is mechanized so that it may be consulted with

    exceeding speed and flexibility.

    Moreover, on top of it there is a transparent platen. On this are placed

    longhand notes, photographs, memoranda, all sorts of things. Because of the

    lack of digital support, he identified in improved microfilm the means for

    content storage and exchange: contents are purchased on microfilm ready for

    insertion. Books of all sorts, pictures, current periodicals, newspapers, are thus

    obtained and dropped into place. Of course, he envisaged also support forknowledge discovery (provision for consultation of the record by the usual

    scheme of indexing),access (to consult a certain book, he taps its code on the

    keyboard, and the title page of the book promptly appears before him) and

    management (new forms of encyclopaedias will appear, ready made with a

    mesh of associative trails running through them, ready to be dropped into the

  • 7/31/2019 Term Paper - Devansh Mathur



    memex and there amplified). Licklider realized thatcomputers were getting to

    be powerful enough tosupport the type of automated library systems thatBush

    had described and in 1965, wrote his bookabout how a computer could provide

    an automated library with simultaneous remote use by many different people

    through access to a common database. Because of this, Licklider is also

    considered a pioneer of Internet and in its book he established the connection

    between Internet and digital library. Thus, it is not surprising that research and

    development activity on digital libraries started in the early 1990s, with the

    Internet proliferation, and that Internet has created unprecedented possibilities

    to discover and deliver human knowledge. The first systems delivering

    knowledge artefacts in digital form can essentially be seen as archives of digital

    texts accessible through a search service and implemented by a centralized

    metadata catalogue.

    The early ones of such systems were constructed on a rather simple

    architecture, with the exception of very few cases. This worked to the advantage

    of their diffusion and adoption by different scientific communities. BesidesarXiv, significant examples of such early systems were archives of various type

    like Electronic Thesis and Dissertations repositories (ETDs), whose pilot project

    started in 1996 , and archives of cognitive sciences papers and of research

    papers in economics both launched in 1997. The former was a system which

    was offering services for submitting, browsing and searching electronic thesis in

    PDF format. The availability of this product stimulated the creation of theNetworked Digital Library of Theses and Dissertations international

    organization, still operational, which registers and keep track of ETDs. Cog

    Prints, was initially conceived as repository allowing the cognitive science

    community to self-archive their papers. It now contains more than 3,000

    artefacts starting from 1950. In 2000 it was made compliant with the protocol

  • 7/31/2019 Term Paper - Devansh Mathur



    defined by the Open Archives Initiative and then its software was converted

    into the EPrints Digital Repository Software (EPrints, (n.d.)), a flexible platform

    supporting easy and fast set up of repositories of open access research outputs.

    Because of its simplicity, EPrints is currently widely used, more than 250

    repositories declared to rely on it.

    Similarly, RePEc was initially conceived as an open repository of electronic

    papers in a specific domain. Thomas Krichel, principal investigator of the

    RePEc Project, in 1997 illustrated the principles underlying a new realised

    version of this system by affirming Distributed archivesshould offer metadata

    about digital objects (mainlyworking papers); the data from all archives should

    form one single logical database despite the fact that it should be held on

    different servers; users could access the data through many interfaces;

    providers of archives should offer their data to allinterfaces at the same time.

    Krichel, with these statements was anticipating a view that would have largely

    emerged few years later. These systems all still living in more recent and

    enhanced versionsrepresent very embryonic forms of digital libraries. In fact,their functionality is essentially confined to (self) publishing of simple

    information objects and discovery of these information objects through

    rudimentary search and browse facilities.

    The Digital Library Initiative (DLI) consisted of two major competitive fundingprograms, the first of which started in 1994 and funded six research projects

    (chosen among 73 proposals) over a four-year period (Schatz and Chen,1996)

    while the second phase was dedicated to extend the research carried out during

    the previous phase by including content providers thus to guarantee the

    availability of real test bed to validate research outcomes. However, the DLI

  • 7/31/2019 Term Paper - Devansh Mathur



    funded projects have not been the only ongoing efforts (CACM, 1995) even if

    they were very innovative because they focused on future technological

    problems. The six projects funded by DLI phase one were: the California

    Environmental Digital Library (Wilensky, 1995) focused on developing the

    technologies to access large, distributed collections of photographs, satellite

    images, videos, maps, documents, and multivalent documents and to support

    work-centred digital information services (Wilensky, 1996); the Alexandria

    Digital Library (Smith and Frew, 1995) focused on building an online,

    distributed digital library for geo-referenced1 information, including maps,

    aerial photographs, satellite imagery, and catalogue records, and on supporting

    geographically defined queries (Smith, 1996); the InformediaDigital Video

    Library (Christel, Kanade, Maudlin, et al., 1995) focused on establishing a

    large, online digital video collection with full-content and knowledge-based

    search and retrieval (Wactlar, Kanade, Smith and Stevens, 1996)

    Despite none of these systems exist anymore as a running service2, the

    solutions proposed, the technology developed as well as the resources collectedand built have been largely used by more complex DLs developed later. It is

    well known that one of the most important success stories resulting from these

    projects is Google. Page and Brin started working on their search engine while

    being PhD Students at Stanford working on the Stanford Digital Library Project.

    Actually, the Digital Library Initiative merits goes far beyond the specific workthat it funded and we can affirm that it gave shape to digital library as a new

    research discipline. Research in digital library topics was not new but it had

    been fragmented across many disciplines. This program led to conferences,

    publications and researcher, teams explicitly interested in doing research in

    digital libraries. Moreover, it gave directions to the overall movement toward a

  • 7/31/2019 Term Paper - Devansh Mathur



    practical research field.(Arms, 2001) As anticipated, in Europe the scene was

    characterised by the existence ofDELOSinitiatives.

    In addition, it was intended to develop new models for intelligent audio-

    visual content-based searching and film-sequence retrieval, new video

    abstracting tools, and user interfaces specifically tailored to the new

    functionality. The provision of multilingual services and cross language

    retrieval tools was also addressed. Another project, i.e. AnIntegrated Art

    Analysis and Navigation Environment(ARTISTE) (Allen, Vaccari and Presutti,

    2000), focused on giving providers, publishers, distributors, rights protectors

    and end users of art images information, as well as the multi-media information

    market as a whole, a more efficient system for storing, classifying, linking,

    matching and retrieving art images. This environment was providing, for

    example, automatic extraction of metadata based on iconography, painting style,

    etc; content-based navigation for art documents; distributed linking and

    searching across multiple archives allowing ownership of data to be retained;

    and storage of art images using large multimedia object relational databases.

    According to these principles: the functionality of a digital library system were

    available in the form of distinct functional units, each exposing its operational

    semantics through an open protocol; digital library systems are compositions of

    these functional units and new functionality can be added through the

    implementation of value-added services, which interact with existing othersusing established protocols; the components (and content) of a digital library

    could be spread over the global Internet, but should be presented to the user as a

    single system.

  • 7/31/2019 Term Paper - Devansh Mathur





    The construction of digital libraries similar to those just described was very

    resource-consuming since, for each new one, both the content and the software

    providing its functionality were built from scratch. At the end of the 1990s, the

    experiences of using distributed architectures to implement proper digital

    libraries and the proliferation of independent repositories of valuable content

    stimulated the idea of reusing content already collected (and curated) in existing

    independent repositories so as to reduce the effort to build large-scale digital

    libraries. However, many obstacles were to be solved to fully implement this

    solution. The major of them was certainly how to implement repository service

    interoperability, i.e. the capability of seamlessly accessing and using the content

    managed in distributed and heterogeneous repositories. Approaches based on

    cross-searching multiple archives based on a common protocol, such as

    Z39.5010,, (Miller, P., 1999) were considered at the time costly and hardly

    scalable. A very important meeting toward the interoperability of electronic

    repositories was organised in Santa Fe, New Mexico, on October 1999, with the

    goal to establish recommendations and mechanisms to facilitate cross-archive

    value-added services.

    This meeting led to the Santa Fe Conventiona combination of organizational

    principles and technical specifications to facilitate a minimal but potentially

  • 7/31/2019 Term Paper - Devansh Mathur



    highly functional level of interoperability among scholarly e-print archives

    and to the establishment of the Open Archives Initiative. (Van de Sompel, H.,

    Lagoze, C., 2000) The meeting started by discussing a concrete example of

    interoperability implemented through the UPS Prototype (Van de Sompel, H.,

    Krichel, T., Nelson, M.L., 2000) and recognising its potentialities. The UPS

    prototype demonstrated the integrated action of a variety of services operating

    over data originating from a set of archives. Each of those services provided a

    reasonably rich level of functionality (accessible through a set of protocol

    methods). The participants recognised that trying to reach consensus on the full

    functionality of the prototype was aiming too high and that a proper degree of

    modesty in the approach toward integration capable to balance the cost of

    participation with the need for adequate functionality was mandatory. The Santa

    Fe Convention identified two key roles in participating institutions: data

    providers and service providers. Data providers were in charge to handle the

    depositing and publishing of resources in a repository and expose for

    harvesting the metadata (what they called record) about resources in therepository. They were the creators and keepers of the metadata and repositories

    of resources. Service providers were in charge of harvesting metadata from data

    providers for the purpose of providing one or more services over the collected

    data. The types of services that might be offered included a search interface,

    peer-review system, etc. The cooperation between content and service providers

    was regulated by a protocol, initially defined as a subset of the Dienst protocoland nowadays known as the Open Archive Protocol for Metadata

    In the US, the National Science Foundation funded the National Science Digital

    Library(NSDL) (Zia, L.L., 2001) with the aim to provide organized access to

    high quality resources and tools that support innovations in teaching and

  • 7/31/2019 Term Paper - Devansh Mathur



    learning at all levels of science, technology, engineering, and mathematics


    These large-scale initiatives devoted to aggregate in a single place knowledge

    that is spread across a plethora of archives and systems will ever exist for a

    series of reasons including the existence of various (institutional) repositories

    and the ever growing multidisciplinary nature of our society. In particular, TEL

    and DARE anticipated important initiatives, namely, Europeana and DRIVER,

    respectively, which were launched few years later. Europeana14 is a Thematic

    Network funded by the European Commission under the eContentplus

    programme, as a part of the i2010 initiative15. Europeana began in July 2007.

    Originally known as the European digital library networkEDLnet it is the

    result of a partnership of 100 representatives of heritage and knowledge

    organisations and IT experts from throughout Europe. Objective of Europeana

    is to provide access to Europes cultural and scientific heritage through a cross-

    domain portal. The first Europeana prototype, launched in November 2008,

    provided simple search and retrieval facility on an information space ofapproximately two millions of digital objects selected from Europes museums,

    libraries, archives and audio-visual collections, harvested through the OAI-

    PMH protocol. The first production quality version of Europeana (called Rhine)

    will go live on July 2010, to be followed in April 2011 by a more sophisticate

    version (Danube), including more contents and offering a richer set of

    functionality. The intention is that by 2010 the Europeana portal will giveeverybody direct access to well over 6 million digital sounds, pictures, books,

    archival records and films. Moreover, Europeanas goal is to realize a system

    serving very different type of users. It should meet occasional curiosity of

    generic users as well as the information needs of school children and students. It

    should also provide academic students and teachers with certified information

  • 7/31/2019 Term Paper - Devansh Mathur



    and the possibility to export information for courses, as well as offer expert

    researchers and professional the possibility of searching, verifying and

    annotating information and using ad-hoc services. In the context established by

    Europeana, special types of providers are the aggregators, i.e. specialised DLs

    that act as collectors of content from other providers. For instance, Culture.fr is

    the largest aggregator, providing content from about 480 organizations in

    France, including the Louvre and the Muse dOrsay. The information resources

    that populate Europeanas information space are harvested as surrogates of the

    original objects that are located at content providers sites. Since surrogates may

    also contain elements of the original object (table of contents, full text index

    items, music and video abstraction etc.), the very interesting new feature

    of Europeana is that it will also deliver digital objects besides metadata. Clearly,

    heterogeneity and interoperability are main issues that such a

    DL is having to deal with, as well as, of course with scalability, quality of

    service and, more in general, sustainability of the joint portal.

    DRIVER16is another notable example of a DL that relies on content providedby a large number of external data providers. It is the result of two subsequent

    projects funded by the European Commission in the period 2006-2009. The

    main aim of these two projects is to create the organisational and technological

    conditions for the set up of a European Repository Infrastructure (Jones, S.,

    Manghi, P., 2009). The main instrument identified by the project to address

    organisational issues is the DRIVER Confederation17. The Confederationpartners represent European and international repository communities, like

    subject based communities, repository system providers, service providers, as

    well as political, research, and funding organisations, who share the DRIVER

    vision to allow all research institutions in Europe and worldwide to make all

    their research publications openly accessible through institutional repositories.

  • 7/31/2019 Term Paper - Devansh Mathur



    In the spirit of this shared goal, the DRIVER confederation encourages a

    combined effort of repository development by setting up guidelines and best

    practices that favour the realization of a shared, trusted, long-term repository

    infrastructure. From the technical point of view, DRIVER is based on theD-Net

    technology18. This enabling technology is quite innovative in the context of

    these kinds of aggregative systems because it is oriented to the realisation of a

    digital library infrastructure (cf. Sec. 5). D-Net is based on a Service-oriented

    architecture, where distributed and shared resources are implemented as

    standard Web Services and applications consist of sets of interacting services. It

    offers services to both data providers, that through it can more easily

    share their content, and service providers, that are facilitated in implementing

    DLs that exploit the aggregated content.19 At the time of this writing, the

    DRIVER service provides access to approximately one million records out of

    200+ repositories across 27 countries. Moreover, it delivers three DL

    applications: the Belgium national repository portal, offering search over the

    Belgium Repository Federation subset; Recolecta national repository portal,offering search on the Spanish Repository Federation subset; and the main

    DRIVER portal, providing access and advanced functionality over the whole

    space. The current Europeana and DRIVER services operate an information

    space of metadata records, i.e. they harvest metadata records through the

    OAI-PMH protocol from exiting repositories and then they run their services by

    exploiting this content. Because of this they suffer from the limitations thatOAI-PMH poses if it has to be used to exchange information objects that are

    rich in structure and payload as those at the core of changing nature of

    scholarship and scholarly communication.(Van de Sompel, H., Payette, S.,

    2004)(Van de Sompel, H., Lagoze, C., 2006) In particular, when feasible, they

    give access to the content associated with the metadata by exploiting URL or

  • 7/31/2019 Term Paper - Devansh Mathur



    some other information contained in the record. This solution to access

    information objects, however, suffers of two main problems:

    (i)the access is not always feasible since there is no standard protocol toaccess objects;

    (ii) There is no way of accessing compound objects since the structure and

    the relation holding among the different parts is unknown.

    A solution to this problem may come from the OAI-ORE20 standard, whose

    version 1.0 has been released in October 2008 by the Open Archives Initiative.

    This standard, based on Web standards, proposes a solution to handle

    aggregations of Web resources. These aggregations, sometimes called

    compound digital objects, may combine distributed resources having multiple

    media types including text, images, data, and video as to form innovative

    research outcomes. Both Europeana and DRIVER have already planned to

    move very soon to technologies la OAI-ORE to manage compound objects.

    All the systems and initiatives described in this section are essentially oriented

    to contentsharing. Moreover, the majority of them is characterisedby a strong organisational effort since the model is based on a cooperative

    participation of the content providers. Content sharing across digital libraries is

    now being largely promoted as an important strategy to reduce the digital

    library set up costs largely coming from selecting, digitising, describing, and

    digitally curating content resources. However, the realisation of wide and

    generalised content sharing is today still problematic due to the great variety ofproprietary models and ontologies adopted by existing systems and by the lack

    of systematic approach to interoperability a recently funded EC project

    stemming from the DELOS project, is paving the way for the future

    interoperability of DL systems thus making feasible the implementation of

    global digital library infrastructures.

  • 7/31/2019 Term Paper - Devansh Mathur



    The Digital Library Universe

    A Three-tier FrameworkA Digital Library is an evolving organisation that comes into existence through

    a series of development steps that bring together all the necessary constituents.

    Figure below presents this process and indicates three distinct notion of the

    `systems` developed along the way forming a three tier framework: Digital

    Library, Digital library system and Digital library management system. These

    correspond to three different levels of conceptualisation of the universe of

    Digital Libraries.

    Figure 2 DL, DLS and DLMS: A Three-tier Framework

    These three system notions are often confused and are used interchangeably inthe literature this terminological imprecision has produced a plethora of

    heterogeneous entities and contribute to make the description, understanding

    and development of digital library systems difficult.

  • 7/31/2019 Term Paper - Devansh Mathur



    As Figure indicates, all three systems play a central and distinct role in the

    digital library development process. To clarify their differences and their

    individual characteristics, the explicit definitions that follow may help.

    Digital Library (DL)

    A potentiallyvirtual organisation,thatcomprehensivelycollects,managesand

    preserves for the long depth of time rich digital content, and offers to its

    targeted user communitiesspecialised functionality onthatcontent,of Defined

    quality and according to comprehensive codified policies.

    Digital Library System (DLS)

    A deployed software system that is based on a possibly distributed architecture

    and provides all facilities required by a particular Digital Library. Users interact

    with a Digital Library through the corresponding Digital Library System.

    Digital Library Management System (DLMS)A generic software system that provides the appropriate software infrastructure

    both (i) to produce and administer a Digital Library System incorporating The

    Suite of facilities considered fundamental for Digital Libraries and (ii) to

    integrate additional software offer in more refined specialised or advanced

    facilities. Although the concept of Digital Library is intended to capture an

    abstract system that consists of both physical and virtual components the DigitalLibrary System and the Digital Library Management System capture concrete

    software systems. For every Digital library there is a unique digital library

    system in operation (possibly consisting of man inter connected smaller digital

    library systems) whereas all Digital Library systems are based on a handful of

    Digital library management systems. The digital library is thus the abstract

  • 7/31/2019 Term Paper - Devansh Mathur



    entity that `lives` thanks to the software system constituting the DLS and the

    DLMS is the software system that is conceived to support the life cycle of one

    or more DLS. It is important to note that all these concepts underlie all types of

    information environment and systems, e.g. database, hospital info systems,

    Banking systems, the web, Wikipedia,, etc. however it is the particular

    characterizations given in the definitions of the previous section that

    distinguishes digital libraries from the others: the content should be rich,

    annotated and preserved for depth of time the user communities should be

    targeted, the functionality should be specialised the quality should be measured

    and according to the comprehensive policies . All of these characterizations, of

    course are abstract and subject to interpretation, so they cannot lead to a precise

    formal definition. Nevertheless they offer conceptual yardsticks by which

    system can be measured and mutually compared and psychological lower

    bounds can be established regarding the nature of digital libraries.

    Main ConceptsDespite the great variety and diversity of existing digital libraries, there is a

    small number of fundamental concepts that underlie all systems. These concepts

    are identifiable in nearly every digital library currently in use. They serve as a

    starting point for any researcher who wants to study and understand the field ,

    for any system developer intending to construct a digital library, and for any

    content provider seeking to expose its content via digital library technologies. In

    this section, we identify these concepts and briefly discuss them. Seven core

    concepts provide a foundation for digital libraries. One of them appears in the

    definition of digital library to capture the commonalities between this universe

    and other social arrangements: Organisation.

  • 7/31/2019 Term Paper - Devansh Mathur



    Five of them appear in the definition of digital library to capture the features

    characterising this kind of organisation and the expected service: Content,

    User, Functionality, Quality and Policy. The seventh one emerges in the

    definition of the digital library system to capture the systemic features

    underlying the expected service:Architecture.

    All seven concepts influence the digital library three-tier framework, as shown

    in fig.

    Figure 3 the Digital Library Universe: Main Concepts


    The organisation concept is surrounding the entire Digital library universe. A

    digital library is a kind of organisation by its own, it is a social arrangement

    pursuing a well-defined goal (the digital library service). This concept subsumes

    the mission the digital library has been conceived for and every other aspect that

    is needed to define this mission and the operation of the resulting service.

    However, this should not be confused with the organisation/institution that

  • 7/31/2019 Term Paper - Devansh Mathur



    decided to set up the digital library and drive its development although there are

    overlaps and dependencies between the two. It is quite easy to recognise the

    dependency relationship between the two, to some extent the institution sets the

    scene for the digital library organisation, the institution is the establisher of the

    Digital LibraryOrganisation and has the power to define the overall service this

    organisation is requested to realise. However, the digital library, being an

    organisation by its own has the power to control its own behaviour and

    evolution in the frame defined by the institution. This concept is fundamental to

    characterise the digital library universe because it highlights the commonalities

    between this universe and the other one dedicated to capture organised body of

    people having a particular purpose.


    The content concept encompasses the data and the information that the Digital

    library handles and makes available to its users. It is composed of a set of

    information objects organised in collections. Content is an umbrella conceptused to aggregate all forms of information objects that a digital library collects,

    manages and delivers. It encompasses a diverse range of information objects,

    including primary objects annotations and metadata.

    This concept is fundamental to characterise the digital library universe because

    it captures one of the major resource these organisations are called to manage,

    i.e. the data and the information that is made available through it.


    The User concept covers the various actors (whether human or machine)

    entitled to interact with Digital Libraries. Digital Libraries connect actors with

    information and support them in their ability to consume and make creative use

  • 7/31/2019 Term Paper - Devansh Mathur



    of it to generate new information. User is an umbrella concept including all

    notions related to the representation and management of actors entities within a

    digital library. It encompasses such elements as the rights that actors have

    within the system and the profiles of the actors with characteristics that

    personalise the systems behaviour or represent these actors in collaborations.

    This concept is fundamental to characterise the Digital Library universe because

    it captures the actors of the overall Organisation.


    The Functionality concept encapsulates the services that a Digital Library offers

    to its different users, whether individual users or user groups. While the general

    expectation is that Digital Libraries will be rich in functionality, the bare

    minimum of functions includes new information object registration, search and

    browse. Beyond that, the system seeks to manage the functions of the Digital

    Library to ensure that the overall service reflects the particular needs of the

    Digital Librarys community of users and/or the specific requirements related toIts Content. This concept is fundamental to characterise the digital library

    universe because it captures the facilities offered by the overall organisation.


    The Policy concept represents the set or sets of conditions, rules, terms, and

    governing every single aspect of the digital library service including acceptablebehaviour, digital rights management, privacy and confidentiality charges to

    users, and collection formation. Policies may be defined within the digital

    library or be superimposed by the institution establishing the digital library or

    outside of that (e.g., Policy governing our society). There policies can be

    extrinsic or intrinsic policies.

  • 7/31/2019 Term Paper - Devansh Mathur



    This concept is fundamental to characterise the Digital Library universe because

    it captures the rules and conditions regulating the overall Organisation.


    The Quality concept represents the parameters that can be used to characterise

    and evaluate the overall service of a Digital Library including every aspect of it,

    i.e. Content, User, Functionality, Policy, Quality, and Architecture.

    Quality can be associated not only with each class of content or functionality

    but also with specific information objects or services. Some of these parameters

    are quantitative and objective in nature and can be measured automatically,

    whereas others are qualitative and subjective in nature and can only be

    measured through user evaluations (e.g., focus groups). This concept is

    fundamental to characterise the Digital Library universe because it captures

    qualitative aspects characterising the Organisation.


    The Architecture concept refers to a Digital Library System and represents a

    mapping of the overall service offered by a Digital Library (and characterised

    by Content, User, Functionality, Policy and Quality) on to hardware and

    software components. There are two primary reasons for having Architecture asa core concept:

    (i)Digital Libraries are often assumed to be among the most complex andadvanced forms of information systems (Fox & Marchionini, 1998);


  • 7/31/2019 Term Paper - Devansh Mathur



    (ii)Interoperability across Digital Libraries is recognising as a majorchallenge.

    A clear architectural framework for Digital Library Systems offers ammunition

    in addressing both of these issues effectively. This concept is fundamental to

    characterise the Digital Library universe because it captures the systemic part of

    the service offered by the Organisation. The concepts populating the areas just

    introduced (Organisation Is a special case since it subsumes all the rest) share

    many similar characteristics and all refer to internal entities of a Digital Library

    that can be sensed by the external world. Therefore, there has also been

    introduced a higher--level concept referring to all of these, i.e.,

    Resource, which enables us to reason about the common characteristics in a

    consistent manner.

    Figure puts in perspective the main concepts of the Digital Library universe.

    The Organisation concept surrounds and subsumes all the other concepts.

    Among the remaining six, two of them are independent each other, i.e., they

    exist independently of a specific Digital Library. These are User, Representing

    the external humans or hardware interacting with the Digital Library and

    Content, representing the material handled by the Digital Library. Architecture,

    representing the technological design on which the Digital Library System is

    based, represents the underlying technology that is called to implement all the

    rest. On top of these concepts there comes Functionality, primarily representing

    the means for connecting Userto Content, i.e., all procedures, transformations,actions and interactions that bring Content to User Or vice versa. Finally,

    operation of the Digital Library and activation of its Functionality are based on

    Policy and aim to achieve certain Quality.

  • 7/31/2019 Term Paper - Devansh Mathur


  • 7/31/2019 Term Paper - Devansh Mathur



    The Main Roles of Actors

    In order to describe the overall operation of the Digital Library Organisation

    and the way it is expected to deliver the service it has been established for, we

    envisage actors interacting with digital library playing roles in three different

    and complementary categories:

    DL End--users,

    DL Managers


    Figure 5 The Main Roles of Actors versus the Three--tier Framework

    As shown in Fig each role is primarily associated with one of the three

    systems in the three--tier framework. The system a role is associated with

    represents the entity that is expected to provide the actor playing such a role

    with the facilities needed to accomplish the mandate assigned to the role.

    Moreover, every actor, independently from the role he/she is playing, is

    expected to deal with all the foundational concepts characterising the Digital

    Library universe.

  • 7/31/2019 Term Paper - Devansh Mathur



    DLEnd-- user

    DL End--users exploit the overall Digital Library service for the purpose of

    providing, consuming, and managing the DL. They are the target clients of the

    service defined by the DL Organisation in terms of the Content to be managed,

    the User(s) to be served, the Functionality to be supported, the Policy (ies) to be

    put in place and the Quality to be exposed. They perceive the DL as a stateful

    entity serving their needs. This state of the Digital Library is a complex

    condition resulting from and influencing Content, User, Functionality, Policy

    and Quality aspects of the DL Organisation. Moreover, the state is expected to

    evolve during the lifetime of the Digital Library as a consequence of a series of

    actions and activities performed in the context of the DL Organisation as well as

    of external factors influencing the DL Organisation. DL End--users may be

    further divided into

    Content Creators, Content Consumers and Digital Librarians.

    Content Creators are theproducersoftheDigitalLibraryContent,i.e.,they

    take careof producing new items contributing to the Digital Library Content.


    (i) ThroughtheFunctionalitytheDLisprovidedwith,

    (ii) In accordance with the Policies defined in the DL, and(iii)

    With the guarantee of Quality the DL declares.

    Content Consumers are theclientsof theDigitalLibraryContent, i.e., they

    access and use the items in the Digital Library Content. Their activity is


    (i) Through the Functionality the DL is provided with,

  • 7/31/2019 Term Paper - Devansh Mathur



    (ii) In accordance with the Policies defined in the DL, and(iii) With the guarantee of Quality the DL declares.

    Digital Librarians are the curators of the Digital Library Content, i.e. they

    select,organise and look after the items in the Digital Library Content. Their


    (i) Through the Functionality the DL is provided with,(ii) In accordance with the Policies Defined in the DL, and(iii) With the guarantee of Quality the DL declares. Moreover, they might

    influence the behaviour of the overall Digital Library service by acting

    as mediators between the final clients of iti.e., Content Creators and

    Content Consumers and those defining and operating this service

    i.e., DL Managers by distilling and elaborating feedbacks on the DL.

    DL ManagersDL Managers are the actors driving the overall Digital Library service. They are

    expected to rely on the facilities offered by The DLMS To define and operate

    the Digital Library and the DLS implementing it. DL Managers may be further

    divided into DLDesigners and DL SystemAdministrators. The former are

    called to devise the overall service while the latter are called to deploy and

    operate the DLS implementing the planned service.


    exploit their knowledge of the application environment that a DL is called to

    serve in order to define, customise, and maintain the Digital Library so that it is

    aligned with the needs of its target DL End--users.

  • 7/31/2019 Term Paper - Devansh Mathur



    To perform this task, the DL Designers interact with the DLMS to decide upon

    the characteristics the Digital Library Should have in terms of

    (i) Content, e.g., the set of repositories, ontologies, classification schemas,information object types, metadata formats, authority files, and

    gazetteers that form the DL Content;

    (ii) User, e.g., the allowed actors, the allowed roles, the informationcharacterising the actors;

    (iii) Functionality, e.g., the functional facilities to be offered, the behaviourthese facilities should implement;

    (iv) Policy e.g., the rule and principles governing the evolution of the DLContent, the allowed actions per actor or family of actors, the

    exploitation of a resource;

    (v) Quality, e.g. the minimal availability of DL Functionality, the minimalresponse time of DL Functionality, the completeness and

    authoritativeness of the DL Content, the confidentiality of the Useractions. These aspects characterise the overall Digital Library service,

    actually the way it is perceived by the DL End--users.

    These parameters need not necessarily be fixed for the entire lifetime of the DL;

    they may be reconfigured to enable the DL to respond to the evolving

    expectations of target users and changes in all aspects.

    DL System Administrators

    DL System administratorworkintandemwithDLDesignerst oputinplacethe


  • 7/31/2019 Term Paper - Devansh Mathur



    They select, deploy and manage a set of networked computers and software

    modules needed to fulfil the expectations that DL End--users and DL

    Designers have for the Digital Library.

    DL System Administrators perform their tasks by interacting with the DLMS

    and relying on the facilities these systems offer for DLS constituents

    identification, linking, allocation, deployment, configuration, tuning,

    monitoring, alerting, and any other management facility requested to manage

    potentially distributed software systems as DLSs are expected to be. Different

    DLMSs are expected to offer diverse management facilities ranging from

    manual installation and configuration of the computers and the software

    modules on the target computers to fully autonomic solutions aiming at

    reducing human intervention to a few corner cases.

    DL Software Developers

    DL Software Developers develop and/or customise the software components

    that will be used as constituents of the DLSs. They are requested to produce the

    software implementing every aspect of the Digital Library service ranging from

    the DL Content and User to Functionality, Policy and Quality. However, DL

    Software Developers should not start from scratch and their activity is expected

    to be performed by relying on the offering of a DLMS. In fact, a DLMS is asoftware system that is equipped with a bunch of off--the--shelf software

    modules implementing to some extent some Digital Library facilities, e.g.,

    content repositories, users management systems, cooperative working

    environments, information retrieval engines, policies enforcement modules.

  • 7/31/2019 Term Paper - Devansh Mathur



    DL Software Developers include Software Engineers and Programmers that

    are requested to customise and complement the set of software modules

    provided by the exploited DLMS as to obtain the set of software constituents

    needed to implement the planned Digital Library. The three roles described

    above encompass the entire spectrum of actors working in the digital libraries

    universe. Their conceptual models of such a universe are linked together in a

    hierarchical way, as shown in Figure I.4--2.

    This hierarchy is a direct consequence of the above definitions, since DL End--

    users act on the Digital Library, whereas DL Managers and DL Application

    Developers operate on the DLS (through the mediation of a DLMS) and,

    consequently, on the DL as well. This inclusion relationship ensures that

    cooperating actors share a common vocabulary and knowledge. For instance,

    the DL End--user expresses requirements in terms of the DL model and,

    subsequently, the DL Designer understands these requirements and defines the

    DL accordingly.

    Figure 6 Hierarchies of Users' Views

  • 7/31/2019 Term Paper - Devansh Mathur



    Advantages and Disadvantages of Libraries: From Past

    Till Now


    Computers have made revolutionary changes in every field of life; undoubtedly,

    the field of education and information has been no different. Importantly,

    conventional libraries moved to the concept of digital libraries, which ultimately

    made gaining knowledge more efficient and organised. However, a notable

    important fact here is that the digital library should stand for more than a well-

    organised centralised form of information [26]. Furthermore, they should also

    embody the essence of communication, which was originally the aspect of face-

    to-face interaction between the people at a conventional library.


    People can access required information at any time of the day, as long as they

    have access to the internet.


    Searching is not efficient, as it may not provide meaningful data to the user as

    a result of his command. In many cases, access to certain information is limited

    by copyright law.

  • 7/31/2019 Term Paper - Devansh Mathur



    Data is static; therefore, no users can contribute their views or share their

    knowledge with other participants.

    Digital libraries should explore using Semantic Web technologies to meet their

    organizational challenge. It summarizes the key challenges facing digital

    libraries, based on a digital libraries workshop. They identify five key

    challenges facing digital libraries: interoperability; description of objects and

    repositories; collection management and organization; user interfaces and

    human-computer interaction; and economic, social, and legal issues. Digital

    libraries can use Semantic Web technology to help meet all of these challenges.


    Following the advent of digital libraries in our lives, another innovative step

    followed. This step was made in relation to making the search more meaningful

    and direct. Essentially, it was concerned with refraining from the habit of

    searching all the things everywhere.

    The growth of Web 2.0 has given way to new methods of accessing

    information and contributing opinions. Notably, semantic digital libraries enable

    the user to get the intended information concerning an object without thepresence of the exact word in the search. This integrated form of information is

    based on different metadata which provides a more meaningful data. These

    libraries tend to provide a better and more convenient form of browsing


  • 7/31/2019 Term Paper - Devansh Mathur



    Figure 7 Evolution of Semantic Digital Libraries


    Semantic Digital Libraries make it easier to find information in the vast ocean

    of available data. This is facilitated by ontology-based search and facet search.

    Access is not confined to only one digital library; to the contrary, it provides a

    mechanism of interoperability between different systems.

  • 7/31/2019 Term Paper - Devansh Mathur




    Existing metadata of the digital libraries have to be lifted to a semantic level.

    Not all digital libraries, government agencies etc. maintain metadata.


    Semantic digital libraries tend to focus more keenly on the retrieval of

    meaningful information rather than giving the opportunity of sharing user

    knowledge. This need subsequently led to the development of social semantic

    digital libraries.

    Figure 8 Evolution of Social Semantic Information

  • 7/31/2019 Term Paper - Devansh Mathur



    This is achieved by a combination of Semantic Web with collaboration tools on

    the web. Social semantic digital libraries complement the existing features of

    the semantic digital libraries by providing the opportunity to contribute to the

    information. Web 1.0 evolved into a collaborative platform where people could

    interact and share information, i.e. Web 2.0. Web 2.0 was promoted by Tim

    OReily around the year 2005; it gives ordinary internet users the opportunity to

    interact, meet and share information like never before, and involves concepts

    like blogs, wikis, social networking sites etc.


    libraries, and accordingly achieve great things.


    individuals to the other.

    Disadvantagesamateurish data by some users.


    Social Semantic Digital Libraries (and Web 2.0) have made the web

    collaborative and interactive; however, one drawback which has become

    apparent as a result of this innovation is that of information overload. Owing to

    the increase in internet users and thus their participation level on forums, it has

    become difficult to point to the knowledge part of the content.

  • 7/31/2019 Term Paper - Devansh Mathur




    Another drawback of such libraries is that web pages are dynamic but are not

    very structured. Notably, Web 2.0 tools enable the shaping of content on the

    pages, but not the content itself. Essentially, the future will address these

    aspects and make the web more powerful and structured by the advent of Web

    3.0 . The basic concept behind Web 3.0 is that of ontologydefined by

    Thomas Gruber as explicit specification of a conceptualisation. Another future

    enhancement which is foreseen for the future is that there shall be digital

    annotation linked with physical objects in life: for example, in a museum. An

    application of this technology can be to have real-virtual tours of a certain

    place: for example, to start with a real guided tour and then (if desired)

    browsing through the virtual context information or otherwise gathering

    information about other exhibitions in the premises. The future aspect of the

    social semantic digital library is to improve user benefits by empowering the

    user interfaces and social networking. The user identification and system

    automation are important key points in the future social semantic digital


  • 7/31/2019 Term Paper - Devansh Mathur



    Future Prospects

    There are inevitable barriers to the Semantic Web that still need to be addressed.

    We have mentioned the slow progress on certain features, particularly ontology

    and reasoning support, due to the development community not coming to a

    consensus. This does not mean that progress cannot be made immediately using

    the simpler tools for RDF and RDF Schema available now.

    Some of the larger IT companies are hanging back, waiting to spot the

    opportunity and waiting for the research community to settle on standards. Thus

    the main impetus is coming from communities themselves it is an opportunity

    to profoundly affect the way that the world talks to each other.

    There is a good deal of RDF data giving semantic descriptions already on the

    Web, both from website owners publishing their own annotations as RDF files

    and from sites such as rdfdata.org which provide portals for RDF data.

    However, before the Semantic Web can become globally usable, there does

    need to be more, and it needs to be more easily available. There is a distinct

    overhead to using the Semantic Web in terms of establishing shared

    vocabularies and ontologies, and in providing the appropriate annotations to

    resources which make them visible to the Semantic Web. This is a non-trivial

    task and often users will either not have the time to include this, or the expertise

    to do it well. A missing component of the Semantic Web is a simple means to

    support this, similar to the editors and tools for the conventional Web.

    Undoubtedly the simplicity of the HTML language used within the current Web

    was a major influence on its success and in order for the Semantic Web to break

    out from narrow communities to universal use it needs to address the issues of

    making it easy to use and accessible to all.

  • 7/31/2019 Term Paper - Devansh Mathur



    Otherwise, the Semantic Web is likely to require particular effort and expertise.

    This is expensive, and so it may well be confined to particular domains on the

    Web which see a strong advantage in its use, although over time as the expertise

    becomes more commonplace it should become cheaper. Also, the 'network

    effect' can work as both a barrier and an incentive. One of the main advantages

    of supplying Semantic Web annotation is that is can be shared and can gain

    advantage to others, so when there is little data to share, then there is little

    incentive to take the extra expense in sharing; however, once the ball starts to

    roll, there is an exponential advantage in combining your own data with others'.

    These problems may be less of a disadvantage in the HE and FE sectors, which

    has well-integrated communities with stronger control over their resources.

    Information science professionals in libraries are available to help with the task

    of cataloguing and publishing annotations. Thus it is likely that this sector will

    be in the forefront in the use of this technology.

    The impact on digital libraries, combined with the Open Access Initiative andthe rise of open archiving is likely to be quite profound. Libraries become

    'value-added' information annotators and collators rather than the archivists of

    externally published literature and the holders of the published output of

    institutions. The Semantic Web, although not a prerequisite or a motivator for

    this change is nevertheless likely to smooth its development. The tools are in

    place for sharing classification schemes and to allow the community to develop,deepen and share such schemes. The information infrastructure tools discussed

    above will have particular impact on the way students and researchers find

    information, so these tools may typically be provided and adapted by libraries

    who will tailor them to the needs of their own users. The Semantic Web, like

  • 7/31/2019 Term Paper - Devansh Mathur



    the current Web, has the capacity of being an overwhelming place; libraries are

    well-placed to make sense of this for the HE and FE community.

    Existing Semantic Digital Library Systems

    SIMILE (Semantic Interoperability of Metadata and Information in

    unLike Environments): This system focuses on enhancing the integration

    aspect of metadata, services etc. to increase accessibility.

    We propose a project that will create compelling use case demonstrations and

    prototypes at the intersection of community or individual information creation

    and management, institutional information and digital collections management,

    and the Semantic Web data architecture.

    We seek to enable low-cost, scalable interoperability among digital research

    collections, the metadata that describes them, and the services that leverage that

    metadata and digital material. We will extend DSpace to greatly enhance its

    ability to provide easy-to-use support for arbitrary schemas and metadata,

    primarily through the use of RDF and Semantic Web techniques, and explore

    support for distributed research collections. The project will ground its work in

    focused, well-defined, real-world use cases in the domains of libraries and

    higher education. We will also collaborate with other significant projects that

    operate in these domains, (e.g. ARTstor, JSTOR, OCW, Fedora, Sakai, and

    VUE) to improve understanding and use of Semantic Web technologies to

    provide scalable data interoperability within and across those platforms.

    Jerome Digital library: Can be considered as a social semantic digital

    library. Based on Semantic Web as well as social networking in order to

  • 7/31/2019 Term Paper - Devansh Mathur



    promote collaborative activities along with other common uses of semantic

    digital libraries.With JeromeDL social and semantic services every library usercan bookmark interesting books, articles or other materials in semantically

    annotated directories.

    Users can share their knowledge with others within a social network. We

    enriched the standard SSCF browser with an ability to bookmark and browse

    community based data. JeromeDL also has a feature which allows it to treat a

    single library resource as a blog post. With SIOC based annotations users can to

    comment the content of the resource and in this way create new knowledge.

    JeromeDL also provides various browsing, filtering and navigation solutions,

    such as TagsTreeMaps, MultiBeeBrowse and Exhibit.

    JeromeDL has been installed in a number of locations; the two most used,

    DERI Galway library and WBSS8 at Gdansk University of Technology, serve

    their community of users in everyday activities. DERI Galway library is used by

    researchers as a pre-print server to locate and share publications. WBSS

    maintains a set of scans of antique books and a number of books written bylecturers at GUT; the latter ones are used as learning material.

    BRICKS: This system focuses on the basic infrastructure of a digital library

    network so that information can be shared amongst users in the cultural heritage

    domain. The BRICKS network infrastructure uses the Internet as a backbone,

    and is made of decentralized BRICKS Nodes (BNode), in order to avoid centralpoints whose failure or overload could stop or slowdown the whole Network.

    BNodes communicate among each other and use available resources for content

    and metadata management.

    Every BNode knows directly only a subset of other BNodes in the system.

    However, if a BNode wants to reach another member that is directly unknown

  • 7/31/2019 Term Paper - Devansh Mathur



    to it, it will forward a request to some of its known neighbour BNodes that will

    deliver the request to the final destination or forward it again. BRICKS users

    access the system only through a local BNode available at their institution.

    Hence every user request is primarily sent to the institution's BNode and then

    the request is routed via other BNodes to the final destination.

    Search requests behave like that; the BNode pre-selects a list of BNodes where

    a search request can be fulfilled, and then the BNode routes it there. When the

    location of the content is known, e.g. as a result of the query, the BNode is

    directly contacted.


    Traditional libraries have taken the shape of an interactive, accessible and

    efficient platform which is present for the user at any time of the day. The new

    forms of digital libraries, i.e. semantic digital libraries, have proved to produce

    more meaningful results for the user. Further developments in semantic digital

    libraries have evolved the concept of contribution of information and social

    interactivity between the contributors. Therefore, the future holds much more

    promising and efficient mechanisms for handling information.

  • 7/31/2019 Term Paper - Devansh Mathur




    [1] Diane, V., 2006, When did the Web start? Developed Traffic,


    [2] Berners-Lee, T., Hendler, J., Lassila, O., 2001, the Semantic Web, Scientific

    American Magazine (May 17, 2001).http://www.sciam.com/article.cfm?id=the-


    [3] Kruk, S., R., Haslhofer, B., Kneevic, P., 2007, Tutorial 7- Semantic Digital

    Libraries, JCDL 2007

    [4] Lagoze, C., Krafft, D., B., Payette, S., Jesuroga, S., 2005, What Is a Digital

    Library Anymore, Anyway? D-Lib Magazine November 2005, Volume 11

    Number 11,http://dlib.org/dlib/november05/lagoze/11lagoze.html

    [5] Borgman, C., L., 2000, Digital libraries and the continuum of scholarly

    communication,Journal of Documentation, 56 (4), pp. 412-430

    [6] Borgman, C., L., 1999, What are digital libraries? Competing visions,

    Information Processing & Management, pp. 227-243,

    [7] Celino, I., Turati, A., Della Valle, E., and Cerizza, D. (2006). Squiggle Med:

    Semantic search for medical digital library, Technical report, CEFRIEL.

    [8] Baruzzo, A., Casoto, P., Challapalli, P., Dattolo, A., Pudota, N., Tasso, C.,

    2009, Toward Semantic Digital Libraries: Exploiting Web2.0 and Semantic

    Services in Cultural Heritage,Journal of Digital Information, Vol. 10, No 6.

    [9] Xu, X., Zhang, F., Ni, Z., 2008, An Ontology-Based Query System For

    Digital Libraries, IEEE Pacific-Asia Workshop on Computational Intelligence

    and Industrial Application

    [10] Kruk, S., R., Decker, S., Haslhofer, B., Kneevic, P., Payette, S., Krafft,

    D., 2007, TutorialSemantic Digital Libraries,BANFF 2007.

    [11] Thomas, S., 2006, Web 2.0 and the future for library systems, Technical

    Report, University of Adelaide,

  • 7/31/2019 Term Paper - Devansh Mathur



    [12] Baruzzo, A., Casoto, P., Dattolo, A., and Tasso, C., 2009, A conceptual

    model for digital libraries evolution. In WEBIST 09: Proceedings of 5th

    Informational Conference on Web Information Systems and Technologies,

    pages 299304, Berlin. Springer-Verlag.

    [13] The Spiritus Temporis Web Ring Community, 2005 Digital library,


    [13]D. McGuinness and F van Harmelen (eds) OWL Web Ontology LanguageOverview


    [14] M. Dean, G. Schreiber (eds), F. van Harmelen, J. Hendler, I. Horrocks, D.

    McGuinness, P. Patel-Schneider, L. Stein, OWL Web Ontology Language

    Reference http://www.w3.org/TR/2003/WD-owl-ref-20030331/