DIGITAL LIBRARIES: AN OVERVIEW
Dr. I.R.N. Goudar
Head, ICAST
National Aerospace Laboratories
Bangalore – 560017
One-day Seminar on Digital Library Services for Technical Colleges
Basaveshwar Engineering College
Bagalkot – 587102
15 April 2006
Traditional Libraries
• Libraries with the same purpose, functions, and goals
• Collection development and management
• Technical processing
• Index creation
• Counter transactions
• Reference work
• Preservation
What is a Digital Library?
• A service?
• An architecture?
• A set of information resources?
• A set of tools to locate, search, and retrieve information?
• Possibly the tools to create such resources and services also fall within the purview of DLs
• The digital face of traditional libraries
• Includes both digital and traditional collections
• The backbone and nervous system of libraries
“A digital library service is an assemblage of digital computing, storage, and communications machinery together with the software needed to reproduce, emulate, and extend the services provided by conventional libraries based on paper and other material means of collecting, storing, cataloguing, finding, and disseminating information.” (Gladney, H.M., et al., 1994)
“Digital Libraries are a set of electronic resources and associated technical capabilities for creating, searching, and using information … they are an extension and enhancement of information storage and retrieval systems that manipulate digital data in any medium (text, images, sounds, static or dynamic images) and exist in distributed networks” (Borgman, 1996)
Defining the Digital Library
What is a Digital Library? Borgman identifies two major aspects:
• DL researchers from computer science focus on content for user communities and therefore emphasize the enabling technologies
• Library professionals appear to emphasize DLs as services
• However, DLs require the skills of both librarians and computer scientists
What is important?
• Site neutrality
• Access: anytime (24×7), anywhere (office, residence, travel), by anyone
• Open access and sharing of information
• Greater variety and granularity of information
• Up-to-dateness
• New forms of rendering (new genres)
• Integration of digital media into traditional collections
• Digital libraries are different in that they are designed to support the creation, maintenance, management, access to, and preservation of digital content
• The digital library is not a single entity;
• The digital library requires technology to link the resources of many;
• The linkages between the many digital libraries and information services are transparent to the end users;
• Universal access to digital libraries and information services is a goal;
• Digital library collections are not limited to document surrogates: they extend to digital artefacts that cannot be represented or distributed in printed formats.
(Association of Research Libraries, 1995)
Five Elements in Various Definitions of DL
Goals of DL
• First-generation digital library: focused on digitization technology, metadata schemes, data management techniques, and digital preservation.
• Second-generation digital library: exploring new opportunities and developing new competencies.
• Third-generation digital library: focusing instead on fully integrating digital material into the library’s collections through a modular systems architecture.
Digital Libraries Shorten the Chain
[Diagram labels: AUTHOR, Editor, Reviewer, Publisher, A&I, Consolidator, Library, USER, DIGITAL LIBRARY]
ROLES
[Diagram labels: READER, AUTHOR, LIBRARIAN, EDITOR, LEARNER, TEACHER]
Ingredients for DLs
• Hardware: the minimum machinery to do the job
• Software: the programs for handling data
• Digital objects: articles, conference papers, theses, ...
• Basic skills: things one has to learn
Hardware
• A server: you’ll need access to a web server
• A good PC
• Scanners: flatbed (auto feed, back to back), MF, book scanner
Software
• Open Source Software (OSS): DSpace, EPrints, Fedora, GSDL, ...
• Proprietary software you can’t avoid: image editing and Optical Character Recognition software have to be purchased
Hardware-Software Network
• High-speed local networks and fast connections to the Internet
• Relational databases that support a variety of digital formats
• Full-text search engines to index and provide access to resources
• Web servers and FTP servers (both intranet and internet)
• Electronic document management functions
Digital Library Content
Content Types
• Text documents: articles, reports, books, manuscripts, newspapers, theses, tech. reports
• Audio: speech, music
• Video: movies
• Geographic information: (aerial) photos
• Bio information: genome (human, animal, plant)
• Software, programs: models, simulations
• Images and graphics: photographs, paintings, 2D, 3D
Content is King
The information content is more important than the systems used for its storage, management and retrieval
Objects should not be “locked” in specific DLs or archives
Types of Digital Collections
• Digitization: converting paper and other media in existing collections to digital form
• Acquisition of original digital works: created by publishers and scholars, like electronic books, journals, and datasets
• Access to external materials: like Web sites, other library collections, or publishers' servers
Resources
• Bibliographic databases that point to both paper and digital materials
• Indexes and finding tools
• Collections of pointers to Internet resources
• Directories
• Teaching and pedagogic materials
• Photographs
• Numerical data sets
• E-books and e-journals
Creating DLs …
Six steps: Selecting, Acquiring, Digitization, Organizing, Archiving, Providing Access
Process
[Diagram, stage labels: Identification of Books; Selection of Books; Scanning; Meta Data; Scanning Process; Image Processing & QC; OCR; Publishing]
Digitization
“Conversion of any fixed or analogue media--such as books, journal articles, photos, paintings, microforms--into electronic form through scanning, sampling, or in fact even re-keying.”
Digitization Process
• Determine copyright or restrictions
• Digital conversion: outsource or in house?
• Text conversion, formats, headers, compression, and delivery media
• Digital capture with camera or scanner?
• File handling
• File naming
Digitization Process
• Preparing the objects
• Scanning
• Moving files to temporary storage
• Value addition: metadata preparation etc.
• Long-term storage
• Derivative image/thumbnail for access copy
• Merging files
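The file handling and file naming steps above can be illustrated with a minimal sketch. The naming pattern, the "master"/"access" roles, and the directory layout are all hypothetical conventions chosen for illustration; the slides do not prescribe any particular scheme.

```python
from pathlib import PurePosixPath

# Hypothetical convention: TIFF for the archive master, JPEG for the access copy.
EXTENSIONS = {"master": "tif", "access": "jpg"}


def scan_filename(collection: str, item_id: int, page: int, role: str) -> str:
    """Build a predictable file name for one scanned page."""
    if role not in EXTENSIONS:
        raise ValueError(f"unknown role: {role!r}")
    # Zero-padding keeps files in page order when they are sorted by name.
    return f"{collection}_{item_id:05d}_p{page:04d}_{role}.{EXTENSIONS[role]}"


def storage_path(root: str, collection: str, item_id: int, page: int, role: str) -> str:
    """Keep master and access copies in separate directory trees."""
    name = scan_filename(collection, item_id, page, role)
    return str(PurePosixPath(root) / role / collection / f"{item_id:05d}" / name)


print(scan_filename("thesis", 42, 7, "master"))
print(storage_path("/dl", "thesis", 42, 7, "access"))
```

Whatever scheme is chosen, the point is that it is fixed before scanning starts, so that file names never need to be repaired by hand later.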
Digital Production Process
• Supplier Management
• Workflow Management
• Content Management
• Project Management
• Quality Management
• Data Management
Data Management
• Formats: TeX, PDF, PS
• Metadata and content data
• Structuring (tagging)
• Media neutrality
Workflow Management
• Processing
• Conversion
• Automatization
• Interfaces - input / output
Content Management
• Style files
• Information /Object models
• Archiving
Quality Management
• Data consistency
• Process consistency
• Content consistency
Various Workflows
[Diagram: Input, Processing, Output. Labels: RTF, TeX, Camera ready, Books, Archive Journals, Software, Normalization, Content Processing]
OCR: Optical Character Recognition
Many good OCR programs are on the market, with prices ranging from Rs. 5,000 to Rs. 20,000.
Examples, among many others:
• Read-Iris (http://www.readiris.com/)
• Omnipage (http://www.omnipage.com/)
• Fine-Reader (http://www.finereader.com/)
Possible Delivery Formats
• Pure image formats: TIFF, JPEG
• Open encoded formats: XML, HTML, ASCII, and Unicode
• Hybrid formats: PDF, DjVu (can contain both image and text)
• Proprietary formats: Microsoft Word, WordPerfect
Good Principles
• What to digitize? Selection and policy are important
• Collection description is important, such as scope, format, restrictions on access, ownership, etc.
Digitization: Issues
• Copyright
• Access copy and archive copy
• File size
• Storage media (CD, hard disc, …)
• File format (TIFF, JPEG, …)
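The file size issue can be made concrete with a back-of-the-envelope calculation: an uncompressed scan takes width in pixels × height in pixels × bits per pixel ÷ 8 bytes. The page size (A4), resolution (300 dpi), and book length (300 pages) below are illustrative assumptions, not figures from the slides.

```python
def uncompressed_scan_bytes(width_in: float, height_in: float,
                            dpi: int, bits_per_pixel: int) -> int:
    """Return the uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    return width_px * height_px * bits_per_pixel // 8


# Illustrative assumption: an A4 page (8.27 x 11.69 inches) scanned at 300 dpi.
colour_page = uncompressed_scan_bytes(8.27, 11.69, 300, 24)  # 24-bit colour
bitonal_page = uncompressed_scan_bytes(8.27, 11.69, 300, 1)  # 1-bit black and white

print(f"colour master:  {colour_page / 1_000_000:.1f} MB per page")
print(f"bitonal master: {bitonal_page / 1_000_000:.1f} MB per page")
print(f"300-page book in colour: {300 * colour_page / 1_000_000_000:.1f} GB")
```

Under these assumptions a colour master is roughly 26 MB per page while a bitonal one is about 1 MB, which is why the choice of archive format, access format, and storage medium has to be made together.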
Challenges in Publishing
Preservation of layout
Searchability of content and metadata
Efficient image compression
Easy browsing of books
Accommodating low-bandwidth users
Multilingual text support
Multipaging
Digitization: Factors
• Collection strengths: digitizing selected portions, adding new digital works
• Unique collections: only copies of something
• Priorities of user communities: like the demands of a curriculum
• Manageable portions of collections: what is reasonable for any one institution to collect or digitize
• Technical architecture can also be a factor in selecting who digitizes what
• Skills of staff: some institutions' staff don't have the necessary skills
Retrospective Conversion
• Complete conversion would be impractical or impossible technically, legally, and economically
• Digitization of a particular special collection, or a highly valued portion of one
• Highlighting a diverse collection
• High-use materials
• These approaches can be used alone or in combination, depending upon a particular institution's goals
Criteria for Selecting Content
• Their potential for long-term use
• Their intellectual or cultural value
• Whether they provide greater access than possible with original materials (e.g., fragile, rare materials)
• Whether copyright restrictions or licensing will permit conversion
Metadata
The data that describes the content and attributes of any particular item
Key to resource discovery and use of any document
Descriptive metadata facilitates searching and discovery; administrative and structural metadata assist in object viewing, management, and preservation.
Elements of Dublin Core
• Title
• Creator
• Subject and Keywords
• Description
• Publisher
• Contributor
• Date
• Format
• Resource Identifier
• Resource Type
• Source
• Language
• Relation
• Coverage
• Rights Management
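As an illustration of how these elements describe an item, the sketch below builds a small Dublin Core record as XML using Python's standard library. The element names follow the Dublin Core 1.1 element set (where "Subject and Keywords" is `subject`, "Resource Identifier" is `identifier`, "Resource Type" is `type`, and "Rights Management" is `rights`). The `metadata` wrapper element and the function name are assumptions made for this example, and the field values are taken from the title slide of this talk.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields: dict) -> str:
    """Serialize a mapping of Dublin Core element names to values as XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise KeyError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": "Digital Libraries: An Overview",
    "creator": "I.R.N. Goudar",
    "date": "2006-04-15",
    "language": "en",
})
print(record)
```

All Dublin Core elements are optional and repeatable, so a real record would usually hold a list of values per element; a single value per element is used here only to keep the sketch short.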
Barriers
• Digital objects are less fixed, easily copied, and remotely accessible by multiple users simultaneously
• Libraries are mostly caretakers of information and rarely own the copyright of the material they hold, which comes with restrictions
• Libraries need to develop mechanisms for managing copyright that allow them to provide information without violating it; this is called rights management
Rights Management
• Usage tracking
• Identifying and authenticating users
• Providing the copyright status of each digital object, and the restrictions on its use or the fees associated with it
• Handling transactions with users by allowing only so many copies to be accessed, or by charging them for a copy, or by passing the request on to a publisher
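The rights management functions listed above can be sketched in a few lines. This is a toy model under stated assumptions: the `RightsRecord` class, the `request_copy` function, and the outcome strings are hypothetical names invented for this example, and a real system would authenticate users properly rather than check membership in a set.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RightsRecord:
    """Rights information attached to one digital object (hypothetical model)."""
    copyright_status: str              # e.g. "in copyright" or "public domain"
    max_copies: Optional[int] = None   # None means no limit on copies
    copies_served: int = 0
    usage_log: List[str] = field(default_factory=list)  # usage tracking


def request_copy(record: RightsRecord, user_id: str, known_users: Set[str]) -> str:
    """Decide what happens when a user asks for a copy of the object."""
    # Identify the user first; unknown users get nothing.
    if user_id not in known_users:
        return "denied: unknown user"
    # Once the permitted number of copies is used up, pass the request on.
    if record.max_copies is not None and record.copies_served >= record.max_copies:
        return "referred to publisher"
    record.copies_served += 1
    record.usage_log.append(user_id)
    return "granted"


users = {"alice", "bob"}
thesis = RightsRecord("in copyright", max_copies=2)
for user in ("alice", "bob", "alice", "eve"):
    print(user, "->", request_copy(thesis, user, users))
```

Charging a fee per copy, the third option on the slide, would fit in the same place as the referral branch.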
Preservation
• Keeping digital information available in perpetuity
• The real issue is technical obsolescence, like the deterioration of paper in the paper age
• Requires constantly coming up with new technical solutions
Three Types of Preservation
• Preservation of the storage medium
• Preservation of access to content
• Preservation of fixed-media materials through digital technology
Preservation of the Storage Medium
• Tapes, hard drives, and floppy discs have a very short life span
• Media become obsolete within two to five years, when they are replaced by better technology
• The hardware or software needed to read them may no longer be available
• Digital information may have to keep moving from storage medium to storage medium
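Moving files from one storage medium to another is only safe if each copy is verified. A minimal sketch, assuming that a SHA-256 checksum comparison is an acceptable integrity check (the slides do not name a method) and using hypothetical function names:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, target_dir: Path) -> Path:
    """Copy a file to a new storage location and verify the copy bit for bit."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        target.unlink()  # never keep a corrupt copy
        raise IOError(f"checksum mismatch while migrating {source}")
    return target
```

The source file is deliberately left in place; deleting it is a separate decision, taken only after the new medium has been checked.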
Preservation of Access
Access to the content of documents, regardless of their format:
• Formats containing the information (e.g., Adobe Acrobat PDF) become obsolete
• Data is translated from one format to another to preserve the ability of users to retrieve and display the information content
• Data migration is costly
• There are still no standards for data migration
• Migration can cause distortion or information loss
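A very small instance of format migration is re-encoding legacy text as Unicode. The sketch below assumes the source encoding is known (Latin-1 in the example) and uses strict error handling, so that a conversion which would lose information fails loudly instead of silently substituting characters. The function name is hypothetical.

```python
def migrate_text(data: bytes, source_encoding: str,
                 target_encoding: str = "utf-8") -> bytes:
    """Re-encode text from a legacy encoding, refusing lossy conversions."""
    # Strict decoding raises UnicodeDecodeError on bytes that are invalid
    # in the source encoding.
    text = data.decode(source_encoding, errors="strict")
    # Strict encoding raises UnicodeEncodeError if the target encoding
    # cannot represent a character, instead of replacing it with "?".
    return text.encode(target_encoding, errors="strict")


legacy = "café".encode("latin-1")
print(migrate_text(legacy, "latin-1"))      # migrates cleanly to UTF-8

try:
    migrate_text(legacy, "latin-1", "ascii")  # ASCII has no "é"
except UnicodeEncodeError as error:
    print("migration would lose information:", error.reason)
```

Migrating page images or structured documents is far harder than this, but the same principle applies: detect loss at migration time, while the original is still readable.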
Fixed-media through Digital Technology
Replacement for current preservation media such as microforms
No common standards for the use of digital media as a preservation medium
It is unclear whether digital media are going to handle the task of long-term preservation
Digital Libraries Benefits : Individual
Gain access to the holdings of libraries worldwide through automated catalogs. Locate both physical and digitized versions of scholarly articles and books.
Optimize searches, simultaneously search the Internet, commercial databases, and library collections.
Save search results and conduct additional processing to narrow or qualify results.
From search results, click through to access the digitized content or locate additional items of interest.
All of these capabilities are available from the desktop or other Web-enabled device such as a personal digital assistant or cellular telephone.
Digital Libraries Benefits : Classroom Projects
Capability to enhance the classroom experience or conduct learning apart from a physical campus
Digital library is a core component of the virtual learning environment (VLE)
Changing the relationships between the library and other parts of the academic enterprise
Integrate authoring, analysis, and distribution tools that facilitate the reuse and repurposing of digital content
Collections and services can be integrated into the institutional, national, and worldwide fabric of research and teaching
Digital Library Standards
Common User Interface:
Data Handling and Interchange:
• Graphic formats: JPEG, TIFF, GIF, PNG, Group 4 Fax, CGM
• Structured documents: SGML, HTML, XML
• Moving pictures/3-D: MPEG, AVI, GIF89A, QuickTime, Real Video, ViviActive, VRML
Metadata:
• Resource description: Dublin Core, WHOIS++ Templates, US-MARC, TEI Headers, other open source and domain-specific standards
• Resource identification: URN, PURL, DOI, SICI
Security, authentication and payment services:
• Emerging e-commerce standards
Indian DL Initiatives: Contents
• Books (out of copyright)
• Scholarly journals
• Theses
• Institutional e-prints
• Manuscripts
• Data
• Newspapers
• Metadata-level portal and gateway services
Ministry websites include policy and planning documents, annual reports, budget etc.
Goa, Andhra Pradesh, Karnataka, Maharashtra, Tamil Nadu have made significant headway
Judgments of Supreme Court and High Courts covered
Government, Judicial, Financial, Land Records
Digital Library of India at IISc, Bangalore
• Mission: free access to human knowledge through a portal
• Objectives: to capture all books in digital format (1 million by 2005); test bed for improved scanning techniques, OCR, indexing
• Books, journals, palm leaves
• More than 1 lakh books in English, Telugu, Kannada, Tamil, Sanskrit, Urdu
• 100 scanners in 16 scanning centres
• Plan for 1 million documents by 2005
• Science, arts, culture, music, movies, traditional medicine
• Will be mirrored at several locations in the world
• Collaboration: Universal Library Project, CMU
• http://www.dli.ernet.in/
IISc: Other Activities
Vigyan:
Website on Indian S & T
Collaboration with NISSAT/DSIR
Indo-French Cyber University
• Initially PG in Applied Mathematics
E-prints at IISc by NCSI (http://eprints.iisc.ernet.in/)
• Online digital repository of IISc research papers
• Research papers (preprints, post-prints), book chapters, tech reports, unpublished findings, conf papers, magazine articles
• Set up using e-prints.org open source software
• Part of worldwide institutional e-print archives
Institutional Repositories
• Indian Institute of Science
• National Aerospace Laboratories
• National Chemical Laboratories
• National Institute of Oceanography
• ISI – Mathematics
• DRTC – LDL
• Raman Research Institute
• IIM Kozhikode
Scholarly Science Journals
• Indian Academy of Sciences (IAS) – 11 journals
• Indian National Science Academy – 4 journals
• Indian Medlars Centre (IndMed) – 22 journals
Vidyanidhi: Dept. Of LIS, Univ. of Mysore
Digital Library and E-Scholarship Portal
Indian Theses Database
Indian ETD Collection
Training Program for improving quality
Supported by DSIR, GOI
Part of global ETD initiative
Support by Ford Foundation and Microsoft
http://www.vidyanidhi.org.in/
Theses initiatives by others: IIT Delhi and IIT Mumbai
Indira Gandhi National Centre for the Arts
Digital Images
Video Recordings
Audio Recordings
Databases
Bibliographies
Multimedia Documentation
Kalakalpa (Journal)
Electronic Books
Papers and Essays
Research Reports
News Letters
Conference Proceedings
Manuscripts in India
In house Articles
Mumbai Asiatic Society
• Rare Books (dating back to 1632)
• Manuscripts (Sanskrit, Pali, Tibetan, Prakrit, Arabic, Persian, etc.)
• Maps, Coins
• Buddhist Relics
• Book preservation laboratories
Microfilming – Now digitisation
http://education.vsnl.com/asbl/treasure.html
National Library, Calcutta
• Manuscripts (work in progress): paper – 3000, palm leaf – 334
• Books (<1900, Indian <1920): 6600 titles, 2.5 M pages, 548 CDs
• Bengali journal (Prabasi)
• East India Company records
• Many diaries
[Image labels: Orunudoi; Assamese; In Bengali; Journal]
Archives of Indian Labour
V.V. Giri National Labour Institute
Heritage of Indian Working Class
Commissions on Labour
Oral History Collections
Trade Union Collections
Regional Collections
Strike Collections
http://www.indialabourarchives.org/
India: DL Issues
• Object identification: non-availability, coordinated efforts
• Technology and infrastructure
• Standards: metadata
• Funds
• Networking of minds
• Multiple languages
• Inhibition and reservations (libraries and heritage materials)
• IPR
DLI in India: Suggestions
• Distributed national network with global access
• Institutional digital repositories
• Open access science journals
• Content-based catalogue (metadata)
• Portal giving links to various activities
• Intensive training of librarians, archaeologists, curators, etc.
• Improve technology and infrastructure
• Adopt suitable standards
• Language tools
• Modify IPR to suit open archives
• Enhance the capability of integrated library automation systems to handle DL features
• Compile a directory of DL technologies and vendors
Bibliography on Digital Libraries http://sunsite.berkeley.edu/CurrentCites/bibondemand.cgi?query=digital+library