a strategic view of document and digital object...

A strategic view of document and digital object management for the University of the Witwatersrand, Johannesburg. Prof Derek W. Keats, Deputy Vice Chancellor (Knowledge & Information Management), The University of the Witwatersrand, Johannesburg. http://kim.wits.ac.za [email protected]


Post on 30-Mar-2020


TRANSCRIPT

  •    

    A strategic view of document and digital object management for the University of the Witwatersrand, Johannesburg

    Prof Derek W. Keats
    Deputy Vice Chancellor (Knowledge & Information Management)
    The University of the Witwatersrand, Johannesburg

    http://kim.wits.ac.za [email protected]

  •    

  •    

    What are documents?

    How does the computer 'see' them?

  •    

    The storage view

  •    

    The manipulation view

  •    

    The structural view

  •    

    The operational view

  •    

    The storage view

    The operational view

    The manipulation view

    The structural view

  •    

    Require software that understands the 'document' and knows how to present it.

    The storage view

    The operational view

    The manipulation view

    The structural view

    Time Time Time

  •    

    The future

    Today

    Physical deterioration

    Digital obsolescence

    Accidental damage

    Loss of metadata

    Survival

    Devices

    File formats
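One common guard against accidental damage is a fixity check: record a checksum for every file now, then re-verify later. The following is a minimal sketch, assuming GNU coreutils and findutils are available. The function names `make_manifest` and `verify_manifest` and any paths passed to them are hypothetical, not part of the Wits infrastructure described in these slides.

```shell
#!/bin/bash
# Hypothetical helpers for a simple fixity check (names are illustrative).

# Write a SHA-256 manifest for every file under a directory.
# Keep the manifest outside the directory being checksummed.
make_manifest() {
    local dir="$1" manifest="$2"
    # Paths are recorded relative to $dir so the collection can be moved.
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Re-check every file against the manifest.
# Returns non-zero if any file is missing or has changed.
verify_manifest() {
    local dir="$1" manifest="$2"
    # The manifest is read on stdin, so a relative manifest path still works.
    ( cd "$dir" && sha256sum --check --quiet ) < "$manifest"
}
```

For example, `make_manifest ./collection ./collection.sha256` could be run once at ingest, and `verify_manifest ./collection ./collection.sha256` on a schedule afterwards.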

  •    

    A major threat to proprietary file formats common in proprietary systems

    Today

    Physical deterioration

    Digital obsolescence

    Accidental damage

    Loss of metadata

    Survival

    Devices

    File formats

  •    

    Device obsolescence

  •    

    File format obsolescence

    Software supporting the format fails in the marketplace or is bought by a competitor and withdrawn.

  •    

    File format obsolescence

    ● Software upgrades fail to support legacy files
    ● The format itself is superseded by another or evolves in complexity
    ● The format "take up" is low or industry fails to create compatible software
    ● The format fails, stagnates, or is no longer compatible with the current environment
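A first step in assessing format obsolescence risk is simply knowing which formats a collection contains. The sketch below counts files by extension under a directory. The function name `format_inventory` is hypothetical, and an extension is only a hint at the real format, so a content-based tool such as `file` would give a more reliable answer.

```shell
#!/bin/bash
# Hypothetical helper: count files per (lower-cased) extension under a directory.
# Files without an extension are skipped.
format_inventory() {
    local dir="$1"
    find "$dir" -type f \
        | sed -n 's/.*\.\([^./]*\)$/\1/p' \
        | tr '[:upper:]' '[:lower:]' \
        | sort | uniq -c | sort -rn
}
```

Running `format_inventory /path/to/collection` prints one line per extension, most frequent first, which makes rarely used or legacy formats easy to spot.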

  •    


    A small subset of commonly used media formats!

    Media

  •    

    If you don't have the software, even a perfectly preserved document is of no use.

  •    

    [Diagram labels: Digitization, Document management, Born digital, Digital recovery, Digital archiving, Digital preservation, Analogue, Digital, Time, Digital assets]

    Risk without long term planning

  •    

    As a component of how we manage our digital assets

  •    

    Why digital asset management?

    ● We are a knowledge organization
    ● Knowledge workers spend 30-40% of their time on document related tasks
    ● This increases significantly when other digital assets are taken into consideration
    ● Digital assets are increasing and increasingly easy to lose
    ● Digital assets form the basis of much of our research

  •    

    Digital archiving and preservation
    ● Institutional papers and documents

    Other digital assets
    ● Historical papers
    ● Library collections
    ● Various history projects
    ● Rock art collections
    ● Video and audio collections
      ● e.g. Wits TV
    ● Donations of significant collections from industry
    ● History of human evolution research
    ● Research output and theses
    ● Research data

  •    

    The curse of the born-analogue

  •    

    [Document lifecycle diagram labels: Capture, Create, Classify, Share, Archive, Destroy, Protect, Retain, Find & use, Preserve, Route]

  •    

    Creating semantic and socially connected document stores, archives, repositories, museums, herbaria

    21st Century

  •    

    Semantic and social 'X'
    ● Fedora Commons
    ● Fedora Commons SWORD API
    ● Chisimba

    [Diagram labels: Chisimba, Fedora Commons, SWORD API, Chisimba API, XMPP, eLearning, 'Portals']

  •    

    Workflow

    WEWE

  •    

    Workflow

  • WeWe Basics
    ● Rules-driven workflow engine
    ● Rules represented in XML
    ● Sequential event support
    ● Conditional Return support
    ● Written in Perl
    ● Uses PostgreSQL Database
    ● Open Source
    ● Originally developed for The University of the Witwatersrand, Johannesburg
    ● Multiple Management interfaces

  • WeWe Designer
    ● Web-based design tool for designing workflows
    ● Supports multiple events with multiple return types/states
    ● Drag and drop interface
    ● Written in jQuery
    ● Open Source Interface
    ● Adapt from Design “Template” support

  • WeWe Developer
    ● Developers create Rules Modules
    ● Modules can be written in Perl or any other language that can be executed from the Linux command line
    ● API
    ● Command line interface
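Because modules can be anything executable from the Linux command line, a rule check could in principle be a few lines of shell. The sketch below is purely illustrative: the slides do not show WeWe's actual module interface, so the calling convention here (document path as the first argument, exit status as the return state) and the function name `check_is_pdf` are assumptions.

```shell
#!/bin/bash
# Hypothetical rule check. WeWe's real module interface is not shown in the slides.
# Assumed convention: the document path is the first argument and the
# exit status is the return state (0 = accept, 1 = reject).
check_is_pdf() {
    local doc="$1"
    # Reject anything that is not a regular file.
    [ -f "$doc" ] || return 1
    # A PDF file starts with the bytes "%PDF-".
    [ "$(head -c 5 "$doc")" = "%PDF-" ]
}
```

A workflow step could then branch on the result, for example `check_is_pdf incoming.pdf && echo accept || echo reject`.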

  • Workflow Process

  •    

    Enterprise document management
    An approach using private cloud

    [Diagram labels: Folder server, WEWE, Chisimba, Private cloud infrastructure, Site (x4), Ingest, Born digital, Shared folder (x2), Network (x2), WWW]

    Workflow managed by WEWE layer
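The shared-folder ingest step can be pictured as a job that sweeps a monitored folder and hands new documents to the archive side. This is a minimal sketch under stated assumptions: the function name `ingest_outgoing`, the folder arguments, and the `ingest.log` format are all hypothetical, and in the architecture described here this work would be driven by WEWE rules rather than a standalone script.

```shell
#!/bin/bash
# Hypothetical sweep of a monitored folder (names and log format are assumptions).
# Moves every PDF from the source folder to the destination folder and
# records its SHA-256 checksum so later damage can be detected.
ingest_outgoing() {
    local src="$1" dest="$2" f sum
    mkdir -p "$dest"
    for f in "$src"/*.pdf; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        [ -e "$f" ] || continue
        sum=$(sha256sum < "$f" | cut -d' ' -f1)
        printf '%s  %s\n' "$sum" "$(basename "$f")" >> "$dest/ingest.log"
        mv "$f" "$dest/"
    done
}
```

It could be called from cron as, say, `ingest_outgoing "/home/$USER/monitored/outgoing" /srv/ingest`, where the second path is illustrative.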

  •    

    Private cloud infrastructure

    [Diagram labels: Hosted services, Digital archive, Virtualization, Chisimba, Fedora, Chisimba, Other, Wits portals, eLearning, email, Zimbra, iRODS, WEWE, SOA layer, OS: Open Solaris, Remote site (x4), Compute cloud, Hierarchical storage, Robotic tape library, Spinning disks, Flash memory]

  •    

    Private cloud infrastructure
    Use in establishing digital archive

    [Diagram labels: Compute cloud, Storage cloud, Robotic tape library, Digital archive, Fedora, WEWE, Chisimba, Archon, WEWE rules (x2), Ingest (x2), Source artifacts (x2), Digital conversion (x2), Remote site (x2), Born digital (Docs, Audio, Video, etc), SOA layer, OS: Open Solaris, First tier storage]

  •    

    Private cloud infrastructure
    Use in establishing digital archive

    [Diagram labels: Compute cloud, Storage cloud, Robotic tape library, Digital archive, Fedora, WEWE, Chisimba, Archon, WEWE rules (x2), Ingest (x2), Source artifacts (x2), Digital conversion (x2), Remote site (x2), Born digital (Docs, Audio, Video, etc), SOA layer, OS: Open Solaris, First tier storage, Scanning & assembly]

  •    

    #!/bin/bash
    # Scan in the pages
    scanadf --mode "Black & White" --resolution 200

    # Convert each page to a PDF file
    for file in image*
    do
        convert "$file" "$file.pdf"
        rm "$file"
    done

    # Concatenate all the individual PDF files into one document named after the first argument
    pdftk image*.pdf cat output "$1.pdf"
    rm image*.pdf
    mv "$1.pdf" /home/$USER/monitored/outgoing/

    exit 0

    The real challenge is getting the document scanned and into a PDF and sent off to somewhere meaningful.

    That's why we need expensive document imaging software.

    Right?

  •    

    Let's have one digital asset management project for Wits and let us create the synergy 

    that leads to innovation.

  •    

  •    

    Attribution file: http://www.dkeats.com/usrfiles/users/1563080430/attribution/attrib.txt
