A strategic view of document and digital object management
for the University of the Witwatersrand, Johannesburg
Prof Derek W. Keats, Deputy Vice Chancellor
(Knowledge & Information Management)
The University of the Witwatersrand, Johannesburg
What are documents?
How does the computer 'see' them?
The storage view
The manipulation view
The structural view
The operational view
[Figure: the storage, operational, manipulation and structural views, and how each fares over time]
Require software that understands the 'document' and knows how to present it.
Survival: today and the future
Physical deterioration
Digital obsolescence (devices, file formats)
Accidental damage
Loss of metadata
A major threat to the proprietary file formats common in proprietary systems
Device obsolescence
File format obsolescence
Software supporting the format fails in the marketplace, or is bought by a competitor and withdrawn.
Software upgrades fail to support legacy files.
The format itself is superseded by another, or grows in complexity.
Uptake of the format is low, or industry fails to create compatible software.
The format fails, stagnates, or is no longer compatible with the current environment.
[Figure: a small subset of commonly used media formats]
If you don't have the software, even a perfectly preserved document is of no use.
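One practical first step against format obsolescence, offered here as an assumption rather than anything the slides prescribe, is to inventory which formats a collection actually depends on. The bash sketch below counts files by extension using GNU find; the function name inventory_formats is hypothetical, and extensions are only a rough proxy for the real format.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count the files in a collection by extension, as a rough
# first inventory of the formats it depends on. Extensions are only a proxy;
# a content-based tool such as `file --mime-type` is more reliable.
set -euo pipefail

# Usage: inventory_formats <directory>
# Prints "count extension" lines, most common first.
inventory_formats() {
  local dir="$1"
  find "$dir" -type f -printf '%f\n' |
    while IFS= read -r name; do
      if [[ "$name" == ?*.* ]]; then
        printf '%s\n' "${name##*.}"      # text after the last dot
      else
        printf '%s\n' '(none)'           # no extension (a leading dot alone does not count)
      fi
    done |
    tr '[:upper:]' '[:lower:]' |         # treat "DOC" and "doc" as the same format
    sort | uniq -c | sort -k1,1nr -k2
}
```

Running it over a collection directory shows at a glance which formats dominate, and therefore which software must remain available for those files to stay usable.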
[Figure: digital assets over time, moving from analogue to digital through digitization, born-digital content, document management, digital recovery, digital archiving and digital preservation, and the risk of doing so without long-term planning]
As a component of how we manage our digital assets
Why digital asset management?
We are a knowledge organization
Knowledge workers spend 30-40% of their time on document-related tasks
This increases significantly when other digital assets are taken into consideration
Digital assets are growing in number and are increasingly easy to lose
Digital assets form the basis of much of our research, and much more is possible
Digital archiving and preservation
Institutional papers and documents
Other digital assets
Historical papers
Library collections
Various history projects
Rock art collections
Video and audio collections, e.g. Wits TV
Donations of significant collections from industry
History of human evolution research
Research output and theses
Research data
The curse of the born-analogue
Social and semantic elements
Capture, create, classify, share, archive, destroy, protect, retain, find & use, preserve, route
Creating semantic and socially connected document stores, archives, repositories, museums and herbaria
21st Century
Chisimba: semantic and social 'X'
[Figure: Chisimba with Fedora Commons, the SWORD API, the Chisimba API and XMPP; eLearning, 'portals' and WEWE workflow]
WeWe Basics
Rules-driven workflow engine
Rules represented in XML
Sequential event support
Conditional Return support
Written in Perl
Uses a PostgreSQL database
Open Source
Originally developed for The University of the Witwatersrand, Johannesburg
Multiple management interfaces
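The core idea of a rules-driven engine with sequential events and conditional returns, where each event's return value selects the next event, can be illustrated in a few lines of bash. This is a sketch only: WeWe is written in Perl and keeps its rules in XML, whereas the step functions (step_check, step_archive, step_reject) and the NEXT table below are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a rules-driven workflow: each step is a function, and a
# rules table maps "step:return value" to the next step. This is NOT WeWe's code;
# WeWe stores its rules in XML and is written in Perl.
set -uo pipefail

step_check()   { [[ -s "$1" ]]; }          # returns 0 for a non-empty file, 1 otherwise
step_archive() { echo "archived $1"; }
step_reject()  { echo "rejected $1"; }

# Rules: "step:return value" -> next step ("end" stops the workflow).
declare -A NEXT=(
  [check:0]=archive
  [check:1]=reject
  [archive:0]=end
  [reject:0]=end
)

# Usage: run_workflow <first step> <document>
run_workflow() {
  local step="$1" doc="$2" status next
  while [[ "$step" != end ]]; do
    # Run the event and capture its return value without tripping `set -e`.
    if "step_$step" "$doc"; then status=0; else status=$?; fi
    next="${NEXT[$step:$status]:-}"
    if [[ -z "$next" ]]; then
      echo "no rule for step '$step' returning $status" >&2
      return 1
    fi
    step="$next"
  done
}
```

Keeping the transitions in a table rather than in nested if statements is what makes the workflow editable as data, which is the same motivation behind holding the rules in XML.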
WeWe Designer
Web-based design tool for designing workflows
Supports multiple events with multiple return types/states
Drag and drop interface
Written using jQuery
Open Source Interface
Support for adapting new workflows from design templates
WeWe Developer
Developers create Rules Modules
Modules can be written in Perl or any other language that can be executed from the Linux command line
API
Command-line interface
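Since a rules module can be any program runnable from the command line, a module can be as small as a shell function. The sketch below assumes a calling convention the slides do not specify: the module receives a file path and reports its result through its exit status. The name check_pdf and the status codes are hypothetical; saved as its own script, the body would end with exit rather than return.

```shell
#!/usr/bin/env bash
# Hypothetical rules module. Assumed convention (not WeWe's documented interface):
# one file path argument in, result out as the exit status:
#   0 = the file is a PDF, 1 = it is not, 2 = missing or unreadable.

check_pdf() {
  local file="$1" magic
  if [[ ! -f "$file" || ! -r "$file" ]]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # PDF files start with the five bytes "%PDF-".
  magic=$(head -c 5 -- "$file")
  if [[ "$magic" == "%PDF-" ]]; then
    return 0
  fi
  return 1
}
```

Using the exit status as the returned state means the engine needs no knowledge of the module's language, only of the numbers it may return.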
Workflow Process
Enterprise document management
An approach using a private cloud
[Figure: private cloud infrastructure. Born-digital material from sites is ingested through shared folders and a folder server over the network, with workflow managed by the WEWE layer. Hosted services (digital archive, Chisimba, Fedora, Wits portals, eLearning, Zimbra email, WWW and others) run on virtualization with a SOA layer on OS: Open Solaris. iRODS links remote sites, underpinned by a compute cloud and hierarchical storage (flash memory, spinning disks, robotic tape library).]
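The diagram shows born-digital material arriving through shared folders, with the WEWE layer managing what happens next. A minimal polling sketch of that hand-off follows; the folder layout, the single-pass design and the handler argument are assumptions for illustration, since the slides do not describe how WEWE is actually triggered.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one pass over a shared drop folder. Each regular file is
# moved into a staging area and passed to a workflow handler command.
# Run it repeatedly (e.g. from cron) to approximate a monitored folder.
set -uo pipefail

# Usage: ingest_pass <incoming dir> <staging dir> <handler command>
ingest_pass() {
  local incoming="$1" staging="$2" handler="$3" f dest
  mkdir -p -- "$staging"
  for f in "$incoming"/*; do
    [[ -f "$f" ]] || continue            # skip subfolders and an unmatched glob
    dest="$staging/$(basename -- "$f")"
    mv -- "$f" "$dest" || continue       # claim the file before processing it
    if ! "$handler" "$dest"; then
      echo "workflow failed for $dest" >&2
    fi
  done
}
```

A production version would also need to avoid picking up files that are still being copied into the shared folder, for example by waiting until a file's size stops changing.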
Use in establishing a digital archive
[Figure: source artifacts from remote sites undergo digital conversion and, together with born-digital docs, audio, video, etc., are ingested under WEWE rules via the SOA layer (OS: Open Solaris) into first-tier storage. The digital archive (Fedora, WEWE, Chisimba, Archon) sits on private cloud infrastructure with a compute cloud, storage cloud and robotic tape library.]
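Accidental damage and loss of metadata were named earlier as threats to survival, so an ingest step can be made to record both integrity and basic context at the moment a file enters first-tier storage. The sketch below is an assumption, not the archive's actual procedure: the function names, sidecar files (.sha256, .meta) and metadata fields are hypothetical, and it relies on GNU coreutils sha256sum.

```shell
#!/usr/bin/env bash
# Hypothetical ingest step: copy a converted file into first-tier storage with a
# SHA-256 fixity record and a small metadata sidecar. Paths, sidecar names and
# fields are illustrative assumptions, not the archive's actual layout.
set -euo pipefail

# Usage: ingest_file <source file> <store dir> <source site label>
ingest_file() {
  local src="$1" store="$2" source_site="$3" name sum
  name=$(basename -- "$src")
  mkdir -p -- "$store"
  cp -- "$src" "$store/$name"
  # Record a checksum so later damage can be detected.
  sum=$(sha256sum -- "$store/$name" | cut -d' ' -f1)
  printf '%s  %s\n' "$sum" "$name" > "$store/$name.sha256"
  # Minimal descriptive metadata kept beside the object.
  printf 'file=%s\nsource_site=%s\ningested=%s\nsha256=%s\n' \
    "$name" "$source_site" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$sum" \
    > "$store/$name.meta"
}

# Re-check a stored copy against its fixity record (status 0 if unchanged).
verify_file() {
  local store="$1" name="$2"
  (cd -- "$store" && sha256sum --quiet -c -- "$name.sha256")
}
```

Running verify_file periodically across the store turns silent corruption into a detectable event, which matters most for material that will sit on tape for years.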
Scanning & assembly
#!/bin/bash
# $1: name (without .pdf) for the assembled document
# Scan in the pages (scanadf names them image-0001, image-0002, ... by default)
scanadf --mode "Black & White" --resolution 200
# Convert each page to a pdf file
for file in image-*; do
  convert "$file" "$file.pdf"
  rm "$file"
done
# Concatenate all the individual pdf files into one
pdftk image-*.pdf cat output "$1.pdf"
rm image-*.pdf
# Hand the assembled PDF to the monitored outgoing folder
mv "$1.pdf" "/home/$USER/monitored/outgoing/"
exit 0
The real challenge is getting the document scanned, into a PDF and sent off somewhere meaningful.
That's why we need expensive document imaging software.
Right?
Let's have one digital asset management project for Wits and let us create the synergy that leads to innovation.
Attribution file: http://www.dkeats.com/usrfiles/users/1563080430/attribution/attrib.txt