Email Archiving Systems Interoperability
The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters.
Citation: Simpson, Joel. 2016. Email Archiving Systems Interoperability. Harvard Library Report.
Accessed: May 13, 2018 2:54:24 PM EDT
Citable Link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:28682572
Terms of Use: This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Harvard Library Report
July 2016
Prepared by Joel Simpson
Email Archiving Systems Interoperability
The Harvard Library Report Email Archiving Stewardship Tools Workshop is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0) <https://creativecommons.org/licenses/by/4.0/>
Prepared by Joel Simpson, Artefactual Systems, Inc.
Reviewed by Wendy Marcus Gogel, Harvard Library and Grainne Reilly, Library Technology Services, Harvard University
Citation: Simpson, Joel. 2016. Email Archiving Systems Interoperability. Harvard Library Report. http://nrs.harvard.edu/urn-3:HUL.InstRepos:28682572.
Table of Contents
Executive Summary
Background and Context
Project Objectives
Project Approach
Project Results
1. Assessment of the Email Tools Data Sharing Framework
2. Analysis Framework: Requirements for Interoperability
3. Analysis of Tools using the Requirements for Interoperability Framework
4. Key Findings: Analysis of Tools and Email Tools Data Sharing Framework
5. Opportunities to Improve the Interoperability of Email Tools
Acknowledgements
Executive Summary
Earlier this year, Harvard Library convened the Harvard EAST (Email Archiving Stewardship Tools) workshop to foster the expanding email archiving community, share best practices and identify directions for future work.
One of the main conclusions of the workshop was that there is no standard workflow that can be uniformly applied in every situation, but that all archives have similar functional needs for email archiving, and that given the need for flexibility, current processes could be improved by using the unique strengths of different tools together.
Harvard Library engaged Artefactual Systems Inc. to better understand how the tools can exchange data today and carry out analysis to identify opportunities for the community to further support comprehensive preservation workflows for email.
Community members have been invited to contribute to an Email Tools Data Sharing Framework. The intention is to provide a high level view of how email content or metadata can be input or output to each of the different tools, using a common framework to support comparison and analysis. This work is ongoing, but enough detail has been collected to enable analysis and identification of some clear opportunities for improving the interoperability of these tools.
A set of “requirements for interoperability” was identified to set out the different aspects or concerns involved in using multiple tools in an email archiving, processing or preservation workflow. Analysis was carried out to understand how each of the tools supports these different requirements. Key findings were then identified in each of these areas.
Finally, a set of 7 draft recommendations has been proposed for the wider community to consider. These are high level recommendations without detailed next steps or any suggestion for priority. We feel they are useful in decomposing this complex problem space into discrete and well-defined opportunities that will be easier to tackle in a fast changing environment.
Background and Context
Earlier this year, Harvard Library convened the Harvard EAST (Email Archiving Stewardship Tools) workshop to foster the expanding email archiving community, share best practices and identify directions for future work. The workshop involved stakeholders from different institutions, including subject matter experts, users and developers of several email archiving or preservation tools.
The workshop concluded that the community is very interested in working together to solve shared problems. Several directions for future work were identified, including “the need for an exchange standard that enables interoperable ways to extract, package and transfer data between tools”. This conclusion was based on the consensus that there is no one uniform workflow for email archiving, but that current processes could be improved if archives were able to harness the unique strengths of each tool selectively (using only the functionality needed in whatever order is needed).
Harvard Library Preservation Services engaged Artefactual Systems Inc. to carry out a short consulting project to build on these findings and identify opportunities for the community to further support comprehensive preservation workflows for email.
Project Objectives
The goals of this consulting project are to:
1. identify gaps or opportunities to improve the interoperability of the numerous email tools by showing the type, format and structure of data which can be input or output from each tool
2. inform email stewards about the options and considerations involved in defining email archiving workflows using multiple tools
This project has not attempted to provide a functional description or comparison of the various tools under consideration. A very brief overview of the tools, with links for further detailed information available from the providers, is provided below in section 3. A useful comparison of Email Archiving tools (including many not considered in this project) can be found at the Lifecycle Tools for Archival Email Chart: https://docs.google.com/spreadsheets/d/1V1N22xnr5e0EbDlZWx58bjYO6rkrMrYH9wGX9-CK8c4/edit#gid=986222267.
Project Approach
This project is producing two deliverables to meet the objectives defined above.
The first deliverable is an Email Tools Data Sharing Framework that sets out the content objects (i.e. email) and metadata that each email or preservation tool can input or output. Representatives from each tool provider were asked to complete the descriptions of these inputs and outputs using a generic framework (with associated glossary) to enable common understanding of terms and make comparison between tools easier.
A more detailed description and assessment of the tool is provided below in section 2.
The second deliverable of this project is this Consulting Report which
1. assesses the completion and usefulness of the Email Tools Data Sharing Framework
2. proposes a generic set of requirements for interoperability to use as an analysis framework
3. analyzes/summarizes how each tool satisfies those requirements for interoperability
4. sets out several recommendations for improving interoperability of the tools and further establishing best practices for the community
Please note that throughout this report when we refer to ‘digital objects’ we mean any type of digital objects, including emails themselves, related content like attachments, or any associated metadata. We use ‘data’ interchangeably with ‘digital objects’ simply because it is shorter. (We have not seen the need to distinguish these concepts with more precise definitions.)
Project Results
1. Assessment of the Email Tools Data Sharing Framework
1.1. About the Email Tools Data Sharing Framework
The email tools data sharing framework includes information on 6 different email or preservation tools. The intention is to provide a high level view of how email content or metadata can be input or output to each of the different tools.
The framework is set out in a spreadsheet, with one sheet to describe inputs and another to describe outputs. Each sheet is organized to first describe the actual (or "physical") data objects (or input/output mechanisms, as in some cases they are programmatic), followed by a description of the kinds of data or metadata found in those objects.
Separate rows distinguish between the level of obligation demanded to be able to use each tool:
● mandatory content or data (system will not accept or work properly without this)
● useful content or data (is optional, but enables functionality within the system - e.g. a sensitivity flag that can be used when filtering)
● additional content or data (can be consumed, but is not used in any way by consuming system -- e.g. attachments are included in MBOX, but the particular system may not allow users to do anything with them)
The goal is to describe in each of these columns:
● the type or extent of data provided (e.g. specific fields used as reference IDs, or a more general description such as 'preservation events')
● format of data (is a 'local' schema defined, or is a standard schema used, such as PREMIS)
● location/structure of data (where in the input/output is this information -- e.g. PREMIS events are recorded in METS.xml file; folder information stored in path name in MBOX etc.)
In some cases this information needs to be broken down into different levels of granularity, for instance to indicate information stored at individual email level vs. collection level.
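To make the shape of a framework entry more concrete, the following is a minimal sketch in Python of how one entry (obligation, type, format, location, granularity) might be modelled, and how two tools' entries could be compared. The class names, field names and the sample entries for the two unnamed tools are assumptions made for illustration; they are not taken from the spreadsheet itself.

```python
from dataclasses import dataclass
from enum import Enum


class Obligation(Enum):
    MANDATORY = "mandatory"    # system will not accept or work properly without it
    USEFUL = "useful"          # optional, but enables functionality
    ADDITIONAL = "additional"  # can be consumed, but is not used


@dataclass(frozen=True)
class DataDescription:
    """One hypothetical framework entry: something a tool inputs or outputs."""
    data_type: str      # type or extent of data, e.g. "preservation events"
    data_format: str    # local or standard schema, e.g. "PREMIS"
    location: str       # where in the input/output it lives, e.g. "METS.xml"
    obligation: Obligation
    granularity: str = "collection"  # e.g. "email" or "collection"


def unmet_inputs(outputs, inputs):
    """Return the mandatory inputs that no output matches on type and format."""
    offered = {(o.data_type, o.data_format) for o in outputs}
    return [
        i for i in inputs
        if i.obligation is Obligation.MANDATORY
        and (i.data_type, i.data_format) not in offered
    ]


# Hypothetical entries for two unnamed tools, for illustration only.
tool_a_outputs = [
    DataDescription("email content", "MBOX", "single file", Obligation.MANDATORY),
    DataDescription("preservation events", "PREMIS", "METS.xml", Obligation.USEFUL),
]
tool_b_inputs = [
    DataDescription("email content", "MBOX", "single file", Obligation.MANDATORY),
    DataDescription("checksums", "local", "sidecar file", Obligation.MANDATORY),
]

missing = unmet_inputs(tool_a_outputs, tool_b_inputs)
print([m.data_type for m in missing])
```

A comparison of this kind only matches on type and format; location and granularity would still need to be checked by a person.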
1.2. Assessment of the Email Tools Data Sharing Framework
At the time of this writing, completion of the spreadsheet is in progress. We invite comments or thoughts from all participants on:
● ability to complete the spreadsheet consistently (or key differences in interpretation)
● anything learned while filling it in
● whether it is complete enough, or needs further work; wish list additions/amendments (e.g. suggestions for adding more detail)
● initial views on value of the exercise
● intent to use the tool moving forward
Data gathering work is ongoing and will be refined as needed by the community to support their collaborative efforts to improve these tools and establish best practices for email archiving and preservation.
Initial feedback and observations from Artefactual:
● It is interesting to see this particular perspective from the different tools, and it enables interesting analysis of similarities and differences (which will be explored further in the rest of this report).
● The spreadsheet emphasizes two dimensions (data types in columns and systems in rows), but there are in fact numerous dimensions of interest (including granularity of grouping of data, levels of obligation, type of data vs. formats or standards employed, etc.). This makes fitting in all of the relevant information a challenge.
● Given the space, it does not seem possible to include enough detailed information for this to be a very hands on ‘how to’ tool -- but it may well be a useful analytic or decision support tool, to determine if there is enough compatibility between a particular selection of tools for a desired workflow.
2. Analysis Framework: Requirements for Interoperability
The data sharing framework is primarily focused on the inputs and outputs of each of the tools under consideration. Given the broader intent to enable email stewards to determine whether and how they might craft workflows using multiple tools, this report proposes a set of generic ‘requirements for interoperability’. This provides a more holistic view of the different aspects of using multiple tools that operate together to enable a comprehensive workflow for email processing or preservation.
These requirements are more an analytical framework than a concrete set of requirements. They are focused on the level of business processes and workflows, and do not represent a particular effort to elicit requirements from end users.
The requirements and their rationale are described below. In the following section, each of the 6 tools is assessed against each requirement. This allows us to compare similarities and differences in specific areas of concern and use this as the basis for recommendations for future work later in the report.
2.1. Support for data transmission
The most basic requirement for a workflow that uses multiple tools working on a common set of data is to enable those tools to access that data.
This functionality can be provided in many forms: user interfaces for selection of data for ingest from a particular location; automated jobs that ingest data; direct system to system connectivity; or published APIs. The goal here is to simply articulate how each system supports this, rather than to judge one method over another. This will allow us to see which tools can share data (and how), at a physical level, with other tools.
2.2. Support for standard data formats
Once we have determined a particular tool can access a set of data physically, we need to ensure it can interpret and process that data. At a minimum, the data format must be ‘standard’ between the tools being considered.
It is well established in the preservation community that open, non-proprietary and widely used standards are preferable for preservation formats. While not all data to be exchanged needs to be (or even can be) in a preservation format, the same principles will improve the odds that any particular tool will be interoperable with others.
Support for standard data formats applies to email content, metadata and the packaging of both email and metadata.
2.3. Support for appropriate scope of exchangeable data
Email content and metadata can exist or be grouped at various levels of granularity. Different processing tools may accept data with an entirely arbitrary definition of scope (using a generic term such as a ‘transfer’ or ‘packet’), or they may require data or metadata to conform to a specific definition (such as clearly grouping data by ‘account’).
Scope of data also refers to the type and extent of data in any particular dataset. For example, Archivematica has functionality to verify hashes/checksums; if checksums have been created in another tool (e.g. BitCurator), then ideally Archivematica should allow checksums to be imported so that verification can occur on those checksums, not just on checksums created by Archivematica. This concept is clearly tied closely with the level of granularity - a checksum may be made for a folder or collection of emails, or it may be created at the individual email level.
Email stewards will need to understand what scope of data is required or possible using any particular tool. Similarly any decision to use a particular data standard needs to consider the scope of data that format allows for or requires.
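The difference between a collection-level and an email-level checksum can be illustrated with a short sketch using Python's standard `mailbox` and `hashlib` modules. This is an illustration only, assuming SHA-256 and a small sample MBOX built on the fly; it does not reproduce how any of the tools discussed here compute checksums.

```python
import hashlib
import mailbox
import os
import tempfile
from email.message import EmailMessage


def file_checksum(path):
    """SHA-256 of a whole file (collection-level granularity)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def message_checksums(mbox_path):
    """SHA-256 of each message's bytes (email-level granularity)."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return {key: hashlib.sha256(box.get_bytes(key)).hexdigest()
                for key in box.keys()}
    finally:
        box.close()


# Build a two-message sample MBOX so the sketch runs on its own.
workdir = tempfile.mkdtemp()
mbox_path = os.path.join(workdir, "sample.mbox")
box = mailbox.mbox(mbox_path)
for subject in ("First", "Second"):
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "recipient@example.org"
    msg["Subject"] = subject
    msg.set_content("Body of " + subject)
    box.add(msg)
box.close()

collection_checksum = file_checksum(mbox_path)
per_message = message_checksums(mbox_path)
print("collection:", collection_checksum)
for key, value in per_message.items():
    print("message", key, value)
```

A single collection-level checksum changes if any message changes, whereas per-message checksums show which message changed. Two tools can only verify each other's checksums if they agree on the granularity and on the exact bytes being hashed.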
2.4. Ability to track processing history and provenance
The ability to establish and maintain the provenance (including processing history) of content is a well understood requirement in the archival and preservation communities. While this may not be a requirement for everyone looking to process emails, it is a fundamental requirement for the core user groups of many of the 6 tools we are evaluating.
Email stewards who do need to record and capture provenance will generally need a mechanism to do this whenever they are processing, creating or changing data. This means that either the tools they use for processing need to capture processing history directly, or they need some ability to track processing history manually and store it appropriately.
2.5. Support for maintaining the identity and integrity of data
As data is moved, migrated or processed by different tools, email stewards need to be able to ensure that the identity and integrity of the data they are processing is not compromised.
Maintaining the identity of the dataset depends in large part upon using identifiers to link it to its descriptive and administrative metadata, and ensuring that this link cannot be broken. Most tools generate unique identifiers, but these are usually local (assigned, stored and maintained within the tool itself). External identifiers may be supported, either informally (e.g. by recording an accession number as part of a directory structure or filename) or more formally (as in having a field with a declared data type that aligns to the identifier used by another system). Some systems also support identifiers that refer explicitly to external resources or authorities (a concept underpinning linked data).
Maintaining the integrity of digital objects is often achieved using hashes or checksums, with regular verification, to ensure that the content of the ingested data has not been altered over time. The hashes or checksums can be assigned to both the original ingested content and to any normalized or otherwise modified versions that may be generated from that content. Hashes or checksums may also be assigned to associated metadata.
Another common practice to safeguard the integrity of data is to package content and metadata ‘together’ for transfer, reducing the risk of corruption or loss (i.e. links between the two breaking at some point).
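As a rough illustration of these two practices, the sketch below places content and metadata in one package directory, records a checksum manifest alongside them, and later verifies the files against that manifest. It is loosely modelled on the checksum-manifest idea used by packaging specifications such as BagIt, but it is not a conforming implementation of any specification; the file names, the manifest name and the accession value are assumptions for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(package_dir, manifest_name="manifest-sha256.txt"):
    """Record one '<checksum>  <relative path>' line per file in the package."""
    lines = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name == manifest_name:
                continue  # the manifest does not list itself
            path = os.path.join(root, name)
            rel = os.path.relpath(path, package_dir).replace(os.sep, "/")
            lines.append(f"{sha256_of(path)}  {rel}")
    with open(os.path.join(package_dir, manifest_name), "w", encoding="utf-8") as fh:
        fh.write("\n".join(sorted(lines)) + "\n")


def verify_manifest(package_dir, manifest_name="manifest-sha256.txt"):
    """Return relative paths that are missing or whose checksum has changed."""
    failures = []
    with open(os.path.join(package_dir, manifest_name), encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            expected, rel = line.rstrip("\n").split("  ", 1)
            path = os.path.join(package_dir, *rel.split("/"))
            if not os.path.exists(path) or sha256_of(path) != expected:
                failures.append(rel)
    return failures


# Hypothetical package: email content and its metadata travel together.
package = tempfile.mkdtemp()
os.makedirs(os.path.join(package, "data"))
content_path = os.path.join(package, "data", "account.mbox")
with open(content_path, "w", encoding="utf-8") as fh:
    fh.write("From sender@example.org\n\nhello\n")
with open(os.path.join(package, "data", "metadata.json"), "w", encoding="utf-8") as fh:
    fh.write('{"accession": "hypothetical-001"}\n')

write_manifest(package)
clean = verify_manifest(package)

# Simulate an unintended change to the content after packaging.
with open(content_path, "a", encoding="utf-8") as fh:
    fh.write("altered\n")
changed = verify_manifest(package)
print(clean, changed)
```

Because the manifest travels inside the package, a receiving tool could in principle verify checksums created by the sending tool rather than only those it generates itself.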
2.6. System access and documentation to support interoperability
A basic requirement is the ability to access and use the software, both technically and with appropriate permissions or licensing.
All of the capabilities mentioned above are less useful in practice if knowledge to use them is not captured well. Technical and user documentation, training materials and training resources (i.e. trainers for hire) all add to the ability to use the tool as part of an integrated workflow. The starting minimum is documentation on how to use the tool at all. Ideally a knowledge base would address the exchange of data, interoperability with other systems and any license requirements.
3. Analysis of Tools using the Requirements for Interoperability Framework
3.1. Archivematica
Archivematica is an integrated suite of open-source software tools that allows users to process digital objects from ingest to access and to implement preservation plans. Users monitor and control ingest and preservation micro-services via a web-based dashboard. Archivematica uses METS, PREMIS, Dublin Core, the Library of Congress BagIt specification and other recognized standards to generate Archival Information Packages (AIPs) for storage in external repositories.
| Requirement | Supporting Functionality | Observations |
| --- | --- | --- |
| Support for data transmission | Digital objects need to reside in a locally accessible file system for ingest. Archivematica is provided with an accompanying application called Storage Services that can be used to configure access to sources of data for ingest. There is an API to assign accession numbers, but no direct support for moving data across hardware, networks etc. | There are numerous external tools available for moving data. |
| Support for standard formats | Any digital object can be ingested, so any email format can be processed with core functionality. Email input in MBOX format can be processed using additional functionality (extracting attachments and metadata). Email input in maildir can be normalized and output as MBOX. The BagIt file packaging standard is supported for input and output. Metadata input in csv or json formats can be processed. Additional metadata (in other formats) can be included but not processed. Metadata outputs are well supported by widely adopted standards (METS, Dublin Core, PREMIS, Bag). | No support to normalize to EML format (widely used email format). |
| Support for appropriate scope of data | Transfer, Submission, Archival and Dissemination packages can be structured and described using any definition the user chooses. For example, an email account or accounts can be ingested as one or more SIPs, and multiple SIPs can be combined into one or more AIPs. Some key metadata, such as rights metadata, can only be input or assigned during processing at the package level. | Provides complete flexibility but no native support for common email groupings (e.g. account, folder etc.). Rights metadata can’t be assigned to individual emails, so users would have to manually structure inputs and outputs to reflect different rights (e.g. create one AIP or DIP for restricted emails, and one for non-restricted emails). |
| Ability to track processing history and provenance | Provides extensive functionality to track processing history and record using PREMIS. Processing history from external sources could “travel with” any datasets, but currently no ability to merge or consolidate processing history from multiple systems. | Email stewards could create manual processes to maintain multiple processing history files. |
| Support for maintaining the identity and integrity of data | Archivematica assigns UUIDs to all ingested objects and uses the UUIDs and ID attributes in the METS files to maintain links between digital objects and their metadata. Archivematica also supports a wide range of external metadata, so there are several ways external identifiers (i.e. from other tools) can be maintained. However there is no direct support for typed/declared external identifiers (e.g. automatically adding identifiers when importing from an external system). Fixity verification is supported using either internally or externally created hashes. | Email stewards could create manual processes for aligning and maintaining referential integrity across systems (but may need to plan this - e.g. aligning package structure to external identification systems). |
| System Access and Documentation | Documentation available, community support website/groups, as well as for hire services for consultancy, training etc. Source code and technical info available on GitHub. Documentation can be quite technical. | |
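For readers unfamiliar with what "extracting attachments and metadata" from MBOX involves in general terms, the sketch below lists the subject, sender and attachment filenames for each message using Python's standard `mailbox` module. It is an assumption-laden illustration run against a sample file built on the fly, and is not Archivematica's implementation.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def summarize_mbox(path):
    """List subject, sender and attachment filenames for each message."""
    box = mailbox.mbox(path, create=False)
    try:
        rows = []
        for message in box:
            attachments = [
                part.get_filename()
                for part in message.walk()
                if part.get_content_disposition() == "attachment"
            ]
            rows.append({
                "subject": message["Subject"],
                "from": message["From"],
                "attachments": attachments,
            })
        return rows
    finally:
        box.close()


# Build a one-message sample MBOX so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
sample = mailbox.mbox(sample_path)
msg = EmailMessage()
msg["From"] = "sender@example.org"
msg["To"] = "recipient@example.org"
msg["Subject"] = "Quarterly report"
msg.set_content("Report attached.")
msg.add_attachment(b"col1,col2\n1,2\n", maintype="application",
                   subtype="octet-stream", filename="report.csv")
sample.add(msg)
sample.close()

rows = summarize_mbox(sample_path)
print(rows)
```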
3.2. ArchivesSpace
ArchivesSpace is an open source, web application for managing archives information. The application is designed to support core functions in archives administration such as accessioning; description and arrangement of processed materials including analog, hybrid, and born-digital content; management of authorities (agents and subjects) and rights; and reference service. The application supports collection management through collection management records, tracking of events, and a growing number of administrative reports. The application also functions as a metadata authoring tool, enabling the generation of EAD, MARCXML, MODS, Dublin Core, and METS formatted data.
(summary taken from: https://archivesspace.atlassian.net/wiki/display/ADC/ArchivesSpace)
ArchivesSpace is not a digital asset or document management system and cannot manage digital files or digitization workflows. The digital objects module can be used to describe digital objects and link to digital files stored elsewhere. The metadata created can be exported to other systems as MODS, METS, or Dublin Core or made publicly accessible through the built-in public interface, though the viewers in the public interface are more limited in their functionality than those of a digital asset management system or digital repository.
(detail on digital objects taken from FAQ: http://www.archivesspace.org/faq)
| Requirement | Supporting Functionality | Observations |
| --- | --- | --- |
| Support for data transmission | ArchivesSpace does not provide a means of moving or storing email content. Metadata can be exchanged as files or through a set of APIs. | |
| Support for standard formats | ArchivesSpace supports a range of well established standards for describing archival records - EAD, MARCXML, MODS, Dublin Core, and METS formatted data. ArchivesSpace does not support functionality or processing of email content (i.e. normalisation, search or identification of authorities etc.). | |
| Support for appropriate scope of data | ArchivesSpace provides functionality for describing the arrangement and relationships of digital objects. It does not support email specific concepts directly (e.g. the notion of an email account). | It could be useful to establish conventions or best practices for describing email accounts and their potential relationships to collections, agents etc. |
| Ability to track processing history and provenance | | |
| Support for maintaining the identity and integrity of data | Support for identifiers and integrity internally within a repository. The system supports structured capture of agents and subjects which will improve consistency and accuracy of description. | |
| System Access and Documentation | ArchivesSpace is an open source project with considerable documentation available. It is supported by the Lyrasis organisation with full time staff who are developers and subject matter experts. | |
3.3. BitCurator
The BitCurator Environment is built on a stack of free and open source digital forensics tools and associated software libraries, modified and packaged for increased accessibility and functionality for collecting institutions. The BitCurator software is freely distributed under an open source license. It can be installed as a Linux environment; run as a virtual machine on top of most contemporary operating systems; or run as individual software tools, packages, support scripts, and documentation.
Key features of BitCurator include:
● Pre-imaging data triage
● Forensic disk imaging
● File system analysis and reporting
● Identification of private and individually identifying information
● Export of technical and other metadata
(summary taken from: http://www.bitcurator.net/bitcurator/)
| Requirement | Supporting Functionality | Observations |
| --- | --- | --- |
| Support for data transmission | BitCurator does provide support for migrating data without altering it in any way, starting with the concept of creating forensic images before further transmitting or processing data. Uniquely among the tools considered here, BitCurator provides software write-blocking functionality to ensure the integrity of source objects. | As this is an area not well supported by other tools, it could use some elaboration/detail. |
| Support for standard formats | Supports DFXML (Digital Forensics XML) that enables the exchange of structured forensic information. BitCurator generates PREMIS metadata when the user runs several of its core data forensics tools, providing a record of key processing events. Provides some processing support for email - e.g. using readpst to convert PST email objects into MBOX. Also supports BAG format for output. | |
| Support for appropriate scope of data | The BitCurator environment includes numerous applications to be used for different purposes, to be run against individual items or collections of terms. One of the most commonly used tools is bulk_extractor, which can be used to identify potentially sensitive information on disks, disk images or directories. Other core tools, including fiwalk and other specialized reporting tools, are designed to be run against entire disk images. When run against a disk or disk image, bulk_extractor reports on the location of patterns based on a byte offset into the disk. Other reporting tools, including fiwalk, generate metadata based on the file system (files and folders). In the case of email, the files would likely be in formats such as .pst or mbox. Those wishing to generate metadata associated with specific messages within those container files could use readpst and pipe its output to other command-line tools. BitCurator is primarily concerned with identification and description of digital objects rather than arrangement. | |
| Ability to track processing history and provenance | BitCurator generates PREMIS metadata when the user runs several of its core data forensics tools, providing a record of key processing events. | Email stewards could create manual processes to maintain multiple processing history files. |
| Support for maintaining the identity and integrity of data | BitCurator provides support for indexing, characterizing and uniquely identifying all content on a disk or disk image. BitCurator supports creation and validation of hashes/checksums. | |
| System Access and Documentation | BitCurator is an open source project with considerable documentation available. | |
3.4. DArcMail
DArcMail (for Digital Archive Mail System) was created by the Smithsonian Institution Archives. DArcMail provides normalization, item level and bulk processing, intellectual arrangement, search capability, packaging and access functionality for email.
Requirement SupportingFunctionality Observations
Supportfordatatransmission
Digitalobjectsneedtoresideinanaccessiblefilesystemforingest.
Supportforstandardformats
EmailinputrequiresMBOXastheoriginalformatorasaninterimnormalizationformat.EmailinputinMBOXformatcanbeprocessedwithallcorefunctionalityincludingexportingpreservedemails,emailcollectionsoremailaccountsintheEMailAccountXML(EMA).EMAisacomprehensiveXMLschemadesignedforRFC5322compliantpreservationpurposesappliedtothefullrangeofemailobjects,i.e.,singlemessagetowholeemailaccount.AllelementsoftheoriginalemailisretainedinthepreservationEMAXMLoutput.User-definedsubsetsofemailmessagescanbecreatedandexportedinMBOXorEMAXMLformats.
NosupporttonormalizetoEML.TheEMAXMLschemaisnotwidelyadopted.Itisfullyimplementedintwootheremailarchivingtools,orinlimitedfashioninacoupleotherapplications.
Page 13 of 22
![Page 17: Email Archiving Systems Interoperability The Harvard ... · PDF fileThe Harvard Library Report Email Archiving Stewardship Tools Workshop is licensed under a Creative Commons Attribution](https://reader031.vdocuments.net/reader031/viewer/2022030417/5aa36b567f8b9a07758e4755/html5/thumbnails/17.jpg)
Support for appropriate scope of data
DArcMail allows users to interact with emails on an individual, group or account basis. It supports complex searching, filtering and message thread tracking. Attachments can be searched, viewed and separated from email.
Ability to track processing history and provenance
The DArcMail tool is designed to be used for initial appraisal and then for preservation (AIP) and access (DIP). It natively retains the logical arrangement of the original account in both the AIP and DIP packages. Its flexibility allows for creation of custom subsets of email for creation of specialized AIPs and DIPs.
Transfer and accessioning of email digital objects occur outside of the DArcMail workflow. Non-technical metadata such as rights metadata must be captured and maintained in a separate system or manually.
Support for maintaining the identity and integrity of data
DArcMail maintains all UIDs present in the original emails. It generates SHA-1 checksums for each message and for email accounts as a whole, which are embedded in the EMA preservation format. DArcMail also produces external metadata including the checksum for each message preserved.
The internal message and account checksums are retained even if the preserved email account is moved from one repository to another.
System Access and Documentation
DArcMail is not currently available outside of the Smithsonian. Limited documentation is publicly available. The Smithsonian intends to release it as open source when time/effort allows.
Making the tool publicly available is a precondition for any other community users.
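The per-message and whole-account SHA-1 checksums described in the integrity row above can be illustrated with a short sketch. This is not DArcMail's code, which is not public. It is a minimal example using Python's standard `mailbox` and `hashlib` modules, and the function name `message_checksums` and the choice to hash the raw message bytes without the MBOX "From " separator line are assumptions made for illustration.

```python
import hashlib
import mailbox


def message_checksums(mbox_path):
    """Return ([(Message-ID, sha1), ...], account_sha1) for one MBOX file.

    Assumption: each message is hashed as its raw bytes (headers and body,
    without the mbox "From " separator line); the account checksum is the
    SHA-1 of all message bytes taken in file order.
    """
    account_hash = hashlib.sha1()
    per_message = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)
            message_id = box.get_message(key)["Message-ID"]
            per_message.append((message_id, hashlib.sha1(raw).hexdigest()))
            account_hash.update(raw)
    finally:
        box.close()
    return per_message, account_hash.hexdigest()
```

Because the digests depend only on message content, recomputing them after a move between repositories and comparing against the stored values gives a simple fixity check.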
3.5. Electronic Archiving System (EAS)
Harvard developed the EAS tool to enable archival processing of email messages and attachments and automate the process of making deposits to Harvard's preservation repository. Key features include:
● Normalization to EML -- an open standard for preservation (an extension of IMF RFC 5322) -- for long term preservation.
● Summary views of the metadata associated with email or attachments within a result set.
● Batch and item level processing options for archivists.
● Long term preservation of email and attachments in a secure environment approved for sensitive data is supported by automated packaging and transfer to the preservation repository, the Digital Repository Service (DRS).
● Capture of essential rights management information using PREMIS.
● Capture of significant events tracking to document deletions of email and attachments and format transformations such as the conversion of the native mail format to EML.
(feature list taken from: http://hul.harvard.edu/ois/systems/eas/)
Requirement Supporting Functionality Observations
Support for data transmission
Data need to be moved to a ‘dropbox’ (directory space in Harvard systems). EAS documentation describes how to use a secure FTP client to move the data but this is not part of the EAS solution.
There are numerous external tools available for moving data.
Support for standard formats
Email content can be input in MBOX or PST format (which covers the majority of email client standards for output of email). Attachment objects of any type (e.g. .ppt, .doc) can be embedded in the emails or provided separately. It is not possible to input metadata (beyond that contained directly in MBOX/PST or attachment formats). Email is output to EML format, with attachments extracted. Overall metadata is captured and output using well established standard formats (e.g. METS and MODS) and both rights and processing history are captured in PREMIS. Some reference metadata is in local format defined by Harvard (for packets, collections etc.), as is metadata relating to security (access) and sensitivity (using locally defined ‘flags’).
Email content formats well supported. While EML format for output is a well established standard, it is not accepted by all other tools for input. Security and sensitivity metadata could potentially be captured using a more widely used standard. Referencing metadata geared towards Harvard integration with DRS system. May not be any need to standardize this, but support for external IDs would enable better interoperability with other tools.
Support for appropriate scope of data
Submission packets can be structured and described using any definition the user chooses. It is not possible to input additional metadata or content beyond email/attachments. Processing work can be completed at individual item level (email or attachment) or at various levels of grouping (folder, collection etc.). Additional groupings can be added (collections or series). Outputs will always contain the same packet structure as the associated input. Output contains normalized/processed content; does NOT contain original input files (i.e. in MBOX or PST format).
Provides support for grouping (in collections etc.). Inability to input additional metadata or content suggests this tool may work best at ‘start’ of a workflow. Stewards will need to think through manual processes for managing metadata created using other tools.
Ability to track processing history and provenance
Provides functionality to track processing history and record using PREMIS. No ability to merge processing history with that from other tools.
Email stewards could create manual processes to maintain multiple processing history files.
Support for maintaining the identity and integrity of data
Identifiers are internal (e.g. EAS message ID) or local to Harvard (e.g. DRS codes are for Harvard repository). Integration with ‘Wordshack’ application ensures some descriptive or identification information is based on controlled vocabularies used in Harvard (i.e. also integrated with Harvard DRS repository). This improves consistency in use of admin categories and topics, and improves identification quality for persons or organisations.
Support for external referencing systems would better enable multi-tool workflows. Use of controlled vocabularies limited to Harvard currently - could be several approaches to extend this - e.g. publishing those vocabularies as open data, or enabling use/integration of other (e.g. linked open data) vocabularies as alternatives.
System Access and Documentation
User documentation available and support for Harvard users. System is not currently available beyond Harvard users.
A project has been proposed to release system as Open Source project; but some technical work required to make ready for more generic use.
3.6. ePADD
ePADD is a software package developed by Stanford University's Special Collections and University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives. The user guide (https://docs.google.com/document/d/1joUmI8yZEOnFzuWaVN1A5gAEA8UawC-UnKycdcuG5Xc/edit#) provides the following description of the major modules in the system:
Appraisal: Allows donors, dealers, and curators to easily gather and review email archives prior to transferring those files to an archival repository.
Processing: Provides archivists with the means to arrange and describe email archives.
Discovery: Provides the tools for repositories to remotely share a redacted view of email archives with users through a web server discovery environment.
Delivery: Enables archival repositories to provide moderated full-text access to unrestricted email archives within a reading room environment.
Requirement Supporting Functionality Observations
Support for data transmission
The appraisal module will accept email files directly (from a local file system) and also has the ability to connect directly to email servers to download email using IMAP. Other modules rely on outputs (files/directories) from other ePADD modules (i.e. appraisal output is input to processing module, processing module output is input to discovery module etc.).
There are numerous external tools available for moving data. The ability to connect directly to email server is unique and simple if only transporting email content (i.e. no additional content/metadata).
Support for standard formats
Email content can be input in MBOX or by directly connecting to email server (therefore excellent support if only interested in ingesting email content). It is not possible to input other content (attachments) or metadata (beyond that contained directly in MBOX format). Email is output to MBOX format. Attachments are NOT extracted separately. Metadata that links correspondents, people, organisations or locations to external authorities (e.g. LC Subject Headings) can be output with URIs that represent the entity by the external authority.
While the format for wrapping metadata appears to be non-standard, the process for assigning the metadata for many descriptive elements (correspondent, location etc.) uses external authorities (linked data) which are well established standards for those specific vocabularies.
Support for appropriate scope of data
ePADD ingests material structured around a particular person who may have more than one email account. It does not appear to offer the wider flexibility of allowing users to enter their own arbitrarily defined ‘packets’. It is not possible to input additional metadata or content beyond email/attachments. Processing work can be completed at individual item level (email or attachment) or at various levels of grouping (folder, collection etc.). Additional groupings, such as collections or series, can be added. Scope of outputs can vary as users can select individual emails to include or exclude. Only descriptive metadata can be output (but nothing for rights, sensitivity, processing history etc.). ePADD allows for the re-use or sharing of lexicon files for entity analysis. Lexicon files enable full text searching on a range of different terms, enabling stewards to conduct complex tiered searches.
Metadata can’t be input with email content. Metadata can’t be output explicitly, but is used in processing so stewards could define workflows that enable them to align to these manually. For example, the cart functionality can be used to select only emails with a certain rights value for output; then repeat for other values, creating an MBOX output file for each metadata value.
Ability to track processing history and provenance
Not available currently.
As noted above, could be some scope for manually outputting data that is grouped around a particular processing ‘event’ - but no direct support for maintaining, much less merging, processing history.
Support for maintaining the identity and integrity of data
Identifiers are internal (e.g. ePADD message ID). Integration with external authorities such as LC Subject Headings (FAST) ensures consistency and improves accuracy in applying descriptive metadata.
Support for external referencing systems would better enable multi-tool workflows. Linked open data approach for descriptive metadata is unique to ePADD but could be helpful if adopted by other tools.
System Access and Documentation
User documentation available; technical documentation and code available on GitHub.
4. Key Findings: Analysis of Tools and Email Tools Data Sharing Framework
This section sets out analysis and findings for each of the ‘requirements for interoperability’ based on our understanding of the capabilities available across all of the tools today. With the exception of some specific integrations (e.g. Archivematica and ArchiveSpace), these tools were not designed to interoperate with each other, and so there are naturally a number of challenges or risks in trying to do that as the tools stand today.
4.1. Current state of data transmission
● Data transmission is, in general, considered out of scope by these tools.
● There is a risk to the chain of custody inherent in any attempt to chain tools together. The primary risk is to metadata that is part of the digital object itself (e.g. created on, created by, modified on, modified by etc.) which can easily be changed or lost as part of ‘moving’ data from one file system to another.
● Many of these tools attempt to minimize this risk internally, e.g., Archivematica, BitCurator, DArcMail, EAS, all bundle several tools internally and manage data transmission between processing steps.
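The chain-of-custody risk in the list above can be made concrete with a small sketch of a transfer step that checks content fixity and records file timestamps before and after the move. It uses only the Python standard library. The function names `sha256_of` and `transfer_with_fixity` are hypothetical, and the choice of SHA-256 is an assumption. Note that `shutil.copy2` only attempts to preserve what the operating system allows (typically modification time and permission bits), so metadata such as "created by" may still be lost.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_fixity(src, dst):
    """Copy src to dst, verify the content, and report timestamps.

    Returns a small record that could be kept alongside the transfer
    as evidence for the chain of custody.
    """
    before = sha256_of(src)
    shutil.copy2(src, dst)  # also attempts to copy file metadata such as mtime
    after = sha256_of(dst)
    if before != after:
        raise IOError(f"checksum mismatch after copying {src} to {dst}")
    return {
        "sha256": after,
        "source_mtime": os.stat(src).st_mtime,
        "copy_mtime": os.stat(dst).st_mtime,
    }
```

Comparing `source_mtime` and `copy_mtime` in the returned record shows whether the modification time survived the move on a given file system.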
4.2. Use of standard formats
● Email content for most systems is based on well-established formats, particularly MBOX and EML. So far all systems can input MBOX.
○ EAS outputs only EML and not all tools support this as an input.
● Some systems support only very limited email-specific processing (e.g. Archivematica) and some do not at all (ArchiveSpace) - but as these systems are designed to take in virtually any digital objects this is not a barrier for their more generic processing capabilities.
● Identification or referencing metadata is often expected in a ‘format’ that is nonstandard in several cases. Message IDs, repository ID, collection ID are often tied to specific external systems (EAS with DRS, DArcMail with CMS).
● PREMIS is the standard used to capture provenance or processing history metadata and rights metadata (for those systems that record this metadata).
● The Library of Congress BagIt standard is a file packaging format used by at least two of the tools (Archivematica and BitCurator).
4.3. Scope of email data or metadata exchange
● There are no significant barriers to exchanging any particular scope of email content, with the exception that some systems (e.g. ePADD) assume that email is dealt with or managed on an account basis, where an account is the email associated with only one individual. In other words, the user could not input all emails for an entire organisation and process them together at once (while maintaining all individual account level metadata).
● Several tools have limitations on the scope of metadata that can be input or accepted:
○ EAS, ePADD, DArcMail do not accept any metadata as an input
● Several tools have limitations on the scope of metadata that can be output:
○ ePADD does not allow for many types of metadata to be output
4.4. Capabilities for recording provenance and/or processing history
● If maintaining a full processing history is necessary, then it may not be feasible to use systems that don’t support this (ePADD, DArcMail).
4.5. Capabilities for maintaining identity and integrity of data
Use of unique identifiers:
● Most tools generate unique identifiers for data at various levels of granularity (some for individual email, virtually all for aggregations of some type such as folder, account, collection etc.).
● Most tools do not accept or store ‘external’ identifiers (i.e. unique IDs created by other systems). This may present challenges when using multiple tools because there are limited ways of ensuring that a particular data item or group of data is correctly identified (for instance, if looking at a particular email in one tool, is there a way of confidently finding and processing the same exact email in another tool).
● Some tools do provide some means of capturing external identifiers (e.g. in Archivematica by providing IDs within a metadata csv file at the point of transfer). However, none of the tools appear to support this at the level of individual emails.
Definition of key elements and aggregations:
● Many of the tools allow users to define the elements or aggregations that suit them best. This flexibility is a strength but could lead to some confusion if elements or aggregations are not defined consistently between systems.
● The definition of an Email Account is probably the most significant concern as it appears to be defined differently in different systems. An email account in one tool may appear to be the same email account when viewed or processed in another tool, but the risk is that it isn’t because the definitions are not consistent. There is also the risk that the data models are not compatible - for instance if one system only allows one email address per account where another allows multiple addresses.
4.6. System access and documentation
● All of the open source systems have publicly available documentation or knowledge resources; however, access to developers or subject matter experts may not be publicly available.
● Neither EAS nor DArcMail is currently available beyond its home institution. Both project teams intend to release them with open source licenses, but work is required to do this and make the software available to the community.
5. Opportunities to Improve the Interoperability of Email Tools
Several draft recommendations are suggested below for discussion. At this stage no effort has been made to prioritize these or set out concrete next steps. We have kept the scope of these to areas that we feel address the interoperability of the specific tools assessed in this report.
We have not made any specific recommendations regarding the challenges of transmitting data between systems. While there are some clear risks, as described in the first part of section 4.1 (such as chain of custody and file integrity), we feel that a) these are very broad and apply to all forms of preservation using multiple tools and b) the extent of the problem is not well defined or agreed on; for example, some institutions may not see any problems with data transmission protocols that happen before formal accession. While we feel this area warrants further consideration, that may be outside the scope of concern for this report.
5.1. Enhance tools to support external reference identifiers
At the very least, tools need to be able to accept and maintain external identifiers so that email stewards can keep track (at multiple levels of granularity) of what data is being processed throughout a workflow.
In general, email stewards should be able to use the identifiers for individual items, folders or other groupings from one system when exporting data and carrying out further processing in another system.
Ideally external identifiers would also be captured when capturing processing history so that it is possible to clearly track the chain of custody (for example by associating the identifier with the PREMIS agent involved in processing).
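One way to picture what accepting and maintaining external identifiers could mean in practice is a simple crosswalk kept alongside the data. The sketch below is purely illustrative and does not describe any of the tools assessed. The class name `IdentifierCrosswalk`, the idea of using the RFC 5322 Message-ID header as the external identifier, and the tool-local ID values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifierCrosswalk:
    """Maps one external identifier to the local ID each tool assigned."""

    links: dict = field(default_factory=dict)

    def register(self, external_id, tool, local_id):
        """Record that `tool` knows the item as `local_id`."""
        self.links.setdefault(external_id, {})[tool] = local_id

    def lookup(self, external_id, tool):
        """Return the tool-local ID for an external identifier, or None."""
        return self.links.get(external_id, {}).get(tool)

    def resolve(self, tool, local_id):
        """Return the external identifier behind a tool-local ID, or None."""
        for external_id, per_tool in self.links.items():
            if per_tool.get(tool) == local_id:
                return external_id
        return None


# Hypothetical usage: the same message as seen by two different tools.
crosswalk = IdentifierCrosswalk()
crosswalk.register("<a@example.org>", "tool-a", "msg-000017")
crosswalk.register("<a@example.org>", "tool-b", "item/42")
```

With such a mapping, a steward looking at `msg-000017` in one tool could call `resolve` and then `lookup` to find the same message in another.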
5.2. Adopt standard approaches to capturing and respecting rights and sensitivity metadata
Given that email collections often contain content with a variety of different rights, and that there is a wide spectrum of privacy and confidentiality issues that can be involved, email archiving tools should support standard ways of capturing rights or sensitivity metadata.
Many systems already use standards for rights (for instance using PREMIS rights entities); however, there doesn’t appear to be an equivalent approach for recording sensitivity or privacy information.
5.3. Establish MBOX as minimum standard for input and output of email content
MBOX is the most widely used standard amongst the tools considered here. EML is also a widely used standard and supported by a majority of email clients. The EAXS standard used in DArcMail may be more comprehensive but has so far not been widely adopted and there are no tools for discovery and access in that format.
We therefore recommend that tool providers consider adding MBOX -- complying with RFC 4155 (Application MBOX Media Type) and RFC 5322 (Internet Message Format) -- as a standard for both input and output (where that doesn’t already exist). This doesn’t necessarily mean obsoleting use of EML or EAXS, but simply providing additional support to enable maximum interoperability between tools.
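To give a sense of how small the gap between MBOX and EML can be for a tool that already handles one of them, the following sketch converts in both directions using Python's standard `mailbox` module. The function names and the numbered `.eml` file naming are assumptions. The sketch also ignores differences between MBOX variants (for example how body lines starting with "From " are quoted), which a real implementation would need to handle.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path, out_dir):
    """Write each message of an MBOX file as its own .eml file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, key in enumerate(box.iterkeys()):
            target = out_dir / f"{index:06d}.eml"
            # get_bytes returns the message without the "From " separator line
            target.write_bytes(box.get_bytes(key))
            written.append(target)
    finally:
        box.close()
    return written


def eml_to_mbox(eml_paths, mbox_path):
    """Append a sequence of .eml files to an MBOX file (created if missing)."""
    box = mailbox.mbox(mbox_path, create=True)
    try:
        for path in eml_paths:
            box.add(Path(path).read_bytes())
    finally:
        box.close()  # close() flushes pending writes to disk
```

A tool that outputs only EML could use something like `eml_to_mbox` as an export option, and a tool that accepts only MBOX could be fed the result directly.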
5.4. Establish a common exchange standard for packaging email with metadata
A standard for packaging digital content, describing the contents of the package and ensuring integrity of the package using hashes will greatly improve the ability to transfer data safely between systems. The Library of Congress BagIt standard is well-established and is already used by at least two of the tools here (Archivematica and BitCurator).
The BagIt standard may not be enough in itself however. While recommendation 5.3 would ensure that email content can be transferred using the MBOX standard, additional structural and metadata standards may be needed to define minimum expectations for what content or metadata is required, optional or acceptable. For example, to clarify whether it is acceptable to package multiple email accounts together.
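As an illustration of the packaging idea, the sketch below builds a deliberately minimal BagIt-style package by hand: a `data/` payload directory, a `bagit.txt` declaration and a `manifest-sha256.txt` of payload checksums. It is a simplified sketch, not a complete or validated implementation of the specification, and it is not how Archivematica or BitCurator create bags. The function names are hypothetical, and declaring version 1.0 is an assumption (that version was published as RFC 8493 after this report was written).

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(payload_files, bag_dir):
    """Copy payload files into bag_dir/data and write the two required tag files."""
    bag_dir = Path(bag_dir)
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for src in map(Path, payload_files):
        dst = data_dir / src.name
        shutil.copy2(src, dst)
        digest = hashlib.sha256(dst.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{src.name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir):
    """Return True if every payload file still matches its manifest checksum."""
    bag_dir = Path(bag_dir)
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        expected, relative_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / relative_path).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True
```

The manifest covers file integrity only. Questions such as whether one bag may hold several email accounts are exactly the kind of additional profile the paragraph above calls for.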
5.5. Support capture of processing history
Several tools record processing history using the well established PREMIS standard.
Ideally all tools would provide this capability so that comprehensive processing history can be captured throughout a workflow using multiple tools.
Further consideration should be given to the consolidation of processing history files from different systems, or the ability to manually add processing history (to fill any gaps where a tool does not yet record it automatically).
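A rough sketch of what consolidating processing history could look like is given below. Events are represented as plain Python dictionaries whose keys borrow the names of PREMIS semantic units (`eventType`, `eventDateTime`, `linkingAgentIdentifier`, `linkingObjectIdentifier`). This is a simplification for illustration and not PREMIS XML, and the function names, tool names and identifier values are hypothetical.

```python
from datetime import datetime


def manual_event(event_type, when, agent, object_id):
    """Build an event record by hand, for steps no tool recorded automatically."""
    return {
        "eventType": event_type,
        "eventDateTime": when,  # ISO 8601 string with a UTC offset
        "linkingAgentIdentifier": agent,
        "linkingObjectIdentifier": object_id,
    }


def merge_histories(*histories):
    """Combine event lists from several tools into one chronological list."""
    merged = [event for history in histories for event in history]
    return sorted(merged, key=lambda e: datetime.fromisoformat(e["eventDateTime"]))


# Hypothetical histories exported by two different tools.
tool_a = [manual_event("ingestion", "2016-03-02T09:00:00+00:00", "tool-a", "msg-42")]
tool_b = [manual_event("format conversion", "2016-03-01T15:30:00+00:00", "tool-b", "msg-42")]
history = merge_histories(tool_a, tool_b)
```

Sorting on parsed timestamps rather than raw strings keeps the order correct even when tools record times in different UTC offsets, provided every timestamp carries an offset.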
5.6. Establish standard definition and description of email collections
It isn’t clear that the definition of what constitutes an email account (including the relationship with email addresses, or people) is consistent between tools. Establishing a common definition will enable alignment of different data models used and reduce the risk of confusion or mis-identification of email collections at this fundamental level.
With a consistent and standard definition, it will then be possible to develop a common standard for describing email accounts. This would help improve the precision of search and discovery and better enable the exchange of descriptive metadata between tools.
5.7. Make local tools publicly available with an open source license
Tools that are only usable by one institution are not useful to the wider email archiving community. While there are clearly costs to making a tool more widely available and trying to create and maintain an active community around it, we feel there are many benefits that can offset those costs in the long run, including opening up the project to a wider base of developers, testers and potential funders.
Acknowledgements
This project built on the great work started at the Harvard Email Archiving Stewardship Tools (EAST) workshop in March 2016. We would like to thank the original participants and acknowledge the many contributions received since.
In particular we would like to thank the contributors to the Email Data Sharing Framework: Glynn Edwards, Josh Schneider and Peter Chan (Stanford University), Andrea Goethals, Grainne Reilly and Skip Kendall (Harvard University), Sarah Romkey and Justin Simpson (Artefactual Systems Inc.) and Cal Lee (University of North Carolina Chapel Hill).
Numerous reviewers provided helpful contributions and suggestions for this report. We would like to thank Evelyn McLellan, Justin Simpson and Sarah Romkey (Artefactual Systems Inc.), Anthony Moulen, Andrea Goethals and Grainne Reilly (Harvard University), Chris Prom (University of Illinois at Urbana-Champaign), Cal Lee (University of North Carolina Chapel Hill) and Riccardo Ferrante (Smithsonian Institution Archives).
We would like to thank Harvard Library for the opportunity to engage in this work and providing support and direction throughout.
Finally thanks to Wendy Gogel (Harvard University) for contributions on many fronts and providing leadership for the project.