
Grainne Reilly revision: June 6, 2019 original: Feb 12, 2016

Proposal for Electronic Archiving System (EAS) as Free Open Source Software


EAS at Harvard

EAS is a system that enables ingest, management and basic preservation of email and also paves the way for access to email. It provides features to identify policy and curatorial issues, e.g. rights management, events tracking etc. EAS does not address the capture of email, nor does it address discovery or email delivery for end users. It focuses on the curation of email in preparation for long-term preservation. The project was developed in conjunction with 3 core partners at Harvard University (Schlesinger Library, HU Archives and Countway Library) with 2 additional participants (Harvard Art Museums and GSD Loeb Library). EAS was built to fulfill the needs of the Harvard University partners and is integrated with several other Harvard University systems: AMS, Policy, Wordshack and DRS.


Archiving Email

The lifecycle

1. Collection development
   a. Pre-acquisition appraisal ✗
2. Accessioning
   a. Capture ✗
   b. Normalization ✔
3. Archival Processing
   a. Item-level processing ✔
   b. Bulk processing ✔
   c. Intellectual arrangement ✔
   d. Search capability ✔
   e. Personal/Sensitive information processing ✔
4. Preservation
   a. Packaging ✔
   b. Repository ☑
5. Online Discovery ✗
6. Access ✗

✔ - EAS supports
☑ - EAS supports via DRS2
✗ - EAS does not support

Not every institution will want to follow the entire lifecycle.

The community and the Tools

In June 2015 there was an Archiving Email Symposium hosted by the Library of Congress with over 150 attendees. Attendees included people from the Smithsonian Institution, NARA, Emory University and Stanford University, amongst others. There was interest in tools to help in preserving email. It was also apparent that there is no one tool that covers the entire lifecycle. A combination of tools may help institutions in their efforts to archive email for long-term preservation. In fact, many institutions used Aid4Mail and/or Emailchemy to convert email to standard mbox or EML format before using that output as input to the next tool.

Open Sourcing EAS

By open sourcing EAS it is more likely that other institutions will collaborate in making their tools interoperable with EAS. This would be advantageous to Harvard University, where some EAS users have expressed an interest in using ePADD as a donor appraisal tool whose output might then be imported into EAS. It would also be advantageous to the community, which lacks a tool like EAS that could be used standalone.

EAS Technology

EAS is written in Java. EASi, the user interface to EAS, is a Java Struts 2 web application that runs in Tomcat. Most of the software used in EAS is open source. The notable exceptions are the internal LTS software libraries, the Oracle database, and Emailchemy, the software used to convert email from closed, proprietary file formats to the standard EML format. EAS does not provide an API for use by others.

• Tomcat 8
• Java 8
• Struts 2
• Ant
• Gradle
• Maven
• Oracle 12 (commercial)
• Hibernate 5.3.7
• Emailchemy embedded version (commercial)
• Mime4j 0.6
• Solr 4.10.1
• SolrJ
• jQuery 1.8.2
• jQuery UI ThemeRoller
• ajax-solr
• flexigrid
• YUI Grids
• JSP
• CSS
• LTS utilities (proprietary)
• FITS 0.8
• Spring Batch 4.0.1

Since EAS may contain sensitive information, including HRCI, a security architecture was created to protect this data. This security architecture is mainly infrastructure, for example the use of secure networks and SSH mounting of file systems.

Commented [RG1]: Security when in Docker


EAS Integration with other Harvard Systems

EAS integrates with several Harvard Library systems, as shown in the diagram above.

AMS (Access Management System)

AMS is an LTS system that provides authentication and basic authorization services for library systems. AMS in turn interacts with HarvardKey and LDAP. It is a web application that makes use of cookies and browser redirects. EAS redirects users' browsers to AMS and inspects the encrypted cookies that AMS creates. EAS makes use of an Access client jar to manage this.

Policy

Policy is used for authorization to library systems. EAS makes use of a Policy client jar that is used to perform direct database queries.

Wordshack

Wordshack is the authority control/vocabulary manager for EAS and for DRS2. Wordshack manages admin category, admin flag, email address, person, organization, software and topic terms. These terms are used throughout EAS. Interaction with Wordshack is via a RESTful API; however, for performance reasons terms are stored locally in the EAS database. EAS makes use of the client jar files provided by Wordshack for interacting with Wordshack.
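The Wordshack arrangement described above keeps authoritative terms behind a RESTful API and stores local copies for performance. That arrangement can be sketched as a read-through cache in front of a remote vocabulary source. This is a minimal illustration under assumptions, not the EAS implementation:

- `Term`, `VocabularySource` and `CachingVocabularySource` are hypothetical names.
- The remote lookup is modelled as an interface rather than the actual Wordshack client jar.
- An in-memory map stands in for the EAS database table.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical term shape: a URI plus a display label. */
final class Term {
    final String uri;
    final String label;

    Term(String uri, String label) {
        this.uri = Objects.requireNonNull(uri);
        this.label = Objects.requireNonNull(label);
    }
}

/** Hypothetical abstraction: resolves a term by category (e.g. "topic", "person") and label. */
interface VocabularySource {
    Optional<Term> lookup(String category, String label);
}

/**
 * Hypothetical read-through cache. It answers from the local store when possible
 * and calls the remote source only on a miss. Misses are not cached, so a term
 * created remotely later can still be found.
 */
final class CachingVocabularySource implements VocabularySource {
    private final VocabularySource remote;
    // Stand-in for the local copy of terms (a database table in EAS).
    private final Map<String, Term> localStore = new ConcurrentHashMap<>();

    CachingVocabularySource(VocabularySource remote) {
        this.remote = Objects.requireNonNull(remote);
    }

    @Override
    public Optional<Term> lookup(String category, String label) {
        String key = category + '\u0000' + label; // composite key that cannot collide across categories
        Term cached = localStore.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<Term> fetched = remote.lookup(category, label);
        fetched.ifPresent(term -> localStore.put(key, term));
        return fetched;
    }
}
```

Because callers depend only on `VocabularySource`, an OSS build could supply a file-backed source in place of the Wordshack-backed one without touching the caching layer.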


DRS (Digital Repository Service)

EAS uses DRS version 2 (DRS2) as the long-term preservation repository for emails. EAS writes DRS2-specific batches to the file system when pushing items to DRS2 for long-term preservation. EAS also interacts with DRS2 via a RESTful API, making use of two client jar files provided by DRS2. This RESTful API is used for several interactions with DRS2, including:
• Collections are created in DRS2 via EASi
• Accounts are retrieved from DRS2 for use in EAS
• Billing codes are retrieved from DRS2 for use in EAS

One code base to serve us all

EAS is to continue to serve the Harvard University community first and foremost. This requires its continued use of LTS-specific systems. To make EAS open source and usable by others outside of Harvard University, it is necessary to disentangle EAS from other LTS systems and from commercial or proprietary software. For the initial release of EAS as OSS we are aiming for a minimum viable product. It will contain core features which will permit it to be deployed and used with limited functionality.

It is proposed to manage this through a build tool with dependency management, plus configuration management. There will be one code base from which one of two build versions is produced: the LTS build version and the Open Source build version. The build file for the LTS version will be excluded from the open source GitHub source control repository. Internal LTS jar files should only be used in the Harvard University version of the built system and should be excluded from the open source dependencies. The open source version should only require open source dependencies and should result in a standalone built system. A later phase will address integration/interoperability with other tools.

EAS makes use of Emailchemy, a commercial product, for conversion of emails to the standard EML format. It would be beneficial to refactor EAS to permit its use without this commercial product. This would facilitate packaging EAS in a Docker container, since including Emailchemy in a publicly available Docker container would be a breach of license.

EAS also makes use of the commercial Oracle database. EAS does not make use of Oracle-specific features and could be configured to also work with the open source PostgreSQL database. This would lower the barrier to adoption of EAS and also permit packaging a prepopulated demo of EAS in a Docker container.
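One way the "one code base, two builds" idea could be realised in Java is with `java.util.ServiceLoader`, which the Update for item 1 later in this document also mentions. Application code depends only on an interface. The build then decides which implementation jar, with its `META-INF/services` registration, ends up on the classpath.

The sketch below is a minimal illustration under assumptions: `AuthorizationProvider`, `DenyAllAuthorizationProvider` and `Services` are hypothetical names, not taken from the EAS code base. When no provider is registered, as would be the case in the OSS build, the fallback is used.

```java
import java.util.Iterator;
import java.util.ServiceLoader;
import java.util.function.Supplier;

/** Hypothetical service interface for the authorization bounded context. */
interface AuthorizationProvider {
    boolean canAccess(String userId, String resourceId);
}

/** Hypothetical OSS default: grants nothing, so a missing or misconfigured provider fails safe. */
final class DenyAllAuthorizationProvider implements AuthorizationProvider {
    @Override
    public boolean canAccess(String userId, String resourceId) {
        return false;
    }
}

final class Services {
    private Services() {}

    /**
     * Returns the first implementation of {@code type} registered via
     * META-INF/services on the classpath, or the fallback if none is registered.
     * Uses the iterator API so it also works on Java 8.
     */
    static <T> T load(Class<T> type, Supplier<T> fallback) {
        Iterator<T> providers = ServiceLoader.load(type).iterator();
        return providers.hasNext() ? providers.next() : fallback.get();
    }
}
```

In an LTS build, the build tool would add a jar containing the LTS implementation (for example a Policy-backed one) together with a `META-INF/services/<fully qualified interface name>` file naming it. The calling code would not change. Note that `ServiceLoader` requires provider classes to be public with a public no-argument constructor.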


EAS initial refactoring with Anti-corruption layers between Bounded Contexts

An Anti-corruption layer is a concept from Domain Driven Design. In the case of EAS there are several bounded contexts (authentication, access control, controlled vocabulary, collections management etc.) that could benefit from this layer, permitting future implementations to be plugged in more easily. One way of organizing the design of the Anti-corruption layer is as a combination of Facades, Adapters and Translators, along with the communication and transport mechanisms usually needed to talk between systems. Using dependency resolution and configuration management, either the LTS-specific or the default OSS implementations will be available. Configuration will be used to manage feature toggles and feature gates.
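To make the Facade/Adapter/Translator arrangement concrete, here is a minimal sketch for the authentication bounded context. All names are hypothetical:

- `AuthenticationFacade` is the interface the rest of EAS would call.
- `IdentityTranslator` converts an external system's attributes into an EAS-side `EasUser`.
- `InMemoryDirectoryAdapter` stands in for an OSS default such as the XML-file option in item 2.

An AMS-backed adapter for the LTS build would implement the same interface; it is not shown because it depends on LTS-internal jars. The token lookup is a toy and performs no real credential checking.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical EAS-side model: the only user representation the rest of EAS sees. */
final class EasUser {
    final String userId;
    final String email;

    EasUser(String userId, String email) {
        this.userId = userId;
        this.email = email;
    }
}

/** Hypothetical facade interface: the single entry point EAS code uses for authentication. */
interface AuthenticationFacade {
    Optional<EasUser> authenticate(String token);
}

/** Hypothetical translator: maps an external system's attribute names onto the EAS model in one place. */
final class IdentityTranslator {
    private final String idAttribute;
    private final String emailAttribute;

    IdentityTranslator(String idAttribute, String emailAttribute) {
        this.idAttribute = idAttribute;
        this.emailAttribute = emailAttribute;
    }

    Optional<EasUser> toEasUser(Map<String, String> externalAttributes) {
        String id = externalAttributes.get(idAttribute);
        if (id == null || id.isEmpty()) {
            return Optional.empty(); // fail gracefully on incomplete external data
        }
        return Optional.of(new EasUser(id, externalAttributes.get(emailAttribute)));
    }
}

/** Hypothetical adapter over a simple in-memory directory (e.g. loaded from a config file). */
final class InMemoryDirectoryAdapter implements AuthenticationFacade {
    private final Map<String, Map<String, String>> directory;
    private final IdentityTranslator translator;

    InMemoryDirectoryAdapter(Map<String, Map<String, String>> directory, IdentityTranslator translator) {
        this.directory = Collections.unmodifiableMap(new HashMap<>(directory));
        this.translator = translator;
    }

    @Override
    public Optional<EasUser> authenticate(String token) {
        Map<String, String> attributes = directory.get(token);
        return attributes == null ? Optional.empty() : translator.toEasUser(attributes);
    }
}
```

Keeping attribute names such as "uid" or "mail" inside the translator means a change of identity provider touches one class rather than every place that reads user details.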


Example code for use of feature toggle and feature gate:

if (gateClient.isUngated("my.feature.name")) { doFeatureCode(); }
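The `gateClient` in the example above is not defined in this document. The sketch below is one possible shape, assuming toggles are read from a Java properties file; the class name `PropertiesGateClient` and the property keys are hypothetical. Features default to gated, so a missing entry leaves the corresponding code switched off.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical gate client backed by java.util.Properties (e.g. a build-specific features file). */
final class PropertiesGateClient {
    private final Properties toggles;

    PropertiesGateClient(Properties toggles) {
        this.toggles = toggles;
    }

    /** Ungated only when the trimmed entry equals "true" (case-insensitive); anything else stays gated. */
    boolean isUngated(String featureName) {
        return Boolean.parseBoolean(toggles.getProperty(featureName, "false").trim());
    }

    /** Convenience factory for properties text, e.g. the contents of a config file. */
    static PropertiesGateClient fromText(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new PropertiesGateClient(p);
    }
}
```

For example, an LTS build could ship a file containing `drs.push=true`, while the OSS build sets it to `false` or omits it. The same `if (gateClient.isUngated("drs.push"))` check then works in both builds.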

Future Possibilities

Integration with ePADD

ePADD consists of 4 modules; in order of workflow usage they are appraisal, processing, discovery and delivery. Mbox files are fed into the appraisal module. To proceed to the next module it is necessary to export to an archive, an internal, non-standard ePADD artifact. Conceptually an archive is a collection of indexed messages along with a blob store. This archive then needs to be imported into the next module, and the process repeats for each module. The delivery module does provide the ability to export emails to mbox format, but the export may not be lossless. The April 2016 release of ePADD is planned to permit the export of emails to mbox format from the appraisal module; again, this may not be lossless.

The ePADD appraisal module is intended for standalone use on a donor's workstation. At Harvard University, it is desirable for donors to be able to use the appraisal module of ePADD and for curators to use the result of that processing in EAS. Some possible approaches for achieving that follow.

With the April 2016 release the donor will be able to export the result of their processing to mbox format, which could then be imported into EAS. EAS could split these mbox files into EML files itself, without the use of Emailchemy: it is easy to identify the start of each new message by the presence of the "From_" line. This would need some mechanism for controlling it. Alternatively, the mbox could be run through some software to produce EML files which could then be imported into EAS. LTS would require that this approach be recorded via events and client agent.

As an alternative, ePADD could provide a client jar file for extracting emails from an archive into mbox or even EML format. EAS could use this to process an ePADD archive. The disadvantage of this approach is that it would only work with Java applications. The ePADD archive contains serialized objects which can only be reliably reconstituted by using the Java language, and this limits how portable these archives are.
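The "From_" line approach mentioned above can be sketched as follows. This is an illustration under stated assumptions, not EAS code:

- `MboxSplitter` is a hypothetical name.
- A line is treated as a message separator only when it starts with "From " and is either the first line or follows a blank line.
- The mboxrd convention is assumed: one leading '>' is removed from body lines matching `^>+From `.

Other mbox variants (mboxo, mboxcl/mboxcl2) escape or delimit messages differently, so an unescaped "From " line in a message body can still cause a false split. Output messages use CRLF line endings, as RFC 5322 requires.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that splits an mbox stream into individual EML message texts. */
final class MboxSplitter {
    // mboxrd escaping: body lines beginning with one or more '>' followed by "From ".
    private static final Pattern ESCAPED_FROM = Pattern.compile("^>+From ");

    private MboxSplitter() {}

    static List<String> split(Reader mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;     // null until the first From_ line is seen
        boolean previousLineBlank = true; // the first From_ line needs no preceding blank line
        try (BufferedReader reader = new BufferedReader(mbox)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (previousLineBlank && line.startsWith("From ")) {
                    addMessage(messages, current);
                    current = new StringBuilder(); // the From_ envelope line itself is dropped
                } else if (current != null) {
                    if (ESCAPED_FROM.matcher(line).find()) {
                        line = line.substring(1); // undo one level of mboxrd escaping
                    }
                    current.append(line).append("\r\n");
                }
                previousLineBlank = line.isEmpty();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        addMessage(messages, current);
        return messages;
    }

    /** Adds a finished message, dropping the blank separator line that precedes the next From_ line. */
    private static void addMessage(List<String> messages, StringBuilder message) {
        if (message == null) {
            return;
        }
        String text = message.toString();
        if (text.endsWith("\r\n\r\n")) {
            text = text.substring(0, text.length() - 2);
        }
        messages.add(text);
    }
}
```

Each returned string could then be written out as a separate .eml file. If this were done inside EAS, EAS rather than Emailchemy would be the agent recorded for the conversion.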


Potential Roles

• LTS project manager for managing the OSS infrastructure and framework, and for moving EAS to OSS.
• HL project manager for liaising with the HL community, the external community and HL leadership.
• LTS developers

[Diagram: EAS, DRS, Discovery, Access/Delivery, Wordshack]


Open Source Software Checklist (columns: Basic Factor, Explanation, Remarks, R, A, C, I)

Usefulness
Explanation: Software should be useful more or less “as is”.
Remarks: The OSS version should not include LTS-specific jars. The OSS version should not require commercial products.
R: LTS; A: ?; C: ?; I: ?

Interoperability
Explanation: If the software interoperates with other software tools, the open source project should have well-documented, preferably standards-based, interfaces to external code (web services, class interfaces, or other points).
Remarks: EAS needs to be refactored to provide abstraction/anti-corruption layers where alternate implementations may be plugged in.
R: LTS; A: ?; C: ?; I: ?

License
Explanation: The software should be released with a license statement, e.g. Apache 2, GPL, LGPL, MIT, BSD, AGPLv3.
Remarks: Choice is limited by dependency on software with restrictive licenses. If a given dependency is optional, how does that affect the license requirement?
R: LTS; A: ?; C: OGC & HUIT CTO; I: HL

Contributor License Agreement
Explanation: Many open source projects require this.
R: LTS; A: ?; C: OGC & HUIT CTO; I: HL

Copyright
Explanation: At top of each class.
R: LTS; A: LTS; C: OGC & Provost Office & HUIT CTO; I: HL

Patent
Explanation: Some software includes a patent in addition to the software.
Remarks: An example is the Facebook React.js library. Investigate HU policies governing this.
R: LTS; A: LTS; C: OGC & Provost Office & HUIT CTO; I: HL

User Documentation
Explanation: For use by users.
R: ?; A: ?; C: ?; I: ?

Developer Documentation
Explanation: For use by developers.
R: LTS; A: ?; C: ?; I: ?

Code Documentation
Explanation: Class level at a minimum.
R: LTS; A: ?; C: ?; I: ?

Source control
Explanation: GitHub.
R: LTS; A: ?; C: ?; I: ?

Issue Tracking
Explanation: GitHub.
Remarks: LTS uses Jira. How do we synchronize GitHub issues with internal LTS Jira issues?
R: LTS; A: ?; C: ?; I: ?

Deployment packaging
Explanation: Should we provide a ready-to-run implementation? This would enable easier adoption by others.
Remarks: ePADD provides a packaged version ready for deployment on Windows or Mac. EAS uses some Linux-specific functionality. We could provide a Dockerized version configured for quick setup.
R: LTS; A: ?; C: ?; I: ?

Demo
Explanation: Should we provide a self-contained demo?
Remarks: Provide a lot of examples and concentrate on having some really shiny ones to impress users/developers enough to take a closer look.

Contributions
Explanation: Who should decide what contributions to accept?
Remarks: Contributions include ideas.
R: LTS?; A: HL?; C: ?; I: ?

Committers
Explanation: Initially LTS only.
R: LTS; A: ?; C: ?; I: ?

Tests
Explanation: Do contributors need to provide tests for contributed code: unit tests, integration tests, functional tests?
Remarks: Initially will only accept ideas and not code.
R: LTS; A: ?; C: ?; I: ?

Documentation
Explanation: What level of documentation would we require for contributed code?
Remarks: Initially will not be accepting code.
R: LTS; A: ?; C: ?; I: ?

Support
Explanation: Need a forum for discussing features, technical issues etc. What forum?
Remarks: Requires an email list/Google group etc.
R: LTS; A: ?; C: ?; I: ?

Outreach and communications
Explanation: What forums do we want to post on? What events do we want to present at?
R: HL; A: ?; C: ?; I: ?

R: Responsible – who is assigned to do the work
A: Accountable – who makes the final decision and has ultimate ownership
C: Consulted – who must be consulted before a decision or action is taken
I: Informed – who must be informed that a decision has been taken
HL: Harvard Library
LTS: Library Technology Services
OGC: Office of the General Counsel


Proposed work for Open Sourcing EAS

Miscellaneous: Fields that are mandatory in EAS due to its integration with DRS2 will remain mandatory and will be populated with default values in the OSS version. Future: these mandatory fields may be made configurable.

1 Create a new build process with dependency resolution
Description: The current build process for EAS uses Ant with no dependency resolution. Alternatives are Ivy, Maven or Gradle. First choice is Gradle; second choice is Ivy. Maven is stubbornly “opinionated” and would not accommodate many of our existing LTS projects.
Update: Move to Maven and Docker, and possibly Ansible (ongoing). Need to update the version of Docker/Docker Compose. The LTS change control process has changed and is in flux due to the introduction of Docker and Ansible. Implementation detail: the Java ServiceLoader may be used to facilitate switching implementations of services. Maven can then be used to pull the correct implementation jar into the build.
Comments: This enables a customized build for the LTS version versus the OSS version. This must work with the LTS change control process. LTS proprietary jars should be excluded from the OSS build dependencies. Jars from dependent projects (e.g. Hibernate) should be pulled in using dependency management during the build. Question: FITS includes ots.jar, which is a proprietary LTS jar. What is the implication of this? EAS currently uses 93 jar files in addition to those used by FITS and Solr. See Grouper Build/Dependency Management for some reasoning on the choice of a build tool. This is an absolute prerequisite for this project: it is not possible to exclude jar files without it. The build system needs to be set up in order to continue development.

Future:
Dependencies: The LTS Artifactory instance should be populated with the required jars.
Feedback: RS – this is a technical debt project and therefore does not belong in this project but rather in a “technical debt” project.
Proposed phase: Phase 1

2 Abstract out authentication
Description: Abstract out authentication so that it can be configured to
1. Use AMS for authentication
2. Use authentication information from an XML file
3. Facilitate plugging in of new authentication mechanisms in future
Update: Switch from AMS to CAS. Do include XML/JSON; use JSON Schema for validation of data.
Comments: Authentication is currently closely tied to the user's HUID, which is used throughout the system. The user's email address, which AMS returns via an LDAP lookup, is also used. For security reasons, the LTS version must only work with AMS and a valid HUID. The OSS version should not be configurable to use AMS and should not include the access.jar file. It should fail gracefully if misconfigured.

Future: Internal database, OAuth, Shibboleth, CAS, LDAP, Active Directory, OpenID Connect
Dependencies: (1)
Feedback: AM – implement LDAP for phase 3.

GR – depending on feedback from the community, decide which implementation to use for phase 3.

Proposed phase: Phase 1

3 Abstract out authorization
Description: Abstract out authorization so that it can be configured to
1. Use Policy for authorization
2. Use authorization information from an XML file
3. Facilitate plugging in of new authorization mechanisms in future
Update: Talk with IAM about making direct API calls to Grouper. Need to set up Grouper groups for EAS and run them by LTS Support. If IAM won't permit direct API calls, stay with using Policy. Still need to use Policy for DRS depositor lookup. For OSS, use authorization information from an XML/JSON file (use JSON Schema for JSON validation).

Comments: Currently the user's HUID is used to look up policy information. For security reasons, the LTS version must only work with Policy and a valid HUID. The OSS version should not be configurable to use Policy. It should fail gracefully if misconfigured.

Future: Internal database, Grouper, LDAP

Dependencies: (1)(2)
Feedback: AM – implement LDAP for phase 3.

GR – depending on feedback from the community, decide which implementation to use for phase 3.

Proposed phase: Phase 1

4 Enable configuration to use PostgreSQL instead of Oracle
Description: EAS is currently configured to use Oracle. It makes no use of Oracle-specific features and could work with PostgreSQL via minor configuration changes, since EAS uses the Hibernate ORM.

Update: Switch all versions to use PostgreSQL (RDS). This will involve work from Sharon's group, and also input from AM/Sharon to ensure it remains at the right security level for HRCI.

Comments: Use of PostgreSQL removes a dependency on a commercial database, which eliminates constraints concerning license restrictions. Use of PostgreSQL:
• Lowers the barrier to adopting EAS (no license to pay)
• Permits the creation of a self-contained, pre-populated EAS demo in a Docker container (it is a breach of the Oracle license to deploy the database in a Docker container)
The LTS version of EAS should continue to use Oracle for performance and operational reasons. The OSS version of EAS should be configurable to use either Oracle or PostgreSQL.

Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

5 Abstract out Accounts (DRS owner codes) and Billing Codes
Description: Owner codes (Accounts) are stored in DRS2, and a local copy is created in the EAS database when enabled for use in EAS. Billing codes are stored in DRS2 and retrieved for use in EAS. Abstract out Accounts and Billing Codes so that EAS can be configured to use:
• Accounts and Billing Codes from DRS2
• Accounts and Billing Codes from an XML file
• Facilitate plugging in of other means of retrieving Accounts and Billing Codes in future

Update: For OSS, use XML or JSON configuration; use JSON Schema for validation.

Comments: The LTS version should only work with Accounts and Billing Codes from DRS2. The OSS version should not be configurable to use Accounts and Billing Codes from DRS2.

Future: AM – make it configurable so that accounts and billing codes are optional.

Dependencies: (1)(2)(3)
Feedback: RS – should work out the impact on the time it might take to implement bullet 3 above.
GR – if we are making it configurable to read this information from an XML file, then we need to create an abstraction layer anyhow.
GR – regarding the future option of making this information optional: this information is mandatory in the database and Solr index because it is very important for LTS. Using dummy values from a default XML file will not inconvenience users who do not need this information. The system has not been architected for configuring optionality of database tables/fields, and doing so would require significant work.

Proposed phase: Phase 1
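As a minimal sketch of the XML-file option for item 5, an OSS implementation could parse the file with the JDK's built-in DOM parser. This assumes a hypothetical layout of `<account code="...">` elements, each containing `<billingCode>` children; the actual schema has not been defined.

`AccountSource` and `XmlAccountSource` are hypothetical names. A DRS2-backed implementation for the LTS build would implement the same interface. The JSON variant validated against a JSON Schema, as the Update suggests, would need a third-party library and is not shown. A default file with dummy values, as suggested in the feedback, would simply be one such XML document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical abstraction over where accounts (DRS owner codes) and billing codes come from. */
interface AccountSource {
    /** Owner code mapped to its billing codes, in document order. */
    Map<String, List<String>> accountsWithBillingCodes();
}

/** Hypothetical OSS implementation reading accounts and billing codes from an XML document. */
final class XmlAccountSource implements AccountSource {
    private final Map<String, List<String>> accounts;

    XmlAccountSource(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList accountNodes = doc.getElementsByTagName("account");
            for (int i = 0; i < accountNodes.getLength(); i++) {
                Element account = (Element) accountNodes.item(i);
                List<String> billingCodes = new ArrayList<>();
                NodeList codeNodes = account.getElementsByTagName("billingCode");
                for (int j = 0; j < codeNodes.getLength(); j++) {
                    billingCodes.add(codeNodes.item(j).getTextContent().trim());
                }
                result.put(account.getAttribute("code"), Collections.unmodifiableList(billingCodes));
            }
            this.accounts = Collections.unmodifiableMap(result);
        } catch (Exception e) {
            // Fail early with a clear message if the configuration file is malformed.
            throw new IllegalStateException("Invalid accounts configuration", e);
        }
    }

    @Override
    public Map<String, List<String>> accountsWithBillingCodes() {
        return accounts;
    }
}
```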


6 Abstract out DRS Collections
Description: Collections are created in DRS2 via the EAS user interface. Minimal collection information is stored in the EAS database and the EAS Solr index. Need the ability to configure EAS to create Collections:
• In DRS2, with minimal information in EAS
• With minimal information only in EAS

Comments: The LTS version should only work with Collections in DRS2. The OSS version should not be configurable to create Collections in DRS2.

Future: A separate project could manage collections. LibraryCloud has a separate project for managing its collections, which could be used as a model for a future EAS collections management project. This would require providing an API in EAS for updating the core collection information in the Solr index and the local database.

Dependencies: (1)(2)(3)
Feedback: WG – need to be able to associate items with collections.
Agreed: make EAS configurable to only require a title for a collection and not collect any other information on Collections. Then, in future, abstract out creation of collections in other systems. Reduced estimate based upon the above agreement.

Proposed phase: Phase 1

7 Abstract out Wordshack Terms
Description: Enable configuration of EAS to create and use controlled vocabulary terms
• In Wordshack
• In an XML file
• Facilitate plugging in of other means of managing a controlled vocabulary

Update: Use an updatable XML/JSON file/store (since email addresses are created during import), OR in the OSS version could we just create terms directly in the database? Question: what UI would be used for creating terms directly in the database?

Comments: Wordshack terms are intricately tied into the system:
• on the server
• in the user interface (it uses a Wordshack widget in conjunction with a proxy servlet filter)
• in the database
• in the Solr index
The XML/JSON file should be kept simple for the initial release. The LTS version should only work with Wordshack terms. The phase 1 OSS version should not be configurable to work with Wordshack and should not include the Wordshack client jar. The supported email clients are recorded in Wordshack as software terms.

Future: Possibly expand to support other controlled vocabularies.
Dependencies: (1)(2)(3)
Feedback: RS – if Wordshack were available as open source, it could be used, and so we might not need to do this work.
GR – we do not want to force OSS users to use our implementation of a controlled vocabulary. Also, we do not want to build in a dependency on another project being open sourced in order to open source EAS.

Proposed phase: Phase 1

8 Remove FITS from OSS Version
Description: Enable configuration of EAS to remove FITS.
Comments: FITS is used during import and push to DRS2. During ingest it is only required in order to get file format information. For the OSS version, the file format information can be obtained by issuing the “file” command under Linux (EAS already needs to run under Linux; this adds more non-portable code). This should still be configurable, and EAS should fail gracefully if configured to use FITS in the OSS version.

Future: See 9


Dependencies: (1)
Feedback:
Proposed phase: Phase 1

9 Replace EAS FitsServlet with OSS FitsServlet
Description: When EAS was implemented, FITS was included in the project and implemented as a web application (similar to Solr). Since then an open source version of the FitsServlet has been developed and is almost ready for use. Once the open source version of the FitsServlet has been released, it should be used by EAS instead of its own implementation.

Update: This has already been implemented using AM's FITS Docker image. However, the FITS Docker image should be made available on Docker Hub.

Comments: Use of the OSS version of the FitsServlet will make it easier to keep up to date with the latest version of FITS. This may be managed by dependency resolution. The fits.jar file will still be required by the EAS web application in order to process the output from FITS during import (used to populate the file format information for attachments).

Future:
Dependencies: The OSS version of FitsServlet must have been released.
Feedback: Need to align licenses.
Proposed phase: Phase 2


10 Configure Disabling of Push to DRS
Description: Provide the ability to configure EAS to
• Push to DRS
• Disable Push to DRS

Update: For the OSS version, disable push to DRS but enable package creation; the OSS version must still create a package. We could create a batch identical to a DRS batch, and OSS users could manipulate it themselves to produce what they want. Change descriptors to be less DRS-specific in the OSS version. The LTS version uses OTS, which contains a lot of LTS-specific constants, LTS-specific validation etc. TODO: Tricia and Steve need to establish what is acceptable in the descriptors. Issues with descriptors:
• Contain Wordshack URIs
• Contain URNs
• Contain drsAdmin data (schema)
• Contain hulEventExtension (schema)
For OSS, perhaps simple descriptors should be created using JAXB and not using OTS.

Comments: The RESTful API in item 20 will provide the ability for other projects to pull the information required to create a package for preservation. Item 21 would provide the ability to actually create a bag for archiving, making use of the API provided by item 20. Item 11 provides for export of emails and attachments, but not metadata, as an mbox.

Future:
Dependencies: (1)
Feedback: RS – need the ability to create a very simple bag.
WG – need output, so do need to include the ability to create a preservation package. This could be an appealing deliverable for an IMLS grant. Reduced estimate based on discussions. On further discussion with RS and WG, we will not create a bag until we know what would be useful in the bag. Need to discuss this with the community during the workshop.
Proposed phase: Phase 1

11 Add ability to configure EAS to not use Emailchemy
Description: This will remove a dependency on a commercial tool.

By removing this dependency it will be possible to package EAS in a Docker container. It also reduces the barrier to use of EASi by removing the necessity to pay for software. EAS should fail early and gracefully if it has been configured to not use Emailchemy and a user submits a packet type which can only be processed by Emailchemy.

Comments: Most of the work will be around project build configuration. We do not want this to result in a more onerous deployment in LTS, so we need to make it as automated as possible.

Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

12 Add handling for EML files
Description: By permitting the submission of EML files in a packet, users will have the option of using whatever tool they like to convert their emails to EML prior to using EAS.

Comments: Should the creator agent be a combination of EML and the tool used to convert to EML? If so, it should be recorded in the controlled vocabulary as a software term.

Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

13 Add handling for mbox files by EAS itself
Description: This would permit handling of mbox files without requiring the use of Emailchemy. Many mailboxes can be saved from email servers etc. in mbox format. It appears to be relatively simple to split an mbox file into individual EML files: the start of each new message is identified by the “From_” line (use a regex on /^From / lines).

Comments: EAS should be recorded as the agent in the normalization event.


Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

14 Do thorough review of libraries used in EAS OSS version
Description: This is required to ensure that we are in compliance with all licenses of libraries used in EAS OSS. Part of this will be to list all the libraries used in the:
• OSS version
• LTS version
This is also a required step in order to set up dependency resolution correctly.

Comments: Can dependency management also handle licenses? We may need to manually include licenses etc.

Future:
Dependencies:
Feedback:
Proposed phase: Phase 1

15 Do thorough cleanup of tests
Description: EAS has numerous unit and integration tests which are currently badly organized. These need to be cleaned up. With the refactoring it may make sense to introduce the use of mocks.

Comments:
Future:
Dependencies:
Feedback:
Proposed phase: Phase 2

16 Make User interface changes
Description: Use feature request toggles to enable/disable LTS-specific language.
Comments:
Future:
Dependencies:
Feedback:
Proposed phase: Phase 1


17 Review for use of public/private/protected/package level methods
Description: The access modifiers on classes within EAS were not carefully managed. Leaving public methods which should in fact be private can lead to misuse of those methods.

Comments:
Future:
Dependencies:
Feedback:
Proposed phase: Phase 2

18 Handle configuration of other jobs
Description: There are several jobs used in EAS. These would need to be configurable (using feature toggles) for the LTS version or the OSS version.

Comments: The LTS version should permit running of these jobs: Loader, Importer, DRS prearchiver, DRS postarchiver, DRS packet events archiver, account monthly statistics. The OSS version should not permit running of the DRS prearchiver, DRS postarchiver or DRS packet events archiver.

Future:
Dependencies:
Feedback:
Proposed phase: Phase 1

19 Remove LTS proprietary jars
Description: The util.jar LTS proprietary jar provides functionality that is now mostly available in core Java or in open source libraries. Where possible the code should be refactored to use these implementations in order to remove reliance on LTS proprietary code. The ldap.jar LTS proprietary jar is not used. Both of these jar files should be removed if possible.

Comments: Users of OSS projects need access to the source, so any jar files used in the project should also be open source.

Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

20 Implement RESTful API
Description: To make EAS more open for use, it would be beneficial to create a RESTful API.
Comments: This RESTful API could be used by another application to create a bag (see 21). It could also be used by another application to create and manage collections (see 6). This API must be implemented so that it may be used both by external clients via REST and by EAS itself in process.

Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 3

21 Implement LOC Bag creation
Description: Implement creation of a bag which makes use of the in-process API from 20 above. This process should be triggered via the user interface.

Comments: Use https://github.com/LibraryOfCongress/bagit-java to help in bag creation. Question: what should be in the descriptor files? METS seems not to be popular; we need feedback from the community on this. When items are successfully archived to DRS they are deleted from EAS (without generating any delete events). What should happen when a bag is created? Creation of a bag does not mean that the items have been successfully archived.

Future:
Dependencies: (18)
Feedback:
Proposed phase: Phase 3

22 Package for deployment
Description: To reduce the barrier to adoption, it is desirable to provide a deployable version of EAS.
Comments: EAS uses some “Unix-like” OS-specific commands and so will not run on Windows (one reason was a bug in the Java File class, which does not handle certain special characters in file names). EAS could be packaged for Mac using Oracle AppBundler with hdiutil (ePADD does this). It may be best to provide it in a Docker container.

Future:
Dependencies:
Feedback:
Proposed phase: Phase 1


Proposed Roadmap

Prerequisites/phase 1
Item 1 Create a new build process with dependency resolution

Phase 1
Item 2 Abstract out authentication
Item 3 Abstract out authorization
Item 5 Abstract out Accounts (DRS owner codes) and Billing Codes
Item 6 Abstract out DRS Collections
Item 7 Abstract out Wordshack Terms
Item 8 Remove FITS from OSS version
Item 10 Configure disabling of Push to DRS
Item 11 Add ability to configure EAS to not use Emailchemy
Item 12 Add handling for eml files
Item 13 Add handling for mbox files by EAS itself
Item 16 Make user interface changes
Item 18 Handle configuration of other jobs
Item 19 Remove LTS proprietary jars
Item 4 Enable configuration to use PostgreSQL instead of Oracle
Item 14 Do thorough review of libraries used in EAS OSS version
Item 22 Package for deployment

Phase 2
Item 15 Do thorough cleanup of tests
Item 17 Review for use of public/private/protected/package level methods
Item 9 Replace EAS Fits Servlet with OSS Fits Servlet

Phase 3
Item 20 Implement RESTful API
Item 21 Implement LOC Bag creation

Phase 4
Details are to be decided by the community. Interoperability is to be as loosely coupled as possible, e.g. via file interchange, RESTful APIs and the like.
Make EAS interoperable with ePADD
Make EAS interoperable with BitCurator (redaction)
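Items 11 and 13 imply that EAS would read mbox files itself instead of converting them with Emailchemy. To give a rough sense of what the first step involves, the sketch below splits an mbox stream into individual raw messages at unquoted `From ` separator lines. It also reverses mboxrd quoting, so `>From ` becomes `From `. It assumes the mboxrd variant. The mboxo and Content-Length based variants (mboxcl, mboxcl2) need different handling. Parsing each message's headers and MIME parts would still be a separate step. `MboxSplitter` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch (not EAS code): splits an mboxrd mailbox into raw messages.
class MboxSplitter {

    // mboxrd writers add one '>' to body lines matching ">*From "; readers remove it.
    private static final Pattern QUOTED_FROM = Pattern.compile(">+From ");

    static List<String> split(Reader mbox) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        BufferedReader in = new BufferedReader(mbox);
        String line;
        // readLine strips \n, \r\n or \r; lines are re-joined with \n below.
        while ((line = in.readLine()) != null) {
            if (line.startsWith("From ")) {
                // Unquoted "From " line: a separator. Close the previous message, start a new one.
                if (current != null) {
                    messages.add(finish(current));
                }
                current = new StringBuilder();
            } else if (current != null) { // text before the first separator is ignored
                if (QUOTED_FROM.matcher(line).lookingAt()) {
                    line = line.substring(1);
                }
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            messages.add(finish(current));
        }
        return messages;
    }

    // mbox writers append an empty line after each message; drop it.
    private static String finish(StringBuilder message) {
        int n = message.length();
        if (n >= 2 && message.charAt(n - 1) == '\n' && message.charAt(n - 2) == '\n') {
            message.setLength(n - 1);
        }
        return message.toString();
    }
}
```

For example, `MboxSplitter.split(Files.newBufferedReader(path, StandardCharsets.ISO_8859_1))` would return one string per message. A byte-transparent charset such as ISO-8859-1 avoids decoding errors on mailboxes that mix encodings. The cost is that proper charset handling is deferred to the MIME parsing step.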


Resources

Wordshack https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+WordShack

Access Management System https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+Access

Policy Server https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+Policy+Server

DRS2 https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+DRS2

Emailchemy http://www.weirdkid.com/products/emailchemy/

DArcMail http://www.digitalpreservation.gov/meetings/documents/aes15/1_LC_AES_SIA_EmailandCERP_DarcMail_20150602.pdf http://siarchives.si.edu/blog/yes-we%E2%80%99re-still-talking-about-email http://www.history.ncdcr.gov/SHRAB/ar/emailpreservation/mail-account/mail-account_docs.html

Bitcurator http://www.bitcurator.net/

ePADD http://library.stanford.edu/projects/epadd https://github.com/ePADD/epadd https://github.com/ePADD/muse

Lifecycle Tools for Archival Email Stewardship (in progress) https://docs.google.com/spreadsheets/d/1V1N22xnr5e0EbDlZWx58bjYO6rkrMrYH9wGX9-CK8c4/edit?pli=1#gid=986222267


Archiving Email Symposium 2015 http://www.digitalpreservation.gov/meetings/archivingemailsymposium.html

Email-related RFCs https://tools.ietf.org/html/rfc5322

Email formats http://www.digitalpreservation.gov/formats/fdd/fdd000388.shtml http://www.digitalpreservation.gov/formats/fdd/fdd000383.shtml

FITS http://projects.iq.harvard.edu/fits https://github.com/harvard-lts/fits

Open Source https://wiki.harvard.edu/confluence/display/LibraryTechServices/LTS+Open+Source+Projects "Introducing the Open Source Maturity Model" "Making an Open Source Project Bloom"

Licenses http://choosealicense.com/licenses/ https://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses http://opensource.org/licenses/


Jar files used by EAS
access.jar (LTS proprietary)
activation.jar
antlr-2.7.6.jar
aopalliance-1.0.jar
apache-mime4j-0.6.jar
aspectjrt-1.6.8.jar
aspectjweaver-1.6.8.jar
c3p0-0.9.1.jar
cglib-nodep-2.2.jar
com.ibm.jbatch-tck-spi-1.0.jar
commons-cli-1.1.jar
commons-codec-1.6.jar
commons-collections-3.1.jar
commons-configuration-1.5.jar
commons-fileupload-1.2.1.jar
commons-httpclient-3.1.jar
commons-io-2.3.jar
commons-lang-2.4.jar
commons-lang3-3.1.jar
commons-logging-1.1.3.jar
commons-pool2-2.2.jar
dom4j-1.6.1.jar
drs2_services-dto.jar (LTS proprietary)
drs2_services-util.jar (LTS proprietary)
easi.jar
ehcache-1.5.0.jar
fits.jar
fluent-hc-4.3.5.jar
freemarker-2.3.15.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
guava-15.0.jar
hibernate-jpa-2.0-api-1.0.0.Final.jar
hibernate-testing.jar
hibernate-tools.jar
hibernate3.jar
httpclient-4.3.5.jar
httpclient-cache-4.3.5.jar
httpcore-4.3.2.jar
httpmime-4.3.5.jar
javassist-3.9.0.GA.jar
jaxen-core.jar


jaxen-jdom.jar
jcl-over-slf4j-1.6.1.jar
jdom.jar
jettison-1.1.jar
jstl.jar
jta-1.1.jar
ldap.jar (LTS proprietary)
log4j-1.2.17.jar
mail.jar
mina-core-1.1.7.jar
noggit-0.5.jar
ognl-2.7.3.jar
ojdbc14.jar
oscache-2.1.jar
ots.jar (LTS proprietary)
saxpath.jar
servlet-api.jar
slf4j-api-1.7.7.jar
slf4j-log4j12-1.7.7.jar
solr-solrj-4.10.1.jar
spring-aop-3.2.3.RELEASE.jar
spring-batch-core-2.2.2.RELEASE.jar
spring-batch-infrastructure-2.2.2.RELEASE.jar
spring-batch-test-2.2.2.RELEASE.jar
spring-beans-3.2.3.RELEASE.jar
spring-context-3.2.3.RELEASE.jar
spring-context-support-3.2.3.RELEASE.jar
spring-core-3.2.3.RELEASE.jar
spring-expression-3.2.3.RELEASE.jar
spring-jdbc-3.2.0.RELEASE.jar
spring-orm-3.0.5.RELEASE.jar
spring-retry-1.0.2.RELEASE.jar
spring-test-3.2.3.RELEASE.jar
spring-tx-3.2.3.RELEASE.jar
standard.jar
stax2-api-3.0.1.jar
staxmate-2.0.0.jar
struts2-core-2.1.8.1.jar
struts2-json-plugin-2.1.8.1.jar
swarmcache-1.0RC2.jar
util.jar (LTS proprietary)
velocity-1.4.jar
velocity-tools-generic-1.1.jar
woodstox-core-lgpl-4.0.7.jar


wordshack-client.jar (LTS proprietary)
wstx-asl-3.2.7.jar
xercesImpl.jar
xml.jar
xpp3_min-1.1.4c.jar
xstream-1.3.jar
xwork-core-2.1.6.jar
zookeeper-3.4.6.jar
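Items 14 and 19 (reviewing libraries and removing LTS proprietary jars) could be enforced mechanically in the new build. The sketch below treats the seven jars marked "(LTS proprietary)" in the list above as the set to exclude. It reports any of them that are still present in a list of jar file names. The class `ProprietaryJarCheck` is hypothetical and not part of the EAS build.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical build check (not part of EAS): fails if LTS proprietary jars remain.
class ProprietaryJarCheck {

    // The jars marked "(LTS proprietary)" in the EAS jar inventory.
    static final Set<String> LTS_PROPRIETARY = Collections.unmodifiableSet(new TreeSet<>(Arrays.asList(
            "access.jar", "drs2_services-dto.jar", "drs2_services-util.jar",
            "ldap.jar", "ots.jar", "util.jar", "wordshack-client.jar")));

    // Returns the proprietary jars found among the given file names, sorted and de-duplicated.
    static List<String> findProprietary(Collection<String> jarNames) {
        return jarNames.stream()
                .filter(LTS_PROPRIETARY::contains)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // Throws if any proprietary jar is still present.
    static void requireNone(Collection<String> jarNames) {
        List<String> found = findProprietary(jarNames);
        if (!found.isEmpty()) {
            throw new IllegalStateException("LTS proprietary jars must be removed: " + found);
        }
    }
}
```

In a build step, the names could come from `Files.list(libDir)` mapped through `getFileName().toString()`. Matching on exact file names keeps the check simple. However, a renamed or versioned copy of a proprietary jar would slip past it, so it complements the manual review in Item 14 rather than replacing it.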