Collecting COVID-19 email ephemera FLORA FELTHAM & RHONDA GRANTHAM NATIONAL LIBRARY OF NEW ZEALAND TE PUNA MĀTAURANGA O AOTEAROA

Post on 14-Jan-2022

Background
•Project developed as part of the National Library of New Zealand’s COVID-19 rapid response collecting, which also included memes, social media, websites, and online publications.
•Collecting phase was March 2020 to February 2021, to cover:
    •Nationwide Alert Level 4 (March-May 2020)
    •Nationwide staggered return, Level 3 to 1 (May-June 2020)
    •Auckland region Level 3, nationwide Level 2 (August-October 2020)
    •Auckland region Level 3, nationwide Level 2 (February 2021)
•Collection to be part of the Ephemera Collection, alongside memes and social media.
Different collecting streams for COVID-19 Email Ephemera Collection
•Digital Archivist: staff donations from personal email accounts
•Legal Deposit Librarian: emails sent to National Library’s Legal Deposit email address
Requirements
•Immediate:
    •collect what we can, with the tools we have
    •files we have collected are .mbox, .msg, .pst
•Short term preservation needs:
    •access copies available to users
    •security and privacy risks mitigated
•Long term preservation needs:
    •in perpetuity
    •we don’t know how people will use these collections in the future
    •minimal intervention
Personally Identifiable Information
•Where is it?
    •email header
    •body of email
    •outgoing links (e.g. Manage Your Preferences or Click here to unsubscribe)
•What can we do about it?
    •easy to remove from header
    •challenges for body of email
    •still at proof of concept
•Risks?
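As a rough illustration of why the header is the easy part and the body is harder, the following is a minimal sketch using only Python's standard `email` package. It is not the Library's actual workflow: the list of headers to drop (`PII_HEADERS`), the hint words (`HINTS`), and the function names are all assumptions made for this example. Header fields can be deleted outright, while personalised links in the body can only be flagged for review with a crude heuristic.

```python
import re
from email import message_from_string, policy

# Assumed list of header fields that commonly carry recipient details.
PII_HEADERS = ("To", "Cc", "Bcc", "Delivered-To", "Received",
               "Return-Path", "X-Original-To")

# Rough URL pattern and hint words; both are illustrative assumptions.
LINK_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
HINTS = ("unsubscribe", "preferences", "opt-out", "optout")


def strip_headers(msg, names=PII_HEADERS):
    """Delete every occurrence of the named header fields, in place."""
    for name in names:
        del msg[name]  # no error is raised if the field is absent
    return msg


def flag_links(msg):
    """Return body links whose URL contains one of the hint words."""
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    return [url for url in LINK_RE.findall(text)
            if any(hint in url.lower() for hint in HINTS)]


def load(raw: str):
    """Parse raw message text with the modern (EmailMessage) policy."""
    return message_from_string(raw, policy=policy.default)
```

A heuristic like `flag_links` will miss tracking links whose URLs contain no recognisable hint word, which is one reason body-level redaction is harder than header-level redaction.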
Email sent to legal deposit
•Phase one collecting (rapid response acquisition of ephemeral materials)
•The Legal Deposit Team receive a variety of email newsletters and notifications from publishers, government agencies, groups and businesses
•We were going to collect these from two mail boxes
•Some of these fall outside the scope of legal deposit collecting
•Recognised the heritage value of notifications with COVID-19 content
•We assessed and filtered emails into specific email folders from February 2020 to February 2021
Collecting method
•We were going to collect material from two email boxes
•We use shared email boxes in a corporate environment
•High volume of email, fulfilling different functions
•We needed a simple, consistent, flexible process
•Created COVID-19 folders in Outlook
Processing emails
•Moved to a new folder
•Removed email categorisations
•File properties
•Wanted to capture email date and time
•Used https://pypi.org/project/extract-msg/0.22.0/ to extract it
•Appended date and time to email file name
•The script moved the emails into month buckets
•Some emails had very long file titles
Renamed files in a month bucket
•Script handled images and special characters in filenames
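The renaming and bucketing steps described above could look something like the sketch below. It is not the script the team used. The bucket folder format (`YYYY-MM`), the timestamp format, the 80-character cap on long titles, and all function names are assumptions for illustration. The `extract_msg.Message` class and its `date` and `subject` properties come from the extract-msg package; because the type returned by `date` has differed between releases, the sketch accepts either a `datetime` or a date string.

```python
import re
import shutil
from datetime import datetime
from email.utils import parsedate_to_datetime
from pathlib import Path

MAX_STEM = 80  # assumed cap, to keep very long email titles manageable


def safe_stem(subject: str, limit: int = MAX_STEM) -> str:
    """Replace characters that are awkward in file names and cap the length."""
    cleaned = re.sub(r"[^\w\-]+", "_", subject).strip("_")
    return cleaned[:limit] or "untitled"


def as_datetime(value) -> datetime:
    """Accept a datetime, or an RFC 2822 style date string."""
    if isinstance(value, datetime):
        return value
    return parsedate_to_datetime(value)


def bucket_path(root: Path, subject: str, sent: datetime,
                suffix: str = ".msg") -> Path:
    """Build <root>/<YYYY-MM>/<subject>_<YYYYMMDD-HHMMSS><suffix>."""
    name = f"{safe_stem(subject)}_{sent:%Y%m%d-%H%M%S}{suffix}"
    return root / f"{sent:%Y-%m}" / name


def file_into_bucket(msg_path: Path, root: Path) -> Path:
    """Read the sent date from a .msg file, then move it into its month bucket."""
    import extract_msg  # third-party package: pip install extract-msg

    msg = extract_msg.Message(str(msg_path))
    try:
        sent = as_datetime(msg.date)
        subject = msg.subject or msg_path.stem
    finally:
        msg.close()  # release the file handle before moving the file
    dest = bucket_path(root, subject, sent)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(msg_path), str(dest))
    return dest
```

Keeping the naming logic (`safe_stem`, `bucket_path`) separate from the file handling makes it possible to check the resulting names without touching any real email files.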
1420 emails were collected by the LD Team
•Sent from February 2020 to February 2021
Number of emails for each month

Month     Emails
Feb-20         3
Mar-20       343
Apr-20       490
May-20       380
Jun-20        64
Jul-20         8
Aug-20        86
Sep-20        11
Oct-20         3
Nov-20         1
Dec-20         1
Jan-21         5
Feb-21        25
Total       1420
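Monthly figures like these could be tallied directly from the month buckets. The sketch below is an assumption about how that might be done, not a description of how the chart was produced: it counts `.msg` files in each bucket folder under a hypothetical root directory, and separately confirms that the reported monthly figures add up to the stated total.

```python
from collections import Counter
from pathlib import Path


def count_per_bucket(root: Path, pattern: str = "*.msg") -> Counter:
    """Count files matching `pattern` in each immediate subfolder of `root`."""
    counts = Counter()
    for bucket in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[bucket.name] = sum(1 for _ in bucket.glob(pattern))
    return counts


# Monthly figures as reported in the presentation.
REPORTED = {
    "Feb-20": 3, "Mar-20": 343, "Apr-20": 490, "May-20": 380,
    "Jun-20": 64, "Jul-20": 8, "Aug-20": 86, "Sep-20": 11,
    "Oct-20": 3, "Nov-20": 1, "Dec-20": 1, "Jan-21": 5, "Feb-21": 25,
}

total = sum(REPORTED.values())  # 1420, matching the stated total
```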
• Access copies
Next steps / more questions