Presentation to ARGIS - Atlanta Region GIS User Group
October 30, 2013
Data Archiving & Preservation: Best Practices for GIS
Jennifer Doty | [email protected] Data Management Specialist
Emory Center for Digital Scholarship
Overview
Best practices for managing geospatial data:
• File formats
• Naming conventions
• Folder structure
• Storage and backup
• Documentation
Trends in geospatial data archiving:
• Federal funding agencies’ requirements
• State initiatives for preservation
Best Practices: File Formats
Type of data: Geospatial data (vector and raster data)
Acceptable formats for sharing, reuse and preservation:
• ESRI Shapefile (essential - .shp, .shx, .dbf, optional - .prj, .sbx, .sbn)
• geo-referenced TIFF (.tif, .tfw)
• CAD data (.dwg)
• tabular GIS attribute data
Other acceptable formats for data preservation:
• ESRI Geodatabase format (.mdb, .gdb)
• MapInfo Interchange Format (.mif) for vector data
• Keyhole Mark-up Language (KML) (.kml)
• Adobe Illustrator (.ai), CAD data (.dxf or .svg)
• binary formats of GIS and CAD packages
UK Data Archive File Formats guide, http://www.data-archive.ac.uk/create-manage/format/formats-table
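A Shapefile is really a set of sidecar files, so it is easy to archive an incomplete one. The following is a minimal Python sketch of a completeness check, assuming lowercase extensions and the essential/optional split listed above; the function name check_shapefile is hypothetical and not part of any GIS package.

```python
from pathlib import Path

# Extensions as listed in the ESRI Shapefile entry above.
ESSENTIAL = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".sbx", ".sbn")


def check_shapefile(shp_path):
    """Report which component files are missing next to a .shp file.

    Assumes all components share the same base name and use
    lowercase extensions (an assumption made for this sketch).
    """
    base = Path(shp_path)
    found = {ext: base.with_suffix(ext).exists() for ext in ESSENTIAL + OPTIONAL}
    return {
        "missing_essential": [ext for ext in ESSENTIAL if not found[ext]],
        "missing_optional": [ext for ext in OPTIONAL if not found[ext]],
    }
```

Running it over a folder before transfer to an archive would flag datasets that have lost their .shx or .dbf along the way.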
Best Practices: File Formats
GeoMAPP Geospatial Data File Formats Reference Guide:
• provides quick reference of common geospatial raster and vector dataset types
• serves as tool to identify geospatial format types based on file extensions
• also includes information on standards and specifications for documenting geospatial data
http://www.geomapp.net/docs/GeoMAPP_Geospatial_data_file_formats_FINAL_20110701.xls
Best Practices: Naming Conventions
• Create meaningful but brief naming conventions for your project
• Use file names to classify broad types of files
• Avoid using spaces and special characters
• Begin names with letters, not numbers (e.g. Census2010_blockgroups_GA, not 2010Census…)
• Avoid very long file names
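These rules are simple enough to check automatically. The sketch below is one possible way to do it in Python; the 40-character limit and the function name name_problems are assumptions made for illustration, since the guidance above only says to avoid very long names.

```python
import re

# Hypothetical limit; the guidance only says "avoid very long file names".
MAX_LENGTH = 40


def name_problems(filename):
    """Return a list of naming-rule violations for one file name."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    problems = []
    if not re.match(r"[A-Za-z]", stem):
        problems.append("does not begin with a letter")
    # Anything other than letters, digits, underscore, or hyphen is flagged.
    if re.search(r"[^A-Za-z0-9_-]", stem):
        problems.append("contains spaces or special characters")
    if len(stem) > MAX_LENGTH:
        problems.append("is very long")
    return problems
```

An empty list means the name passes all three checks.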
Best Practices: Naming Conventions
Example: keyword_steward_extent_date.ext
• Keyword (essential)—be as descriptive of the contents of the data as possible by using a word or short phrase
• Steward (essential)—either the creator of the dataset or the last one to make a significant modification to a dataset
• Extent (optional)—may be included to indicate resolution of the data (e.g. county, state, or international)
• Date (optional)—may be used to indicate the date of creation or the age range of the content. Recommended format is YYYYMMDD
Indiana Geographic Information Council, http://www.igic.org/standards/namingstandard.pdf
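As a rough illustration of the keyword_steward_extent_date.ext pattern, the following Python sketch assembles a name from its parts. The function build_name and the sample values in the usage note are hypothetical; only the field order and the YYYYMMDD date format come from the example above.

```python
from datetime import date


def build_name(keyword, steward, ext, extent=None, when=None):
    """Assemble keyword_steward_extent_date.ext, skipping optional parts."""
    parts = [keyword, steward]  # both essential
    if extent:
        parts.append(extent)  # optional
    if when:
        parts.append(when.strftime("%Y%m%d"))  # optional, YYYYMMDD
    return "_".join(parts) + "." + ext.lstrip(".")
```

With made-up values, build_name("roads", "gdot", "shp", extent="state", when=date(2013, 10, 30)) gives roads_gdot_state_20131030.shp.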
Best Practices: Naming Conventions
Versioning:
• useful to indicate file revisions or edits, especially in collaborations
• can be through discrete or continuous numbering, depending on minor or major revisions
  – think of software versioning: ArcGIS 10 was a significant change from 9.x, but ArcGIS 10.1 was a (relatively) minor change to 10
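One way to apply major/minor numbering to file names is sketched below in Python. The _vMAJOR_MINOR suffix (underscores rather than dots, to stay clear of special characters) and the function bump_version are assumptions for illustration, not a convention taken from this presentation.

```python
import re

# Hypothetical suffix convention: <name>_v<major>_<minor>
VERSION_RE = re.compile(r"^(?P<base>.+)_v(?P<major>\d+)_(?P<minor>\d+)$")


def bump_version(stem, major_change=False):
    """Return the next versioned file stem.

    A stem with no version suffix starts at v1_0. A major change
    increments the first number and resets the second to 0; a minor
    change increments only the second number.
    """
    match = VERSION_RE.match(stem)
    if match is None:
        return f"{stem}_v1_0"
    major, minor = int(match["major"]), int(match["minor"])
    if major_change:
        major, minor = major + 1, 0
    else:
        minor += 1
    return f"{match['base']}_v{major}_{minor}"
```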
Best Practices: Folder Structure
• Separate directories for scratch workspace and final data
• Hierarchy—is deep or shallow best for your project?
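A project skeleton can be created up front so that scratch and final data never share a directory. This is a minimal Python sketch; the folder names and the function create_project are hypothetical, and only the scratch/final separation comes from the guidance above.

```python
from pathlib import Path

# Hypothetical layout; a shallow hierarchy chosen for illustration.
FOLDERS = ("data/scratch", "data/final", "docs")


def create_project(root):
    """Create the project folders under root and return their paths."""
    created = []
    for rel in FOLDERS:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created
```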
Best Practices: Storage & Backup
Storage Considerations:
• Accessibility
• Read/Write speed
• Size limits: overall vs. file size
Options:
• Local: PC drive, flash drive, external hard drive
• Server: department/organization server space
• Cloud: Dropbox, Google Drive, etc.
Best Practices: Storage & Backup
Backup Considerations:
• Accessibility (local, server, cloud)
• Redundancy (rule of thumb: here, near, far)
Options:
• Incremental/Snapshot
• Automated
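The "here, near, far" rule of thumb can be automated with a small script that copies a file to several destinations and verifies each copy. The Python sketch below uses a SHA-256 checksum for the verification step; that choice and the function names are assumptions made for illustration, and the destination folders stand in for whatever local, server, or synced cloud locations are actually available.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify each copy.

    Returns a dict mapping destination folder to True when the copy's
    checksum matches the original.
    """
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, dest_dir / source.name))
        results[str(dest_dir)] = sha256_of(copy) == expected
    return results
```

Scheduling a script like this (for example with cron or Windows Task Scheduler) covers the "Automated" option; it copies whole files, so it is not an incremental backup.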
Best Practices: Documentation
“When thoughtfully populated, geospatial metadata can be a critical resource for understanding and managing geospatial data for current and future GIS practitioners and those trying to preserve the data.”
- Utilizing Geospatial Metadata to Support Data Preservation Practices, January 2011, GeoMAPP (http://www.geomapp.net/publications_categories.htm)
Best Practices: Documentation
Metadata—represents the who, what, when, where, why and how
Standards:
• CSDGM (FGDC)
• ISO 19115:2003 / 19139
FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM)
http://www.fgdc.gov/csdgmgraphical/index.html
Checklist: CSDGM Fields for Preservation
Identification Information - basic info about data set, including:
• party responsible: usually creator
• publication date: date the data set is completed and ready for use
• title: “where” “what” “when”
• maintenance/update frequency: annually, as needed, based on census, etc.
• bounding coordinates
• keywords (theme and place)
• access and use constraints: any restrictions, disclaimers, or guidance on data set attribution
• contact details
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for Preservation
Data Quality Information - provides historical lineage and source descriptions for the data used in the creation of the data set, including:
• originator
• publisher, publication date & place
• “currentness” of source data
• process description
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for Preservation
Spatial Reference Information - description of the reference frame for, and the means to encode, coordinates in the data set, including:
• map projection name
• coordinate system name
• unit of measure
• geodetic model: datum, ellipsoid
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for Preservation
Entity and Attribute Information - details about content of the data set (the entities, their attributes, and domains from which attribute values may be assigned), including:
• entity label
• attribute label and description
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for Preservation
Metadata Reference Information - information on the party responsible for creating the metadata and the currentness of the metadata:
• metadata standard name
• metadata standard version
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
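A checklist like this can be run against a CSDGM XML record to see which preservation fields are still empty. The Python sketch below covers only a subset of the fields, and the element paths reflect my reading of the CSDGM XML short names (idinfo, dataqual, metainfo, and so on); they are an assumption here and should be verified against the FGDC documentation before use. The function name missing_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Checklist label -> path relative to the <metadata> root element.
# Paths are assumed CSDGM short names; verify against the FGDC standard.
REQUIRED_PATHS = {
    "title": "idinfo/citation/citeinfo/title",
    "originator": "idinfo/citation/citeinfo/origin",
    "publication date": "idinfo/citation/citeinfo/pubdate",
    "update frequency": "idinfo/status/update",
    "bounding coordinates": "idinfo/spdom/bounding",
    "theme keywords": "idinfo/keywords/theme/themekey",
    "place keywords": "idinfo/keywords/place/placekey",
    "access constraints": "idinfo/accconst",
    "use constraints": "idinfo/useconst",
    "process description": "dataqual/lineage/procstep/procdesc",
    "metadata standard name": "metainfo/metstdn",
    "metadata standard version": "metainfo/metstdv",
}


def missing_fields(xml_text):
    """Return checklist labels whose element is absent or empty."""
    root = ET.fromstring(xml_text)
    missing = []
    for label, path in REQUIRED_PATHS.items():
        element = root.find(path)
        # An element counts as populated if it has child elements
        # or non-blank text.
        populated = element is not None and (
            len(element) > 0 or bool((element.text or "").strip())
        )
        if not populated:
            missing.append(label)
    return missing
```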
Data Management Initiatives
Federal agency mandates for sponsored research:
• NSF & NIH requirements for DM plans
• GIS Inventory (Ramona) & Federal Grants data sharing plans (gisinventory.net)
Other related initiatives:
• USGS DM working group
• DM training for early career researchers
FGDC Geospatial Data Lifecycle Model
http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
State & National Initiatives in Geospatial Data Archiving
GeoMAPP - Geospatial Multistate Archive and Preservation Partnership (www.geomapp.net):
• federally funded partnership between the Library of Congress and state geospatial and archives staff from North Carolina, Kentucky, Montana, and Utah
National Digital Stewardship Alliance (NDSA), Geospatial Content Team (www.digitalpreservation.gov/ndsa):
• report identifying appraisal and selection activities as they affect decisions defining geospatial content of enduring value for the nation
Contact Information:
Jennifer Doty | [email protected] Data Management Specialist
Michael Page | [email protected] & Geospatial Data Librarian
Emory Center for Digital Scholarship
digitalscholarship.emory.edu