Data Archiving & Preservation: Best Practices for GIS

Presentation to ARGIS - Atlanta Region GIS User Group October 30, 2013 Data Archiving & Preservation: Best Practices for GIS Jennifer Doty | [email protected] Data Management Specialist Emory Center for Digital Scholarship



Presentation to ARGIS - Atlanta Region GIS User Group, October 30, 2013

Data Archiving & Preservation: Best Practices for GIS

Jennifer Doty | [email protected] Data Management Specialist

Emory Center for Digital Scholarship

Overview

Best practices for managing geospatial data:
• File formats
• Naming conventions
• Folder structure
• Storage and backup
• Documentation

Trends in geospatial data archiving:
• Federal funding agencies’ requirements
• State initiatives for preservation


Best Practices: File Formats

Type of data: Geospatial data (vector and raster data)

Acceptable formats for sharing, reuse and preservation:
• ESRI Shapefile (essential - .shp, .shx, .dbf, optional - .prj, .sbx, .sbn)
• geo-referenced TIFF (.tif, .tfw)
• CAD data (.dwg)
• tabular GIS attribute data

Other acceptable formats for data preservation:
• ESRI Geodatabase format (.mdb, .gdb)
• MapInfo Interchange Format (.mif) for vector data
• Keyhole Mark-up Language (KML) (.kml)
• Adobe Illustrator (.ai), CAD data (.dxf or .svg)
• binary formats of GIS and CAD packages

UK Data Archive File Formats guide, http://www.data-archive.ac.uk/create-manage/format/formats-table
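As an illustration of the shapefile entry in the format table, the sketch below checks whether the essential and optional companion files sit next to a .shp file. The function name `check_shapefile` and the returned dictionary keys are hypothetical, and the check assumes lowercase extensions and a shared base name.

```python
from pathlib import Path

# Extensions as listed in the format table: three essential parts, three optional.
ESSENTIAL = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".sbx", ".sbn")


def check_shapefile(shp_path):
    """Report which companion files are missing next to a .shp file.

    Hypothetical helper: assumes all parts share the base name of `shp_path`
    and use lowercase extensions.
    """
    base = Path(shp_path)
    present = {ext for ext in ESSENTIAL + OPTIONAL if base.with_suffix(ext).exists()}
    return {
        "missing_essential": [ext for ext in ESSENTIAL if ext not in present],
        "missing_optional": [ext for ext in OPTIONAL if ext not in present],
    }
```

A dataset with an empty `missing_essential` list has the three parts the table marks as essential; a missing .prj is worth flagging before archiving because it carries the projection definition.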


Best Practices: File Formats

GeoMAPP Geospatial Data File Formats Reference Guide:
• provides quick reference of common geospatial raster and vector dataset types
• serves as tool to identify geospatial format types based on file extensions
• also includes information on standards and specifications for documenting geospatial data

http://www.geomapp.net/docs/GeoMAPP_Geospatial_data_file_formats_FINAL_20110701.xls

Best Practices: Naming Conventions

• Create meaningful but brief naming conventions for your project
• Use file names to classify broad types of files
• Avoid using spaces and special characters
• Begin names with letters, not numbers, e.g. Census2010_blockgroups_GA, not 2010Census…
• Avoid very long file names


Best Practices: Naming Conventions

Example: keyword_steward_extent_date.ext

• Keyword (essential)—be as descriptive of the contents of the data as possible by using a word or short phrase

• Steward (essential)—either the creator of the dataset or the last one to make a significant modification to a dataset

• Extent (optional)—may be included to indicate resolution of the data (e.g. county, state, or international)

• Date (optional)—may be used to indicate the date of creation or the age range of the content. Recommended format is YYYYMMDD

Indiana Geographic Information Council, http://www.igic.org/standards/namingstandard.pdf
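The keyword_steward_extent_date.ext pattern can be sketched as a small helper. This is one possible reading of the convention, not the IGIC standard itself: the function name `build_name`, its parameters, and the choice to strip every non-alphanumeric character from each part are assumptions made for illustration.

```python
import re
from datetime import date


def build_name(keyword, steward, ext, extent=None, when=None):
    """Assemble keyword_steward_extent_date.ext (hypothetical helper).

    keyword and steward are essential; extent and when (a datetime.date)
    are optional, matching the convention described above.
    """
    parts = [keyword, steward]
    if extent:
        parts.append(extent)
    if when:
        parts.append(when.strftime("%Y%m%d"))  # recommended YYYYMMDD format
    # Drop spaces and special characters inside each part.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    if any(not part for part in cleaned):
        raise ValueError("each name part needs at least one letter or digit")
    if not cleaned[0][0].isalpha():
        raise ValueError("names should begin with a letter, not a number")
    return "_".join(cleaned) + "." + ext.lstrip(".")


# Example with made-up values for steward and date:
print(build_name("Census2010 blockgroups", "JDoty", "shp",
                 extent="GA", when=date(2013, 10, 30)))
```

Rejecting a leading digit enforces the "begin names with letters" rule at creation time rather than leaving it to review.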


Best Practices: Naming Conventions

Versioning:
• useful to indicate file revisions or edits, especially in collaborations
• can be through discrete or continuous numbering, depending on minor or major revisions
  – think of software versioning: ArcGIS 10 was a significant change from 9.x, but ArcGIS 10.1 was a (relatively) minor change to 10
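The major/minor distinction can be sketched with a two-part version string. The `bump_version` helper and the "MAJOR.MINOR" layout are assumptions for illustration; a project may prefer a different scheme.

```python
def bump_version(version, major=False):
    """Return the next 'MAJOR.MINOR' string after `version` (hypothetical helper)."""
    major_part, minor_part = (int(piece) for piece in version.split("."))
    if major:
        # Significant change: advance the major number and reset the minor one.
        return f"{major_part + 1}.0"
    # Minor change: keep the major number and advance the minor one.
    return f"{major_part}.{minor_part + 1}"
```

Mirroring the ArcGIS example, `bump_version("9.3", major=True)` gives "10.0", while `bump_version("10.0")` gives "10.1".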

Best Practices: Folder Structure

• Separate directories for scratch workspace and final data

• Hierarchy—is deep or shallow best for your project?
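A minimal sketch of the scratch/final separation, assuming a shallow hierarchy with one directory for each. The function name `make_project_tree` and the directory names are hypothetical.

```python
from pathlib import Path


def make_project_tree(root, subdirs=("scratch", "final")):
    """Create separate working and final-data directories under `root`."""
    root = Path(root)
    created = []
    for name in subdirs:
        directory = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

A deeper hierarchy can be expressed by passing nested names such as "final/vector" and "final/raster" in `subdirs`.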


Tape library, CERN, Geneva by Cory Doctorow / CC BY-SA 2.0

Best Practices: Storage & Backup

Storage Considerations:
• Accessibility
• Read/Write speed
• Size limits: overall vs. file size

Options:
• Local: PC drive, flash drive, external hard drive
• Server: department/organization server space
• Cloud: Dropbox, Google Drive, etc.


Best Practices: Storage & Backup

Backup Considerations:
• Accessibility (local, server, cloud)
• Redundancy (rule of thumb: here, near, far)

Options:
• Incremental/Snapshot
• Automated
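To make the incremental option concrete, the sketch below copies only files that are new or changed since the last run. It is a simplified illustration, not a replacement for dedicated backup software: the name `incremental_backup` is hypothetical, change detection relies on file size and modification time only, and the destination is assumed to lie outside the source folder.

```python
import shutil
from pathlib import Path


def incremental_backup(src, dest):
    """Copy new or changed files from src to dest; return the copied targets."""
    src, dest = Path(src), Path(dest)
    copied = []
    for source_file in sorted(src.rglob("*")):
        if not source_file.is_file():
            continue
        target = dest / source_file.relative_to(src)
        if target.exists():
            s, t = source_file.stat(), target.stat()
            # Skip files whose size matches and whose copy is not older.
            if s.st_size == t.st_size and s.st_mtime <= t.st_mtime:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)  # copy2 keeps the modification time
        copied.append(target)
    return copied
```

Running the same function against a local drive, a server share, and a synced cloud folder is one way to approximate the "here, near, far" rule, and scheduling it covers the automated option.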


Metadata is a love note… by sarah0s / CC BY-NC-ND 2.0


Best Practices: Documentation

“When thoughtfully populated, geospatial metadata can be a critical resource for understanding and managing geospatial data for current and future GIS practitioners and those trying to preserve the data.”
- Utilizing Geospatial Metadata to Support Data Preservation Practices, January 2011, GeoMAPP (http://www.geomapp.net/publications_categories.htm)

Best Practices: Documentation

Metadata—represents the who, what, when, where, why and how

Standards:
• CSDGM (FGDC)
• ISO 19115:2003 / ISO 19139


FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM)

http://www.fgdc.gov/csdgmgraphical/index.html


CSDGM Fields for Preservation


Checklist: CSDGM Fields for Preservation

Identification Information - basic info about data set, including:
• party responsible: usually creator
• publication date: date the data set is completed and ready for use
• title: “where” “what” “when”
• maintenance/update frequency: annually, as needed, based on census, etc.
• bounding coordinates
• keywords (theme and place)
• access and use constraints: any restrictions, disclaimers, or guidance on data set attribution
• contact details

GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf


Checklist: CSDGM Fields for Preservation

Data Quality Information - provides historical lineage and source descriptions for the data used in the creation of the data set, including:
• originator
• publisher, publication date & place
• “currentness” of source data
• process description


Checklist: CSDGM Fields for Preservation

Spatial Reference Information - description of the reference frame for, and the means to encode, coordinates in the data set, including:
• map projection name
• coordinate system name
• unit of measure
• geodetic model: datum, ellipsoid


Checklist: CSDGM Fields for Preservation

Entity and Attribute Information - details about content of the data set (the entities, their attributes, and domains from which attribute values may be assigned), including:
• entity label
• attribute label and description


Checklist: CSDGM Fields for Preservation

Metadata Reference Information - information on the party responsible for creating the metadata and the currentness of the metadata:
• metadata standard name
• metadata standard version

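The five checklist sections can be turned into a quick presence check on a CSDGM XML record. This is a rough sketch under stated assumptions: the short tag names (`idinfo`, `dataqual`, `spref`, `eainfo`, `metainfo`) are taken from the FGDC XML encoding as commonly used and should be verified against the standard, the function name `missing_sections` is hypothetical, and the check only confirms that a section exists, not that its fields are well populated.

```python
import xml.etree.ElementTree as ET

# Checklist section -> assumed CSDGM short tag name (verify against the standard).
SECTIONS = {
    "Identification Information": "idinfo",
    "Data Quality Information": "dataqual",
    "Spatial Reference Information": "spref",
    "Entity and Attribute Information": "eainfo",
    "Metadata Reference Information": "metainfo",
}


def missing_sections(xml_text):
    """List checklist sections that have no element directly under the root."""
    root = ET.fromstring(xml_text)
    return [label for label, tag in SECTIONS.items() if root.find(tag) is None]


# Made-up record containing only two of the five sections:
sample = "<metadata><idinfo/><metainfo><metstdn>example</metstdn></metainfo></metadata>"
print(missing_sections(sample))
```

A check like this fits naturally into the step where final data moves from the scratch workspace into the archive folder.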

Data Management Initiatives

Federal agency mandates for sponsored research:
• NSF & NIH requirements for DM plans
• GIS Inventory (Ramona) & Federal Grants data sharing plans (gisinventory.net)

Other related initiatives:
• USGS DM working group
• DM training for early career researchers


FGDC Geospatial Data Lifecycle Model

http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf


State & National Initiatives in Geospatial Data Archiving

GeoMAPP - Geospatial Multistate Archive and Preservation Partnership (www.geomapp.net):
• federally funded partnership between the Library of Congress and state geospatial and archives staff from North Carolina, Kentucky, Montana, and Utah

National Digital Stewardship Alliance (NDSA), Geospatial Content Team (www.digitalpreservation.gov/ndsa):
• report identifying appraisal and selection activities as they affect decisions defining geospatial content of enduring value for the nation

Open GeoPortal @ Emory

NASA Goddard Photo and Video / CC BY


Green Question Mark by mikecogh on Flickr / CC BY


Contact Information:

Jennifer Doty | [email protected] Data Management Specialist

Michael Page | [email protected] & Geospatial Data Librarian

Emory Center for Digital Scholarship
digitalscholarship.emory.edu