Lecture 4 Data Models Jeffery S. Horsburgh Hydroinformatics Fall 2012 This work was funded by National Science Foundation Grant EPS 1135482


Lecture 4
Data Models

Jeffery S. Horsburgh
Hydroinformatics

Fall 2012

This work was funded by National Science Foundation Grant EPS 1135482


Objectives

• Identify and describe important entities and relationships to model data

• Describe important data models used in Hydrology such as the Observations Data Model (ODM), ArcHydro, and NetCDF


What is a Data Model?

• Abstract model that documents and organizes data

• Explicitly defines the data and determines its structure

• Used as a plan and structure for developing applications that use the data


Data Models

• Define the “entity” types within a domain, e.g.:
– Sites (where)
– Methods (how)
– Data Sources (who)
– Values


Entities Associated with Observations

• Variables – the things you measure or observe
• Observers – who made the observation
• Samples – a bottle of water, a sediment core
• Offsets – distance below ground, below surface, etc.
• Versions – raw data, processed data, simulations
• Qualifiers – limitations to data use


Data Models

• Define the attributes of entities

Entity = Site

Attributes and values:
• Site Name: Little Bear River near Wellsville
• Site Code: USU-LBR-Wellsville
• Latitude: 41.643457
• Longitude: -111.917649
• Elevation: 1365 m
• State: Utah
• County: Cache
• Description: Attached to SR101 bridge.
• Site Type: Stream
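In code, an entity type maps naturally onto a class whose fields are its attributes. The sketch below is a minimal illustration in Python, assuming a frozen dataclass; the class and field names are hypothetical and are not taken from any ODM software, but the values are the example site listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Entity: one monitoring site, described by its attributes (hypothetical names)."""
    site_name: str
    site_code: str
    latitude: float     # decimal degrees
    longitude: float    # decimal degrees, negative = west
    elevation_m: float  # meters
    state: str
    county: str
    description: str
    site_type: str


# Attribute values from the example site above.
wellsville = Site(
    site_name="Little Bear River near Wellsville",
    site_code="USU-LBR-Wellsville",
    latitude=41.643457,
    longitude=-111.917649,
    elevation_m=1365.0,
    state="Utah",
    county="Cache",
    description="Attached to SR101 bridge.",
    site_type="Stream",
)

print(wellsville.site_code)  # USU-LBR-Wellsville
```

Making the class frozen means an attribute cannot be changed by accident after the site is created, which is one small way a data model's structure can be enforced in software.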


Data Models

• Define the relationships among entities

Water temperature values (Values) in degrees Celsius (Variable) measured in the Little Bear River at Mendon Road (Site) using a Hydrolab MS5 multiparameter sonde (Method) by Utah State University (Source)
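The relationships in the example sentence can be sketched as objects that reference one another: each data value points to its site, variable, method, and source rather than repeating their descriptions. This is a minimal illustration assuming Python dataclasses; all class names are hypothetical, and the numeric values and times are made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Site:
    name: str


@dataclass(frozen=True)
class Variable:
    name: str
    units: str


@dataclass(frozen=True)
class Method:
    description: str


@dataclass(frozen=True)
class Source:
    organization: str


@dataclass(frozen=True)
class DataValue:
    """One observation; it is interpretable only through the entities it references."""
    value: float
    local_time: datetime
    site: Site
    variable: Variable
    method: Method
    source: Source

    def describe(self) -> str:
        return (f"{self.variable.name} = {self.value} {self.variable.units} "
                f"at {self.site.name}, measured using {self.method.description} "
                f"by {self.source.organization}")


# Entities from the example sentence; each is defined once and shared by many values.
site = Site("Little Bear River at Mendon Road")
temp = Variable("Water temperature", "degrees Celsius")
sonde = Method("a Hydrolab MS5 multiparameter sonde")
usu = Source("Utah State University")

# Hypothetical values and times, for illustration only.
values = [
    DataValue(12.5, datetime(2012, 9, 1, 0, 0), site, temp, sonde, usu),
    DataValue(12.3, datetime(2012, 9, 1, 0, 30), site, temp, sonde, usu),
]
print(values[0].describe())
```

Because both values share the same `Site` object, correcting the site's description once would correct it for every value that references it.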


Data Models

• Define the “business rules” for data
– Observations are recorded at one and only one site
– One or more variables are measured at a site
– A site must have a name
– A variable name must be chosen from a controlled vocabulary
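A relational database can enforce several of these rules itself. The sketch below uses Python's built-in `sqlite3` module and a simplified, hypothetical schema (not the actual ODM schema): `NOT NULL` requires a site name, a single non-null `SiteID` column ties each observation to exactly one site, and a foreign key to a vocabulary table restricts variable names. The "one or more variables per site" rule is a one-to-many relationship and is not enforced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE VariableNameCV (Term TEXT PRIMARY KEY);
CREATE TABLE Sites (
    SiteID   INTEGER PRIMARY KEY,
    SiteName TEXT NOT NULL                           -- a site must have a name
);
CREATE TABLE Variables (
    VariableID   INTEGER PRIMARY KEY,
    VariableName TEXT NOT NULL
                 REFERENCES VariableNameCV(Term)     -- name from a controlled vocabulary
);
CREATE TABLE DataValues (
    ValueID    INTEGER PRIMARY KEY,
    DataValue  REAL NOT NULL,
    SiteID     INTEGER NOT NULL REFERENCES Sites(SiteID),  -- exactly one site
    VariableID INTEGER NOT NULL REFERENCES Variables(VariableID)
);
INSERT INTO VariableNameCV VALUES ('Temperature');
INSERT INTO Sites VALUES (1, 'Little Bear River at Mendon Road');
INSERT INTO Variables VALUES (1, 'Temperature');
""")


def breaks_a_rule(sql):
    """Return True if the database rejects the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(breaks_a_rule("INSERT INTO Sites VALUES (2, NULL)"))             # True: no name
print(breaks_a_rule("INSERT INTO Variables VALUES (2, 'Temp')"))       # True: not in vocabulary
print(breaks_a_rule("INSERT INTO DataValues VALUES (1, 5.2, 99, 1)"))  # True: no such site
print(breaks_a_rule("INSERT INTO DataValues VALUES (2, 5.2, 1, 1)"))   # False: valid
```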


Types of Data Models

• Relational data models – e.g., relational databases

[Diagram: tables linked by one-to-many (1 to *) relationships]


Relational Data Models

• Great for data with many transactions
• Great in a multiple-user environment
• Powerful query language – Structured Query Language (SQL)
• Robust database servers and software tools available
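The power of SQL is that a question is stated declaratively and the database works out how to answer it. A minimal sketch, again with Python's `sqlite3` and a hypothetical two-table schema with made-up values: one query joins the values to their sites and summarizes them per site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteCode TEXT NOT NULL);
CREATE TABLE DataValues (
    ValueID       INTEGER PRIMARY KEY,
    SiteID        INTEGER NOT NULL REFERENCES Sites(SiteID),  -- many values : one site
    LocalDateTime TEXT NOT NULL,
    DataValue     REAL NOT NULL
);
""")
conn.executemany("INSERT INTO Sites VALUES (?, ?)", [(1, "SITE-A"), (2, "SITE-B")])
conn.executemany(
    "INSERT INTO DataValues (SiteID, LocalDateTime, DataValue) VALUES (?, ?, ?)",
    [(1, "2012-09-01 00:00", 10.0),
     (1, "2012-09-01 00:30", 12.0),
     (2, "2012-09-01 00:00", 8.0)],
)

# One declarative query: count and average the values at each site.
rows = conn.execute("""
    SELECT s.SiteCode, COUNT(*) AS n, AVG(v.DataValue) AS mean_value
    FROM DataValues AS v
    JOIN Sites AS s ON s.SiteID = v.SiteID
    GROUP BY s.SiteCode
    ORDER BY s.SiteCode
""").fetchall()
print(rows)  # [('SITE-A', 2, 11.0), ('SITE-B', 1, 8.0)]
```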


Types of Data Models

• File-based data models
– ESRI File Geodatabase
– NetCDF

• A structured file or set of files that stores data


File Based Data Models

• Usually tied to a tool or set of tools for reading, writing, etc.

• Can be portable across platforms
• Can be optimized for performance or compression (e.g., custom binary files)
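A custom binary file can be very compact and fast when every record has the same size, because record *n* then starts at a known byte offset. This is a minimal sketch using Python's `struct` module; the record layout (a 64-bit time in seconds plus a 32-bit float value) is a hypothetical choice for illustration, and it also shows why such files are tied to tools: without the layout, the bytes are unreadable.

```python
import struct
import tempfile

# Hypothetical fixed-width record: little-endian int64 time (seconds) + float32 value.
RECORD = struct.Struct("<qf")


def write_series(f, records):
    """Append (time, value) pairs to a binary file as fixed-width records."""
    for t, v in records:
        f.write(RECORD.pack(t, v))


def read_record(f, index):
    """Seek straight to record `index` without reading the records before it."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


with tempfile.TemporaryFile() as f:
    write_series(f, [(0, 1.5), (1800, 2.25), (3600, 3.0)])
    third = read_record(f, 2)

print(RECORD.size)  # 12 bytes per record
print(third)        # (3600, 3.0)
```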


Types of Data Models

• Extensible Markup Language (XML) schemas


XML Schemas

• Great for transporting data in a machine-readable format

• Platform and programming language independent

• Special form of file-based data model
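The sketch below shows XML as a transport format: a small time series is written to XML text and parsed back using only Python's standard `xml.etree.ElementTree`. The element and attribute names (`timeSeries`, `variable`, `value`, `dateTime`) are hypothetical and do not follow any published schema; validating a document against a formal XML Schema (XSD) needs a third-party library and is not shown.

```python
import xml.etree.ElementTree as ET


def series_to_xml(site_code, variable, units, values):
    """Serialize (timestamp, value) pairs to an XML string (hypothetical element names)."""
    root = ET.Element("timeSeries", siteCode=site_code)
    ET.SubElement(root, "variable", units=units).text = variable
    vals = ET.SubElement(root, "values")
    for when, value in values:
        ET.SubElement(vals, "value", dateTime=when).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_series(text):
    """Parse the same structure back into (timestamp, value) pairs."""
    root = ET.fromstring(text)
    return [(v.get("dateTime"), float(v.text)) for v in root.iter("value")]


doc = series_to_xml("SITE-A", "Temperature", "degC",
                    [("2012-09-01T00:00:00", 10.0), ("2012-09-01T00:30:00", 12.0)])
print(doc)
print(xml_to_series(doc))
```

Any platform or language with an XML parser could read `doc`, which is the point of the format.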


Types of Data Models

• Object models


Object Models

• A collection of objects or classes through which a computer program can manipulate data

• Objects have “properties” and “methods”
• A container that wraps data within a set of functions
– Ensure that the data are used appropriately
– Provide standardized, reusable functionality
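A minimal sketch of the idea in Python: a hypothetical `TimeSeries` class exposes properties (`variable`, `units`, `count`) and methods (`add`, `mean`) while keeping the underlying list private, so callers go through code that checks the data and offers reusable functions.

```python
class TimeSeries:
    """Wraps a list of values; callers use methods rather than the raw list."""

    def __init__(self, variable: str, units: str):
        self.variable = variable  # properties describing the data
        self.units = units
        self._values = []         # wrapped data, not meant to be accessed directly

    def add(self, value: float) -> None:
        # Ensure data are used appropriately: accept numbers only (not bools).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("value must be numeric")
        self._values.append(float(value))

    @property
    def count(self) -> int:
        return len(self._values)

    def mean(self) -> float:
        # Standardized, reusable functionality.
        if not self._values:
            raise ValueError("no values")
        return sum(self._values) / len(self._values)


ts = TimeSeries("Temperature", "degC")
ts.add(10)
ts.add(12.0)
print(ts.count, ts.mean())  # 2 11.0
```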


Object Model

[Diagram: a class/object with its properties and methods]


Some Data Models Commonly Used in Hydrology

• CUAHSI Observations Data Model (ODM)
• Arc Hydro
• Arc Hydro Groundwater
• NetCDF


Observations Data Model (ODM)

[Figure: types of data ODM can hold: streamflow, groundwater levels, water quality, precipitation & climate, soil moisture data, and flux tower data]

• A relational database at the single observation level
• Metadata for unambiguous interpretation
• Traceable heritage from raw measurements to usable information
• Promote syntactic and semantic consistency
• Cross-dimension retrieval and analysis

Horsburgh, J. S., D. G. Tarboton, D. R. Maidment, and I. Zaslavsky (2008), A relational model for environmental and water resources data, Water Resources Research, 44, W05406, doi:10.1029/2007WR006392.


What are the basic attributes to be associated with each single data value, and how can these best be organized?

[Figure: a data value $v_i(s,t)$ located along three axes: Space, $S$ (“where”, $s$); Time, $T$ (“when”, $t$); and Variables, $V$ (“what”, $v_i$)]

Attributes of a data value:
• Variable, Method, Quality Control Level, Sample Medium, Value Type, Data Type
• Source/Organization, Units, Accuracy, Censoring, Qualifying comments
• Location, Feature of interest
• DateTime, Interval (support)


Data Series – A Time Series of Hydrologic Observations

[Figure: along the Space, Time, and Variables axes, a data series for Variable $V_i$ at Site $S_j$ runs from Begin Date Time $t_1$ to End Date Time $t_2$ and holds Count $C$ values]

There are $C$ measurements of Variable $V_i$ at Site $S_j$ from time $t_1$ to time $t_2$.

A data series is defined by a unique combination of:
• Site
• Variable
• Method
• Source
• Quality Control Level
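The definition above can be turned into a summary computation: group individual observations by the five identifying fields and record the count $C$, begin time $t_1$, and end time $t_2$ of each group. This is a plain-Python sketch; the `Obs` and `Series` record types and the example identifiers and values are hypothetical.

```python
from collections import namedtuple
from datetime import datetime

Obs = namedtuple("Obs", "site variable method source qc_level when value")
Series = namedtuple("Series", "count begin end")


def series_catalog(observations):
    """Summarize observations into data series keyed by
    (Site, Variable, Method, Source, Quality Control Level)."""
    catalog = {}
    for o in observations:
        key = (o.site, o.variable, o.method, o.source, o.qc_level)
        if key in catalog:
            s = catalog[key]
            catalog[key] = Series(s.count + 1, min(s.begin, o.when), max(s.end, o.when))
        else:
            catalog[key] = Series(1, o.when, o.when)
    return catalog


# Hypothetical observations: same site and variable, two quality control levels.
obs = [
    Obs("S1", "Temperature", "M1", "USU", 0, datetime(2012, 9, 1, 0, 0), 10.0),
    Obs("S1", "Temperature", "M1", "USU", 0, datetime(2012, 9, 1, 0, 30), 12.0),
    Obs("S1", "Temperature", "M1", "USU", 1, datetime(2012, 9, 1, 0, 0), 10.1),
]
cat = series_catalog(obs)
for key, s in cat.items():
    print(key, s)
```

Because quality control level is part of the key, raw and processed versions of the same measurements form two separate series.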


ODM 1.1.1

• Sites (where)
• Variables (what)
• Methods (how)
• Sources (who)
• Quality Control Levels
• Values (+ when)


Controlled Vocabularies


Controlled Vocabularies: Reducing Semantic Heterogeneity


Implementing ODM

• Relational database schemas exist for:
– Microsoft SQL Server
– MySQL


ODM Example: Water Quality from a Profile in a Lake


Linking Point Observations to Hydrologic Features


Arc Hydro: GIS for Water Resources

• Arc Hydro
– An ArcGIS data model for water resources
– Arc Hydro toolset for implementation
– Framework for linking hydrologic simulation models

The Arc Hydro data model and application tools are in the public domain.

Published in 2002, now in revision for Arc Hydro II.


Real World Hydrologic Features


What are some important entities in a data model for surface water hydrology?


Arc Hydro Framework Input Data
• Streams
• Watersheds
• Waterbodies
• Hydro Points



Arc Hydro Framework Data Model

• Waterbody (Feature): HydroID, HydroCode, FType, Name, AreaSqKm, JunctionID
• HydroPoint (Feature): HydroID, HydroCode, FType, Name, JunctionID
• Watershed (Feature): HydroID, HydroCode, DrainID, AreaSqKm, JunctionID, NextDownID
• HydroEdge (ComplexEdgeFeature): HydroID, HydroCode, ReachCode, Name, LengthKm, LengthDown, FlowDir, FType, EdgeType, Enabled
– EdgeType values: Flowline, Shoreline
• HydroJunction (SimpleJunctionFeature): HydroID, HydroCode, NextDownID, LengthDown, DrainArea, FType, Enabled, AncillaryRole
• HydroEdge and HydroJunction features together form the HydroNetwork; Waterbody, HydroPoint, and Watershed features relate to a HydroJunction through JunctionID


What Can I Do with Arc Hydro?

• Arc Hydro defines flow lines and junctions and encodes flow directions

• Arc Hydro encodes relationships among watersheds, streams, and junctions

• Establishes hydrologic connectivity between catchments (polygons), stream reaches (lines), and junctions (points)


What Can I Do with ArcHydro?

Network Tracing
• Select all streams above a point
• Select the downstream path for a point
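Both traces can be illustrated with nothing more than a table of junction HydroIDs and their NextDownID values. The sketch below is plain Python under that assumption; it is not how the ArcGIS network tools work internally, and the HydroIDs form a hypothetical five-junction network.

```python
def downstream_path(next_down, start):
    """Follow NextDownID links from `start` until reaching an outlet (None)."""
    path, seen = [start], {start}
    node = next_down.get(start)
    while node is not None:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        path.append(node)
        seen.add(node)
        node = next_down.get(node)
    return path


def upstream_set(next_down, start):
    """Return every junction whose downstream path passes through `start`."""
    upstream = {}  # invert the links: downstream ID -> list of upstream IDs
    for hid, nxt in next_down.items():
        if nxt is not None:
            upstream.setdefault(nxt, []).append(hid)
    found, stack = set(), [start]
    while stack:
        for up in upstream.get(stack.pop(), []):
            if up not in found:
                found.add(up)
                stack.append(up)
    return found


# Hypothetical network: 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5, and 5 is the outlet.
next_down = {1: 3, 2: 3, 3: 5, 4: 5, 5: None}
print(downstream_path(next_down, 1))         # [1, 3, 5]
print(sorted(upstream_set(next_down, 5)))    # [1, 2, 3, 4]
```

Storing only the downstream link keeps each record small; the upstream trace is derived by inverting those links when it is needed.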


Arc Hydro Tools for ArcGIS

• Terrain analysis: preparing DEM derivatives
• Watershed processing: watershed delineation from DEMs
• Attribute tools: computing and populating attributes and identifiers
• Network tools: creating the hydro network

Focus: getting data into Arc Hydro and working with it once it is there.


Arc Hydro Time Series

• Variable: string describing what is being measured or calculated
• Units: string describing units
• IsRegular: boolean indicating whether the data are regularly spaced
• TSInterval: controlled vocabulary for time intervals
• DataType: statistic for the value measured over the interval
• Origin: indication of whether the values are measured or calculated
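IsRegular and TSInterval could be derived from the timestamps themselves: a series is regular when every step between consecutive times is the same. This is a minimal sketch; the `TS_INTERVALS` lookup is a hypothetical stand-in for the TSInterval controlled vocabulary, and its terms are not the actual Arc Hydro terms.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the TSInterval controlled vocabulary.
TS_INTERVALS = {
    timedelta(minutes=15): "15Minute",
    timedelta(hours=1): "1Hour",
    timedelta(days=1): "1Day",
}


def classify_spacing(timestamps):
    """Return (IsRegular, TSInterval term) for a sorted list of datetimes."""
    steps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    if len(steps) != 1:  # fewer than two values, or uneven spacing
        return False, None
    step = steps.pop()
    return True, TS_INTERVALS.get(step)  # None if the step has no vocabulary term


hourly = [datetime(2012, 9, 1, h) for h in range(4)]
uneven = hourly[:2] + [datetime(2012, 9, 1, 5)]
print(classify_spacing(hourly))  # (True, '1Hour')
print(classify_spacing(uneven))  # (False, None)
```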


Arc Hydro Groundwater

Data model and tools for managing groundwater data in ArcGIS


What are important entities in a groundwater data model?


Arc Hydro GW Data Model


Arc Hydro GW Tools

• Groundwater Analyst
• Subsurface Analyst
• MODFLOW Analyst


NetCDF

• A platform-independent format for representing multidimensional, array-oriented scientific data

• Continuous space-time data model – both time and space vary

• Especially useful for time-varying grids
– Time-varying precipitation fields (e.g., radar rainfall data)
• Used extensively in the weather and climate domains


NetCDF Characteristics

NetCDF (network Common Data Form)

• Self-Describing - a netCDF file includes information about the data it contains

• Direct Access - a small subset of a large dataset may be accessed efficiently, without first reading through all the preceding data

• Sharable - one writer and multiple readers may simultaneously access the same netCDF file


Multidimensional Data

[Figure: a 4 × 4 grid of values along X and Y, repeated for Time = 1, 2, and 3; each cell is labeled with its X, Y, and Time indices (e.g., 111, 241, 443). Source: http://www.unidata.ucar.edu]
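The figure's grid can be rebuilt as a three-dimensional array indexed `[time][y][x]`, with each cell holding its own label (X digit, Y digit, Time digit). netCDF stores a variable's values in row-major order, where the last dimension varies fastest; the sketch below (plain Python lists, no netCDF library) shows the offset arithmetic that lets a reader jump straight to one element.

```python
NX, NY, NT = 4, 4, 3  # grid size in the figure: X = 1..4, Y = 1..4, Time = 1..3

# data[t][y][x] holds the label x*100 + y*10 + t, as drawn in the figure.
data = [[[100 * x + 10 * y + t for x in range(1, NX + 1)]
         for y in range(1, NY + 1)]
        for t in range(1, NT + 1)]


def flat_offset(t, y, x):
    """Position of element (t, y, x), 0-based, in row-major order (x varies fastest)."""
    return (t * NY + y) * NX + x


# The same values laid out one after another, as they would be on disk.
flat = [v for grid in data for row in grid for v in row]

print(data[1][2][3])               # 432: Time = 2, Y = 3, X = 4
print(flat[flat_offset(1, 2, 3)])  # 432 as well
```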


Multidimensional Data – Space and Time


The NetCDF File

A NetCDF file is a binary file. It consists of:
• Global Attributes: describe the contents of the file
• Dimensions: define the structure of the data (e.g., Time, Depth, Latitude, Longitude)
• Variables: hold the data in arrays shaped by the Dimensions
• Variable Attributes: describe the contents of each variable

A CDL (network Common Data form Language) description takes the following form:

netCDF name {
    dimensions: ...
    variables: ...
    data: ...
}
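To make the four parts concrete, the sketch below renders a dataset description as CDL header text in the style that `ncdump -h` prints. It is a hypothetical helper written in plain Python; it does not create a netCDF file (tools such as `ncgen` or a netCDF library do that), it omits the `data:` section, and it does not escape quotes inside string attributes. The dataset name, dimensions, and attributes are made up.

```python
def to_cdl(name, dimensions, variables, global_attrs):
    """Render a dataset description as CDL header text.

    dimensions:   {dim_name: length}
    variables:    {var_name: (type, (dim, ...), {attr: value})}
    global_attrs: {attr: value}
    """
    def fmt(value):
        return f'"{value}"' if isinstance(value, str) else str(value)

    lines = [f"netcdf {name} {{", "dimensions:"]
    lines += [f"\t{dim} = {size} ;" for dim, size in dimensions.items()]
    lines.append("variables:")
    for var, (vtype, dims, attrs) in variables.items():
        lines.append(f"\t{vtype} {var}({', '.join(dims)}) ;")  # array shaped by dimensions
        lines += [f"\t\t{var}:{key} = {fmt(val)} ;" for key, val in attrs.items()]
    lines += ["", "// global attributes:"]
    lines += [f"\t\t:{key} = {fmt(val)} ;" for key, val in global_attrs.items()]
    lines.append("}")
    return "\n".join(lines)


cdl = to_cdl(
    "radar_rainfall",
    {"time": 3, "y": 4, "x": 4},
    {"precipitation": ("float", ("time", "y", "x"), {"units": "mm"})},
    {"title": "Hypothetical gridded rainfall"},
)
print(cdl)
```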


Considerations in Modeling Data

• Is there an existing data model that will work for my data?

• What are the top 20 queries or analyses you need to do with the data?

• What software do I want to use?

• How will you want to share the data?


Advantages of Formal Data Models

• Provide a high degree of structure to data
• Generally implemented in software that has robust querying, manipulation, and visualization capabilities (e.g., RDBMS or GIS)
• Facilitate software development
• Can help in capturing the semantics of data


Disadvantages

• Can be stiff and difficult to change
• Difficult to anticipate needs in the design stages
• Can be incompatible across organizations
• Can become complex


Summary (1)

• A data model provides a definition of a formal structure for data

• There are several flavors of data models, each with different strengths, weaknesses, and appropriate uses

• Data models can facilitate software development


Summary (2)

• Common data models used in hydrology
– The CUAHSI Observations Data Model (ODM) provides an organizational structure for hydrologic time series data
– Arc Hydro is a geographic data model for surface hydrologic features
– Arc Hydro Groundwater adds subsurface hydrologic features, geology, borehole data, and hydrostratigraphy
– NetCDF combines both geospatial and temporal domains into a continuous space-time data model


References and Credits

Horsburgh, J. S., and D. G. Tarboton (2012), CUAHSI Community Observations Data Model (ODM) Version 1.1.1 Design Specifications, CUAHSI, Washington, D.C., http://www.codeplex.com/Download?ProjectName=HydroServer&DownloadId=349176.

Horsburgh, J. S., D. G. Tarboton, D. R. Maidment, and I. Zaslavsky (2008), A relational model for environmental and water resources data, Water Resources Research, 44, W05406, http://dx.doi.org/10.1029/2007WR006392.

Maidment, D. R. (ed.) (2002), Arc Hydro: GIS for Water Resources, ESRI Press, Redlands, CA, 203 p.

Strassberg, G., N. L. Jones, and D. R. Maidment (2011), Arc Hydro Groundwater: GIS for Hydrogeology, ESRI Press, Redlands, CA, 160 p.

Credits:
Arc Hydro slides used with permission from David Maidment, University of Texas at Austin.
Arc Hydro Groundwater slides used with permission from Norm Jones, Brigham Young University/Aquaveo.