envisioning better integration and synthesis: a data provider perspective
DESCRIPTION
World Data Center for Human Interactions in the Environment. Envisioning Better Integration and Synthesis: A Data Provider Perspective. Marc Levy CIESIN Earth Institute, Columbia University Presentation to Arctic System Synthesis Workshop: New Perspectives through Data Discovery and Modeling - PowerPoint PPT PresentationTRANSCRIPT
Envisioning Better Integration and Synthesis:
A Data Provider Perspective
Marc LevyCIESIN
Earth Institute, Columbia University
Presentation to Arctic System Synthesis Workshop:New Perspectives through Data Discovery and Modeling
2–4 April 2007Seattle, WA
World Data Center for Human Interactions in the Environment
Evolution of the Data Provider Model
Researcher
Researcher
Repository
Public
ResearcherResearcher
Public
Researcher
Researcher
Data
Cyberinfrastructure
Research community
Decision-making community
Data Domain Understanding (What do these data mean? Are they suitable for my use?)
Access
Linkage Understanding (How do these data relate to other data?)
DiscoveryReliability and continuity
Catalogs
Documentation, visualization
Integration
On-line databases
Stewardship
Interoperability
Standards; georeferencing; time-stamping
Data Domain Understanding (What do these data mean? Are they suitable for my use?)
Access
Linkage Understanding (How do these data relate to other data?)
DiscoveryReliability and continuity
Interoperability
What data communities provide
Data
What data communities provide
What research and policy communities need
Helping Users Make Wise Choices is Hard!
Presumed Practical Competence of Group as Function of Number of Disciplines
0
5
10
15
20
25
1 2 3 4 5
Disciplines
Ph.D
B.A.
High School
5th Grade
Traditional Documentation not enough
Multi-faceted approach required
Comparative Guides
Visualizations
Common Pitfalls
Examples
Citations
http://sedac.ciesin.org/citations/CitationIndex.html
Thornton, P.K., Kruska, R.L., Henninger, N., Kristjanson, P.M., Reid, R.S., and Robinson, T.P. (2003) Locating Poor Livestock Keepers at the Global Level for Research and Development Targeting. Land Use Policy, 20(4): 311-322.
Tobler, W., Deichmann, U., Gottsegen, J., and Maloy, K. (1997) World Population in a Grid of Spherical Quadrilaterals. International Journal of Population Geography, 3: 203-225.
van Lieshout, M., Kovats, R.S., Livermore, M.T.J., and Martens, P. (2004) Climate Change and Malaria: Analysis of the SRES Climate and Socio-Economic Scenarios. Global Environmental Change, 14(1): 87-99.
Vassolo, S. and Döll, P. (2005) Global-scale Gridded Estimates of Thermoelectric Power and Manufacturing Water Use. Water Resources Research, 41.
Verburg, P.H., and Chen, Y. (2000) Multiscale Characterization of Land-Use Patterns in China. Ecosystems, 3(4): 369-385.
Viviroli, D. and Weingartner, R. (2004) The Hydrological Significance of Mountains: from Regional to Global Scale. Hydrology and Earth System Science, 8(6): 1016-1029.
Vorosmarty, C.J., Green, P., Salisbury, J., and Lammers, R.B. (2000) Global Water Resources: Vulnerability From Climate Change Acid Population Growth. Science, 289(5477): 284-288.
Vorosmarty, C.J., and Sahagian, D. (2000) Anthropogenic Disturbance of the Terrestrial Water Cycle. Bioscience, 50(9): 753-765.
White, M.A., Hoffman, F., Hargrove, W.W., and Nemani, R.R. (2005) A Global Framework for Monitoring Phenological Responses to Climate Change. Geophysical Research Letters, 32(L04705): 4pp.
White, M.A., Nemani, R.R., Thornton, P.E., and Running, S.W. (2002) Satellite Evidence of Phenological Differences Between Urbanized and Rural Areas of the Eastern United States Deciduous Broadleaf Forest. Ecosystems, 5(3): 260-273.
Wilson, S.J., Steenhuisen, F., Pacnya, J.M., and Pacnya, E.G. (2006) Mapping the Spatial Distribution of Global Anthropogenic Mercury Atmospheric Emission Inventories. Atmospheric Environment, 40(24): 4621-4623.
Information on data quality is critical to judging goodness of fit
It shouldn’t even be this hard
Helping users make wise choices is a
community-building and community-
strengthening task
Moore’s Law Benefits Data Collection Processes Unequally
=/
0
2
4
6
8
10
12
14
16
18
1950 1960 1970 1980 1990 2000
Cen
sus
cost
(m
illi
on
s 19
90 d
oll
ars
per
per
son
)
0.01
0.1
1
10
100
1000
10000
100000
1000000
Pro
cess
or
cost
($
per
100
0 ca
lcu
lati
on
s),
log
sc
ale Census cost
Processor cost
Pace of progress across data domains is very uneven.
Greater the divergence, greater the Integration Frustration
Identify and Fill Gaps!
CIESIN, Gridded Population of the World, 350,000 census input units
73 M people. Largest urban extent
Exposure to Multiple Natural Hazards
Seismic hazards include earthquakes and volcanoes; hydrological hazards include floods, cyclones, and landslides
Many more gaps to fill!• Roads
• Migration
• Time-series spatial data on population, urbanziation
• Spatial economic data
• Soil fertility
• Spatial health data
Prioritize
Assign roles
Be transparent
Persevere!
Challenge of model data
Input data sets
Harmonized data sets Model
Output data sets
ConclusionsClear ExplanationIncomprehensible
ExplanationTOUGHEST NUT
Interoperability
Standards
Develop, adopt, refine, encourage use of standards for representing and distributing data
Brute Force
Reprocess, reformat, recode data to be consistent with established framework data
Example: Household Surveys
Framework data
• Data that helps structure observations, models, analysis: makes cumulative progress possible– Permanent features of the topography
• Ice cover• Permafrost• Climate zones• Navigable sea lanes
– Organizational devices with universal utility• Latitude, longitude grids
– Location of settlements• Coastal villages
Challenge
How do you create frameworks when everything is in flux?
(How do you build a house during an earthquake?)
You can’t ignore it, but therearen’t many useful examplesin interdisciplinary research
Stewardship
• Almost always under-provided
• Everyone underestimates the speed by which data becomes invisible or unintelligible
• Inter-disciplinary, problem-oriented data especially vulnerable
Data Domain Understanding (What do these data mean? Are they suitable for my use?)
Access
Linkage Understanding (How do these data relate to other data?)
DiscoveryReliability and continuity
Catalogs
Documentation, visualization
Integration
On-line databases
Stewardship
Interoperability
Standards; georeferencing; time-stamping
Conclusions
• We don’t know how to do everything yet, but we know a lot more now than a decade ago
• The investments show positive economies of scale– each step forward getting the data questions right
generates more research and policy return than previous steps
• But what remains is going to require sustained, focused effort– There’s a lot of hard stuff yet to do
• Historically, funders don’t like this kind of work– That seems to be changing