envisioning better integration and synthesis: a data provider perspective

25
Envisioning Better Integration and Synthesis: A Data Provider Perspective Marc Levy CIESIN Earth Institute, Columbia University Presentation to Arctic System Synthesis Workshop: New Perspectives through Data Discovery and Modeling 2–4 April 2007 Seattle, WA World Data Center for Human Interactions in the Environment

Upload: newton

Post on 01-Feb-2016

25 views

Category:

Documents


0 download

DESCRIPTION

World Data Center for Human Interactions in the Environment. Envisioning Better Integration and Synthesis: A Data Provider Perspective. Marc Levy CIESIN Earth Institute, Columbia University Presentation to Arctic System Synthesis Workshop: New Perspectives through Data Discovery and Modeling - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Envisioning Better Integration and Synthesis:

A Data Provider Perspective

Marc LevyCIESIN

Earth Institute, Columbia University

Presentation to Arctic System Synthesis Workshop:New Perspectives through Data Discovery and Modeling

2–4 April 2007Seattle, WA

World Data Center for Human Interactions in the Environment

Page 2: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Evolution of the Data Provider Model

Researcher

Researcher

Repository

Public

ResearcherResearcher

Public

Researcher

Researcher

Data

Cyberinfrastructure

Research community

Decision-making community

Page 3: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Data Domain Understanding (What do these data mean? Are they suitable for my use?)

Access

Linkage Understanding (How do these data relate to other data?)

DiscoveryReliability and continuity

Catalogs

Documentation, visualization

Integration

On-line databases

Stewardship

Interoperability

Standards; georeferencing; time-stamping

Page 4: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Data Domain Understanding (What do these data mean? Are they suitable for my use?)

Access

Linkage Understanding (How do these data relate to other data?)

DiscoveryReliability and continuity

Interoperability

What data communities provide

Page 5: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Data

What data communities provide

What research and policy communities need

Page 6: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Helping Users Make Wise Choices is Hard!

Presumed Practical Competence of Group as Function of Number of Disciplines

0

5

10

15

20

25

1 2 3 4 5

Disciplines

Ph.D

B.A.

High School

5th Grade

Traditional Documentation not enough

Multi-faceted approach required

Comparative Guides

Visualizations

Common Pitfalls

Examples

Citations

Page 7: Envisioning Better Integration and Synthesis: A Data Provider Perspective

http://sedac.ciesin.org/citations/CitationIndex.html

Page 8: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Thornton, P.K., Kruska, R.L., Henninger, N., Kristjanson, P.M., Reid, R.S., and Robinson, T.P. (2003) Locating Poor Livestock Keepers at the Global Level for Research and Development Targeting. Land Use Policy, 20(4): 311-322.

Tobler, W., Deichmann, U., Gottsegen, J., and Maloy, K. (1997) World Population in a Grid of Spherical Quadrilaterals. International Journal of Population Geography, 3: 203-225.

van Lieshout, M., Kovats, R.S., Livermore, M.T.J., and Martens, P. (2004) Climate Change and Malaria: Analysis of the SRES Climate and Socio-Economic Scenarios. Global Environmental Change, 14(1): 87-99.

Vassolo, S. and Döll, P. (2005) Global-scale Gridded Estimates of Thermoelectric Power and Manufacturing Water Use. Water Resources Research, 41.

Verburg, P.H., and Chen, Y. (2000) Multiscale Characterization of Land-Use Patterns in China. Ecosystems, 3(4): 369-385.

Viviroli, D. and Weingartner, R. (2004) The Hydrological Significance of Mountains: from Regional to Global Scale. Hydrology and Earth System Science, 8(6): 1016-1029.

Vorosmarty, C.J., Green, P., Salisbury, J., and Lammers, R.B. (2000) Global Water Resources: Vulnerability From Climate Change Acid Population Growth. Science, 289(5477): 284-288.

Vorosmarty, C.J., and Sahagian, D. (2000) Anthropogenic Disturbance of the Terrestrial Water Cycle. Bioscience, 50(9): 753-765.

White, M.A., Hoffman, F., Hargrove, W.W., and Nemani, R.R. (2005) A Global Framework for Monitoring Phenological Responses to Climate Change. Geophysical Research Letters, 32(L04705): 4pp.

White, M.A., Nemani, R.R., Thornton, P.E., and Running, S.W. (2002) Satellite Evidence of Phenological Differences Between Urbanized and Rural Areas of the Eastern United States Deciduous Broadleaf Forest. Ecosystems, 5(3): 260-273.

Wilson, S.J., Steenhuisen, F., Pacnya, J.M., and Pacnya, E.G. (2006) Mapping the Spatial Distribution of Global Anthropogenic Mercury Atmospheric Emission Inventories. Atmospheric Environment, 40(24): 4621-4623.

Page 9: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Information on data quality is critical to judging goodness of fit

Page 10: Envisioning Better Integration and Synthesis: A Data Provider Perspective

It shouldn’t even be this hard

Page 11: Envisioning Better Integration and Synthesis: A Data Provider Perspective
Page 12: Envisioning Better Integration and Synthesis: A Data Provider Perspective
Page 13: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Helping users make wise choices is a

community-building and community-

strengthening task

Page 14: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Moore’s Law Benefits Data Collection Processes Unequally

=/

0

2

4

6

8

10

12

14

16

18

1950 1960 1970 1980 1990 2000

Cen

sus

cost

(m

illi

on

s 19

90 d

oll

ars

per

per

son

)

0.01

0.1

1

10

100

1000

10000

100000

1000000

Pro

cess

or

cost

($

per

100

0 ca

lcu

lati

on

s),

log

sc

ale Census cost

Processor cost

Pace of progress across data domains is very uneven.

Greater the divergence, greater the Integration Frustration

Page 15: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Identify and Fill Gaps!

CIESIN, Gridded Population of the World, 350,000 census input units

Page 16: Envisioning Better Integration and Synthesis: A Data Provider Perspective
Page 17: Envisioning Better Integration and Synthesis: A Data Provider Perspective

73 M people. Largest urban extent

Page 18: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Exposure to Multiple Natural Hazards

Seismic hazards include earthquakes and volcanoes; hydrological hazards include floods, cyclones, and landslides

Page 19: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Many more gaps to fill!• Roads

• Migration

• Time-series spatial data on population, urbanziation

• Spatial economic data

• Soil fertility

• Spatial health data

Prioritize

Assign roles

Be transparent

Persevere!

Page 20: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Challenge of model data

Input data sets

Harmonized data sets Model

Output data sets

ConclusionsClear ExplanationIncomprehensible

ExplanationTOUGHEST NUT

Page 21: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Interoperability

Standards

Develop, adopt, refine, encourage use of standards for representing and distributing data

Brute Force

Reprocess, reformat, recode data to be consistent with established framework data

Example: Household Surveys

Page 22: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Framework data

• Data that helps structure observations, models, analysis: makes cumulative progress possible– Permanent features of the topography

• Ice cover• Permafrost• Climate zones• Navigable sea lanes

– Organizational devices with universal utility• Latitude, longitude grids

– Location of settlements• Coastal villages

Challenge

How do you create frameworks when everything is in flux?

(How do you build a house during an earthquake?)

You can’t ignore it, but therearen’t many useful examplesin interdisciplinary research

Page 23: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Stewardship

• Almost always under-provided

• Everyone underestimates the speed by which data becomes invisible or unintelligible

• Inter-disciplinary, problem-oriented data especially vulnerable

Page 24: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Data Domain Understanding (What do these data mean? Are they suitable for my use?)

Access

Linkage Understanding (How do these data relate to other data?)

DiscoveryReliability and continuity

Catalogs

Documentation, visualization

Integration

On-line databases

Stewardship

Interoperability

Standards; georeferencing; time-stamping

Page 25: Envisioning Better Integration and Synthesis: A Data Provider Perspective

Conclusions

• We don’t know how to do everything yet, but we know a lot more now than a decade ago

• The investments show positive economies of scale– each step forward getting the data questions right

generates more research and policy return than previous steps

• But what remains is going to require sustained, focused effort– There’s a lot of hard stuff yet to do

• Historically, funders don’t like this kind of work– That seems to be changing