Data, Data Everywhere…

Data, Data Everywhere…. September 8, 2011 The Coalition for Academic Scientific Computation José-Marie Griffiths, PhD Vice President for Academic Affairs Bryant University, Smithfield, Rhode Island


Page 1: Data, Data Everywhere…

Data, Data Everywhere….

September 8, 2011

The Coalition for Academic Scientific Computation

José-Marie Griffiths, PhD Vice President for Academic Affairs

Bryant University, Smithfield, Rhode Island

Page 2: Data, Data Everywhere…


Concerns of Research Administrators

1. Strong advocates of research and its dissemination to as wide a set of audiences as possible.

2. Most concerns today relate to current economic trends and uncertainties.

3. Long been concerned about overhead costs (which are increasing) and the cap on administrative costs.

Page 3: Data, Data Everywhere…


Concerns of Research Administrators - 2

4. Concerns about policies translating into “unfunded mandates” (like recently proposed financial reporting requirements to track all federal funding).

5. Increasingly concerned about roles, responsibilities, and liabilities.

6. Size matters!

Page 4: Data, Data Everywhere…

Taking AIM at Data Lifecycle Management:

Access, Integrity, Mediation

Page 5: Data, Data Everywhere…

Data Policy Task Force

• Established at the February 3-4, 2010 NSB meeting

• Charge: further defining the issues and outlining possible options to make the use of data more effective in meeting NSF's mission.

Page 6: Data, Data Everywhere…


Data Policy Task Force Strategies

• Monitor the impact of NSF's updated implementation of the Data Management Plan requirement to inform a review of NSF policy

• Considering issues of data policy, Open Data movements, and related issues, the Task Force will then develop a “Statement of Principles.”

• Provide guidance to subsequent Board efforts to develop specific actionable policy recommendations focused, initially, on NSF, but that could potentially promulgate through other Federal agencies in a national and international context.

Page 7: Data, Data Everywhere…

NSB Task Force on Data Policy: Statement of Principles

1. Openness and transparency are critical to continued scientific and engineering progress and to building public trust in the nation’s scientific enterprise.
   – This applies to all materials necessary for verification, replication and interpretation of results and claims associated with scientific and engineering research.

2. Open Data sharing is closely linked to Open Access publishing and they should be considered in concert.

3. The nation’s science and engineering enterprise consists of a broad array of stakeholders, all of which should participate in the development and adoption of policies and guidelines.

Page 8: Data, Data Everywhere…

NSB Task Force on Data Policy: Statement of Principles - 2

4. It is recognized that standards and norms vary considerably across scientific and engineering fields and such variation needs to be accommodated in the development and implementation of policies.

5. Policies and guidelines are needed for open data sharing which in turn requires active data management.

6. All data and data management policies must include clear identification of roles, responsibilities and resourcing.

Page 9: Data, Data Everywhere…

NSB Task Force on Data Policy: Statement of Principles - 3

7. The rights and responsibilities of investigators are recognized. Investigators should have the opportunity to analyze their data and publish their results within a reasonable time.

Page 10: Data, Data Everywhere…

NSB Expert Panel Discussion on Data Policies

• March 28-29, 2011

• Arlington, VA

• Participants included:
  – Over 30 experts/research administrators
  – 7 NSB members
  – 4 NSF Directors/Staff

Page 11: Data, Data Everywhere…


Access, Integrity, Mediation

• Access – “what goes in must be able to come out!”

• Integrity – “what goes in must be the same thing that comes out!”

• Mediation – “what goes in is going to need help coming out!”
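The Integrity principle above is often made checkable with a fixity check: record a checksum when data is deposited and compare it when the data is retrieved. The sketch below is only an illustration of that idea, not something taken from the presentation; the function names `fixity_digest` and `verify_on_retrieval` and the sample record are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def fixity_digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of the bytes being deposited."""
    return hashlib.sha256(payload).hexdigest()


def verify_on_retrieval(retrieved: bytes, recorded_digest: str) -> bool:
    """Check that what comes out matches the digest recorded at deposit."""
    return fixity_digest(retrieved) == recorded_digest


# Hypothetical sample record, used only for illustration.
deposited = b"station,temp_c\nA1,21.4\n"
recorded = fixity_digest(deposited)  # stored alongside the data at deposit

# An unchanged copy passes the check; a silently altered copy does not.
intact = verify_on_retrieval(deposited, recorded)
altered = verify_on_retrieval(deposited.replace(b"21.4", b"27.4"), recorded)
```

A check like this only detects change; deciding who stores the digest and who re-runs the check is the kind of role and responsibility question the slides raise.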

Page 12: Data, Data Everywhere…

Key Areas Emerging from the Expert Panel Discussion on Data Policies

March, 2011

• ACCESS
  1. Standards and interoperability enable data-intensive science.
  2. Data sharing is an identified priority.

• INTEGRITY
  3. Recognize and support computational and data-intensive science as a discipline.

• MEDIATION
  4. Storage, preservation, and curation of data are critical to data sharing and management (data stewardship).
  5. Cyberinfrastructure is necessary to support data-intensive science.

Page 13: Data, Data Everywhere…


ACCESS

What goes in must be able to come out!


Page 14: Data, Data Everywhere…

Key Areas - National Science Board Expert Panel Discussion on Data Policies

March, 2011

• ACCESS
  1. Standards and interoperability enable data-intensive science.
  2. Data sharing is an identified priority.

Page 15: Data, Data Everywhere…

Standards and interoperability enable data-intensive science.

• Citation and attribution norms
  – Need new norms and practices
  – Data producers, software & tool developers, data curators get credit for their work

• Interoperability standards
  – To enable sharing & interoperability across disciplines and internationally

• Development of persistent identifiers
  – To enable tracking of provenance
  – Ensure data integrity (see next section)
  – Facilitate citation & attribution

Page 16: Data, Data Everywhere…


Interoperability - sooner rather than later

Page 17: Data, Data Everywhere…


Data sharing is an identified priority.

• Must balance privacy concerns and data access for sharing and re-use.

• Acknowledge disciplinary cultures while establishing a culture of sharing across all research communities.

• Must promote & reward exemplary data management projects & plans.

• Data availability must be timely – issues of embargoes and restricted use durations.

Page 18: Data, Data Everywhere…


Page 19: Data, Data Everywhere…


INTEGRITY

What goes in must be the same thing that comes out!


Page 20: Data, Data Everywhere…

Recognize and support computational and data-intensive science as a discipline.

• Recognize & reward computational & data scientists & curators: funding, tenure, etc.

• Support training in computational science

• Reward international collaborations to develop cyberinfrastructure, data stewardship, interoperability, international sharing

• New funding/economic models to support processing, storing, archiving, maintaining data sets.

• Need to define who is responsible for what – funding agencies/publishers versus research communities

Page 21: Data, Data Everywhere…

Office of Research Integrity, U.S. Department of Health and Human Services: Key Components of Data Lifecycle Management

Guidelines for Responsible Data Management in Scientific Research, ori.hhs.gov/education/products/clinicaltools/data.pdf

Page 22: Data, Data Everywhere…

Planning for Preservation over the Data Life Cycle

1. Anticipate archiving costs and challenges
2. Create a data management plan
3. Follow best practices for data and documentation
4. Manage master datasets and work files
5. Determine file formats to deposit
6. Comply with dissemination standards and formats
7. Set up support for data users

Courtesy of Cole Whiteman, ICPSR

[Figure: data life cycle timeline with stages Proposal Planning and Writing; Project Start-up and Data Management; Data Collection and File Creation; Data Analysis; Preparing Data for Sharing; Depositing Data; After-Deposit Archival Activities; marked 1 to 7]

Page 23: Data, Data Everywhere…


Integrity Concerns for Research Institutions

• What to share - raw, processed, analyzed datasets, instruments, calibration and environmental records, analytical tools, etc.

• Processes for and costs of long-term curation of data

Page 24: Data, Data Everywhere…


MEDIATION

What goes in is going to need help coming out!


Page 25: Data, Data Everywhere…

Storage, preservation, and curation of data are critical to data sharing and management (data stewardship)

• Funding agencies must commit to ongoing financial support for repositories (no “orphans”)

• Standardized curatorial mechanisms

• Strategic partnerships between stakeholder communities and data repositories, supported by funders

• Define roles of different types of digital repositories

• Possibly independent auditing of data repositories to ensure data quality, access, interoperability

Page 26: Data, Data Everywhere…

Cyberinfrastructure is necessary to support data-intensive science

• Geographic distribution of research teams, computing resources and datasets requires robust cyberinfrastructure

• Must include shared applications for analysis, visualization and simulation

• Standardization for interoperability & accessibility

• Need capital investment in cyberinfrastructure

• Need to define appropriate ratio of infrastructure to research funding

Page 27: Data, Data Everywhere…

Mediation is Needed at Data Collection, Analysis and Use

• Gio Wiederhold, Stanford: When there is high intensity of interaction with any of these elements, it makes sense to have multiple mediators (e.g. replicate repositories)

[Figure: diagram linking Collected Research Data Set A and Collected Data Set B, Repositories 1 to 4, and multiple Analysis and Use nodes]

Page 28: Data, Data Everywhere…

Informal and Formal Mediation

• Mediation at “Use” level is informal and pragmatic

• Mediation at “Repository” and “Analysis” level needs to be formal with domain/expert control*

[Figure: diagram linking Collected Research Data Set A and Collected Data Set B, Repositories 1 to 4, and multiple Analysis and Use nodes, annotated “Informal, pragmatic mediation” and “Formal mediation with domain/expert control”]

*Gio Wiederhold, Stanford, 1995

Page 29: Data, Data Everywhere…

Stakeholders – Multiple Players, Inter-relationships

[Figure: stakeholder diagram with nodes for Researchers, Research Institutions, Research Funders, Gov’t. Agencies, Professional organizations, Private Industry, Publishers, Public Advocacy Groups, Research Registry, Application archives, Data Repositories, Data Libraries, Metadata Libraries, and several nodes marked “?????”]

Page 30: Data, Data Everywhere…

For data to be discoverable, must have a shared overlay of interdisciplinary and technological connections

[Figure: stakeholder diagram with nodes for Researchers, Research Institutions, Research Funders, Gov’t. Agencies, Professional organizations, Private Industry, Publishers, Public Advocacy Groups, Research Registry, Application archives, Data Archives, Data Repositories, Data Libraries, Metadata Libraries, and several nodes marked “?????”, overlaid with Analysis Tools, Retrieval conventions, Standards, Ontologies, Metadata]

Page 31: Data, Data Everywhere…


This….or….This?

Page 32: Data, Data Everywhere…

This….or….This?

Page 33: Data, Data Everywhere…

José-Marie Griffiths, Ph.D.

• Vice President for Academic Affairs
  Bryant University
  1150 Douglas Pike
  Smithfield, RI 02917
  (401) 232-6060
