ICSTI/ITOC
15 October 2013
Larry Lannom
Research Data Alliance
Corporation for National Research Initiatives
DAITF: Enabling Technologies
21 March 2012
Larry Lannom, Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
http://www.handle.net/
Enabling Technologies
[Figure: datasets, each labeled with an ID, used by scientists, data curators, end users, and applications]
[Figure: the same datasets, accessed via repositories, with a layer of enabling technologies between the repositories and scientists, data curators, end users, and applications]
[Figure: the enabling technologies layer, first component: Discovery]
Discovery & Evaluation
• Search
  – Metadata registries
    • Subject
    • Parties
    • Dates
    • Etc.
  – Crawlers: more ad hoc
• Citation
  – Formats
• Permissions
  – Can I see it?
  – Can I use it?
• Trust
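The "Search – Metadata registries" item can be illustrated with a minimal sketch, assuming a hypothetical in-memory registry whose records carry subject, party, and date fields. The `MetadataRecord` class, the `search` function, and the `example/...` identifiers are all invented for illustration; real metadata registries expose their own query interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class MetadataRecord:
    """Hypothetical registry entry: an identifier plus searchable metadata."""
    identifier: str
    subjects: set[str] = field(default_factory=set)
    parties: set[str] = field(default_factory=set)
    created: date | None = None


def search(records, subject=None, party=None, start=None, end=None):
    """Return identifiers of records that match every criterion supplied."""
    hits = []
    for rec in records:
        if subject is not None and subject not in rec.subjects:
            continue
        if party is not None and party not in rec.parties:
            continue
        # Date filters exclude records that have no recorded date.
        if start is not None and (rec.created is None or rec.created < start):
            continue
        if end is not None and (rec.created is None or rec.created > end):
            continue
        hits.append(rec.identifier)
    return hits


registry = [
    MetadataRecord("example/ds-1", {"oceanography"}, {"Lab A"}, date(2011, 5, 1)),
    MetadataRecord("example/ds-2", {"genomics"}, {"Lab B"}, date(2012, 2, 14)),
    MetadataRecord("example/ds-3", {"oceanography"}, {"Lab B"}, date(2012, 9, 30)),
]

# A subject search yields identifiers; each hit can then be fetched as a known item.
print(search(registry, subject="oceanography", start=date(2012, 1, 1)))
# ['example/ds-3']
```

The search returns identifiers rather than data, which reflects the slide's split between discovery and the later access step.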
[Figure: the enabling technologies layer, now showing Discovery and Access]
Access
• ID / reference resolution
  – Go from 'subject search' to 'known item' search
• Access Protocols
  – How to get it
  – Protocol registries
  – Bootstrapping into new protocols
• Authentication & Authorization
  – Proof of identity (tradeoff: usability vs. security)
  – Permissions: with the object or in some external system?
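A minimal sketch of known-item resolution followed by protocol selection, assuming a hypothetical local resolution table that maps an identifier to ordered (type, value) pairs. The `RESOLVER` table, the `choose_access` function, and the example URLs are illustrative only; this is not the Handle System's actual resolution API.

```python
# Hypothetical resolution table: identifier -> ordered (type, value) pairs.
RESOLVER = {
    "example/ds-3": [
        ("URL", "https://repo.example.org/ds-3"),
        ("FTP", "ftp://repo.example.org/pub/ds-3"),
    ],
}


def resolve(identifier):
    """Known-item lookup: return the typed values recorded for an identifier."""
    try:
        return RESOLVER[identifier]
    except KeyError:
        raise LookupError(f"unknown identifier: {identifier}") from None


def choose_access(identifier, supported_protocols):
    """Return the first (protocol, location) pair the client knows how to use."""
    for value_type, value in resolve(identifier):
        if value_type in supported_protocols:
            return value_type, value
    # A fuller client might consult a protocol registry here to bootstrap
    # into an unfamiliar protocol instead of giving up.
    raise LookupError(f"no supported access protocol for {identifier}")


print(choose_access("example/ds-3", {"FTP"}))
# ('FTP', 'ftp://repo.example.org/pub/ds-3')
```

Authentication and authorization are left out of the sketch. They would sit between `choose_access` and the actual retrieval, either as a check against permissions stored with the object or as a call to an external system.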
[Figure: the enabling technologies layer, now showing Discovery, Access, and Interpretation]
Interpretation
• Registries
  – Schemas
  – Vocabularies
  – Formats
  – Available services
  – Useful client-side tools
• Trust
  – Who did this?
  – Who owns this?
• Provenance
  – Data source
  – Processing steps
  – Computing environment
• What is needed to trust the numbers?
• Domain specific?
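The three provenance items on the slide (data source, processing steps, computing environment) map naturally onto a small record. The sketch below assumes hypothetical `Provenance` and `ProcessingStep` classes and a `missing()` helper that reports which items have not been recorded yet. It shows one possible shape, not a provenance standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProcessingStep:
    description: str
    software: str  # hypothetical field: tool name and version used for the step


@dataclass
class Provenance:
    """Hypothetical record covering the three provenance items on the slide."""
    data_source: str | None = None
    steps: list[ProcessingStep] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Name the provenance components that have not been recorded yet."""
        gaps = []
        if not self.data_source:
            gaps.append("data source")
        if not self.steps:
            gaps.append("processing steps")
        if not self.environment:
            gaps.append("computing environment")
        return gaps


record = Provenance(
    data_source="example/raw-ds-7",
    steps=[ProcessingStep("calibrate sensor counts", "calib-tool 1.2")],
)
print(record.missing())  # ['computing environment']
```

Whether an empty list from `missing()` is enough to "trust the numbers" is exactly the open, possibly domain-specific question the slide raises.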
[Figure: the enabling technologies layer, now showing Discovery, Access, Interpretation, and Reuse]
Reuse
• Everything from the Interpretation slide, plus Permissions
  – Example from the BOF: I need to understand a data set for peer review, but that doesn't give me permission to use the data
• Validation
• Education & Training
  – Integrate 'live' data into education and training
• Repurpose data
DAITF Roles?
• Bring good people together on a regular basis to discuss these issues
• Get agreement on vocabulary for discussing data access and interoperability?
• Working groups on specific topics
  – Prototyping specific interoperability issues / domains
• Create a high-level framework, à la OAIS? Multiple frameworks?
• Guides to registries and best practices
Research Data Alliance Plenary 2 Update
Dr. Francine Berman, Chair, RDA/US
Hamilton Distinguished Chair in Computer Science, Rensselaer Polytechnic Institute
RDA Plenary 2 (September 16-18, Washington, D.C.): 3 days of Peace, Love and Data
• 368 participants from 22 countries and all sectors
• All-hands stakeholder talks and RDA working meeting
• Data Citation Summit convened by DataCite, FORCE11, CODATA/ICSTI, ESIP, DCC, etc. to create a common agenda
• ~5000 tweets over 3 days
RDA Community Current Status: ~1300 participants from 50+ countries
1. Albania
2. Australia
3. Austria
4. Bangladesh
5. Belgium
6. Bolivia
7. Botswana
8. Brazil
9. Bulgaria
10. Canada
11. China
12. Congo (Democratic Rep.)
13. Costa Rica
14. Czech Republic
15. Denmark
16. Estonia
17. Finland
18. France
19. Germany
20. Greece
21. Iceland
22. India
23. Iran
24. Ireland
25. Ireland (Rep.)
26. Italy
27. Japan
28. Kyrgyzstan
29. Kuwait
30. Mexico
31. Netherlands
32. New Zealand
33. Norway
34. Palestine
35. Poland
36. Portugal
37. Russian Federation
38. Rwanda
39. Serbia
40. Singapore
41. Slovenia
42. South Africa
43. South Korea
44. Spain
45. Sweden
46. Switzerland
47. Taiwan
48. Turkey
49. United Arab Emirates
50. United Kingdom
51. United States
52. Vatican City
53. Venezuela
RDA by Sector
• Academics (66%)
• Private Sector (10%)
• Public Sector (17%)
• Unknown (7%)
Fran Berman
RDA Community Building Momentum
• Growth in number and scope of Interest Groups and Working Groups
  – New: BOFs for groups as a precursor to Interest Groups
• Groups beginning to "self-monitor" to promote concrete deliverables to be used and adopted
• Increasing interest in more interaction and "connective tissue" between groups
• Pressing To-Dos before Plenary 3:
  – Develop an RDA policy for IP that comes up in Interest and Working Groups
  – Determine the form of RDA deliverables and what's needed in terms of an "RDA archive"
Groups that Met at the RDA Plenary
BOLD = new since last Plenary
Birds-of-a-Feather
• Linked Data
• Chemical Safety Data
• Education and Skills Development in Data Intensive Science
• Libraries and Research Data
• Cloud Computing and Data Analysis Training for the Developing World
Working Groups
• Data Type Registries
• Metadata Standards
• Practical Policy
• Persistent Identifier Types
• Data Foundations and Terminology
• Data Categories and Codes
Interest Groups
• Agricultural Data
• Big Data Analytics
• Data Brokering
• Certification of Trusted Repositories (joint with ICSU-WDS)
• Long Tail of Research Data
• Marine Data Harmonization
• Community Capability Model
• Data Publishing (joint with WDS)
• Toxicogenomics Interoperability
• Research Data Provenance
• Data Citation Metadata
• Economic Models and Infrastructure for Federated Materials Data Management
• Engagement
• Preservation e-Infrastructure
• Legal Interoperability (joint with CODATA)
• Global Registry of Trusted Data Repositories and Services
• Digital Practices in History and Ethnography
Data Citation Harmonization Summit
• DataCite, FORCE11, CODATA/ICSTI, ESIP, DCC, etc.
RDA Organizational Partners: New RDA constituencies / stakeholders
• Organizational Assembly = Organizational Members (subscription) + Organizational Affiliates (MOUs)
• The Organizational Advisory Board will represent the Organizational Assembly
Current Status:
• Organizational Membership under discussion with Microsoft, IBM, ANDS, Australian Antarctic Data Center, Intersect, Terrestrial Ecosystem Research Network, CSC – IT Center for Science Ltd., Oracle, STFC, CNRI, STM, EUDAT, Barcelona Supercomputing Center, Columbia University Libraries / Information Services, and many more after the Plenary
• Organizational Affiliation under discussion with CODATA, WDS, and others
Next 6 months (before Plenary 3):
• Firm up model for Affiliates (how many, how substantive should the interaction be?)
• Complete creation of legal entity to host subscriptions for Organizational Members
• Elect Organizational Advisory Board at Plenary 3
RDA Constituent Groups Coming Together
New Position: RDA recruiting for a full-time Secretary-General
• RDA Colloquium (National Research Agencies and Funders)
• RDA Membership
• RDA Council (overarching leadership)
• Technical Advisory Board (technical oversight)
• Secretary-General and Secretariat (administration and operations)
• Organizational Advisory Boards and Organizational Assembly (organizational partnerships and guidance)
• Working Groups and Interest Groups (impact-focused infrastructure)
Next Plenaries (2X a year)
• Plenary 3 will be in Dublin, March 26-28, 2014, hosted by Australia and Ireland
• Plenary 4 will be in the Netherlands, late September 2014
• Plenary 5 or 6 likely back in the U.S. (west coast?)
Data Type Registries (DTR)
Co-Chairs: Larry Lannom (CNRI), Daan Broeder (MPI)
September 2013
RDA Plenary 2, Washington, DC
Goal: Interoperable Set of Data Type Registries
• Data Types
  – Characterize data structures at multiple levels of granularity
  – Formats are just part of the story
  – Optimize interactions between data producers & consumers by having types defined and associated with the data they describe
  – Types should be standardized, discoverable, and unique
• Type Registries
  – Each type registered with a unique identifier
  – Common data model and expression
  – Associate with services, tools, format registries, etc.
  – Common API for machine consumption
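To make the "Type Registries" bullets concrete, here is a minimal in-memory sketch, assuming a hypothetical `TypeDefinition` data model (unique ID, name, description, properties flagged as required or optional, associated services) and a JSON rendering as a stand-in for the common machine API. None of these names come from the DTR working group's specification, which was still in draft at the time of this talk.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TypeDefinition:
    """Hypothetical common data model for one registered data type."""
    type_id: str          # unique identifier, e.g. a PID (values below are made up)
    name: str
    description: str
    properties: dict = field(default_factory=dict)  # property name -> required?
    services: list = field(default_factory=list)    # identifiers of related services


class TypeRegistry:
    """In-memory sketch of one registry; a real one would be a networked service."""

    def __init__(self):
        self._types = {}

    def register(self, definition):
        # Each type gets exactly one entry under its unique identifier.
        if definition.type_id in self._types:
            raise ValueError(f"type id already registered: {definition.type_id}")
        self._types[definition.type_id] = definition

    def lookup(self, type_id):
        return self._types[type_id]

    def to_json(self, type_id):
        """Machine-readable expression of a definition, for programmatic clients."""
        return json.dumps(asdict(self._types[type_id]), sort_keys=True)


registry = TypeRegistry()
registry.register(TypeDefinition(
    type_id="example/type/timeseries",
    name="Time series",
    description="Regularly sampled measurements",
    properties={"start_time": True, "sampling_rate": True, "units": False},
    services=["example/service/plot"],
))
print(registry.to_json("example/type/timeseries"))
```

Rejecting duplicate IDs is one simple way to keep types unique within a single registry. Keeping them unique across a federated set would require globally unique identifiers such as PIDs.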
Schedule
• 3/2013 – 9/2013
  – Gathering use cases
  – Investigating other work in the area
  – First drafts of data model and functional specs for a type registry
• 10/2013 – 12/2013
  – Refine data model and functional specs
  – Deploy initial prototype
• 1/2014 – 5/2014
  – Finalize data model and functional specs
  – Deploy functional type registry for PID types
  – Release turnkey registry conforming to functional specs
DTR Use Cases
• Broad Functional Classification
  – Repos hold widely varying levels of data & metadata
  – A high-level functional classification of the identified object is needed to make sense of what is available, e.g., data object, metadata, repo description, contact info, etc.
• Simple License Information via PID Resolution
  – Data set access conditions cannot be predicted from the ID
  – For DataCite DOIs, a handle/type/value triple could be used to provide access information, probably through a level of indirection, resulting in a pop-up, an intervening page, or open linked data
• Object Types as a Short-cut for Dependent Services to Match Processing Requirements to Data Objects
  – Using data acquisition as an example:
    • Determine the object type you are trying to build
    • Consult the registry to index into an ontology that dynamically defines required and optional properties
    • Does the input data have what is needed?
• Registration of PID Types (in ID/Type/Value triples) for Data Processing and Interpretation
  – Distinguish pointers to objects from pointers to metadata from pointers to services
  – Enable complex client interactions as opposed to simple one-to-one redirection
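The last use case, registered PID types that let a client tell object pointers from metadata, service, or license pointers, can be sketched as follows. The sketch assumes hypothetical ID/Type/Value triples and a hypothetical `TYPE_ROLES` table standing in for what a type registry would say about each PID type. All type names and URLs are invented.

```python
from collections import defaultdict

# Hypothetical ID/Type/Value triples; all type names and URLs are illustrative.
TRIPLES = [
    ("example/ds-3", "URL", "https://repo.example.org/ds-3"),
    ("example/ds-3", "example/type/metadata", "https://repo.example.org/ds-3/meta.xml"),
    ("example/ds-3", "example/type/service", "https://viz.example.org/plot?id=ds-3"),
    ("example/ds-3", "example/type/license", "https://licenses.example.org/terms-v1"),
    ("example/ds-4", "URL", "https://repo.example.org/ds-4"),
]

# Hypothetical registry entries saying what role each PID type plays.
TYPE_ROLES = {
    "URL": "object",
    "example/type/metadata": "metadata",
    "example/type/service": "service",
    "example/type/license": "license",
}


def roles_for(pid, triples, type_roles):
    """Group a PID's values by role so a client can treat each kind differently."""
    grouped = defaultdict(list)
    for triple_pid, value_type, value in triples:
        if triple_pid == pid:
            grouped[type_roles.get(value_type, "unknown")].append(value)
    return dict(grouped)


record = roles_for("example/ds-3", TRIPLES, TYPE_ROLES)
# A client can show the license before following the object pointer,
# rather than being redirected straight to a single URL.
print(record["license"])  # ['https://licenses.example.org/terms-v1']
```

Types missing from the role table are grouped under `"unknown"`. A client could treat this as the trigger to query a type registry for the type's definition.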
One Use of Type Registries
[Figure: users exchange typed data (each item an ID, a type, and a payload) with a federated set of type registries and with service providers for visualization, rights ("I Agree" terms), data processing, and data set dissemination]
1. A client (process or people) encounters an unknown type.
2. The type is resolved to a type registry.
3. The response includes type definitions, relationships, properties, and possibly service pointers. The response can be used locally for processing, or, optionally:
4. Typed data, or a reference to typed data, can be sent to a service provider.
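A minimal sketch of steps 1-4, combined with the "required and optional properties" check from the use-case slide. It assumes a hypothetical `REGISTRY` dict standing in for the federated registries, with each definition listing required and optional properties and service pointers. The structure and names are illustrative, not a published registry response format.

```python
# Hypothetical stand-in for a federated set of type registries: type id -> definition.
REGISTRY = {
    "example/type/timeseries": {
        "required": {"start_time", "sampling_rate", "values"},
        "optional": {"units"},
        "services": ["https://viz.example.org/plot"],
    },
}


def handle_typed_data(typed):
    """Walk steps 1-4: resolve an unfamiliar type, check the payload, report services."""
    definition = REGISTRY.get(typed["type"])  # step 2: resolve the type
    if definition is None:
        raise LookupError(f"type not found in registry: {typed['type']}")
    present = set(typed["payload"])
    missing = definition["required"] - present  # step 3: local check against the definition
    unexpected = present - definition["required"] - definition["optional"]
    return {
        "usable": not missing,
        "missing": sorted(missing),
        "unexpected": sorted(unexpected),
        # step 4 (optional): services to which the typed data could be sent
        "services": list(definition["services"]) if not missing else [],
    }


item = {
    "id": "example/ds-9",
    "type": "example/type/timeseries",
    "payload": {"start_time": "2013-09-16T09:00Z", "values": [1.2, 1.4]},
}
print(handle_typed_data(item))
# {'usable': False, 'missing': ['sampling_rate'], 'unexpected': [], 'services': []}
```

Withholding service pointers when required properties are missing is a design choice made for this sketch. A real client might still forward the data and let the service decide.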
A Few Words About CNRI
• Not-for-profit organization formed in 1986 to foster research and development for the National Information Infrastructure (now internationally focused)
• Major focus on management of information on networks: Digital Object Architecture
  – Handle System
  – DO Repository
  – DO Registry
Handle System Adoption by Domain
• Research Project: early 90s
  – Initial US-funded digital library project (DARPA)
• Library/Publishing: late 90s through 00s and continuing to grow
  – DSpace: turnkey digital library platform (MIT + HP)
  – Digital Object Identifier (DOI) for journal articles
  – International from the start, including Asia
• Breaking out of the publisher/library ghetto: starting late 00s
  – Scientific data
    • Australian National Data Service (ANDS)
    • Max Planck (handles)
    • DataCite (DOIs)
    • EPIC (European Persistent Identifier Consortium)
    • EUDAT
  – Entertainment industry
    • EIDR (DOIs)
• The threshold of use and dependence brings governance and sustainability issues
  – Who is CNRI? How long will they be around?
  – Who is in charge?
  – Not just a standards issue, because of the global service (cf. DNS)
Infrastructural Governance and Sustainability
• Spread Responsibility and Control from One Group to Many
  – Involve stakeholders
  – Develop a financial sustainability plan
• Develop an organizational model
  – Try to balance long-term and short-term incentives
  – Try to keep the organization from being captured by minority and/or moneyed interests
  – Build in flexibility
• Independence from individual governments or industry players
• DONA Foundation
  – Non-profit being established in Switzerland
  – A peer group of stakeholders will run and financially support the global infrastructure
  – A Board of Directors will provide high-level guidance
  – CNRI will transfer relevant rights and technology to the Foundation and continue as one of N stakeholders
  – Each stakeholder has identical responsibilities to the Foundation but is otherwise independent
    • Governments could participate and provide their support out of general revenues
    • Industry could create appropriate business models
  – Formation in process, near-term completion
  – Longer-range objective is a Digital Object Architecture approach to information system interoperability