komatsoulis internet2 executive track

George A. Komatsoulis, Ph.D. National Center for Biotechnology Information National Library of Medicine National Institutes of Health U.S. Department of Health and Human Services NIH Perspective Executive Track internet2 Global Forum 2015

Uploaded by: george-komatsoulis

Posted on 20-Jan-2017


TRANSCRIPT

Page 1: Komatsoulis internet2 executive track

George A. Komatsoulis, Ph.D.
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services

NIH Perspective
Executive Track

internet2 Global Forum 2015

Page 2

The Commons Business Model

Page 3

The Commons: Conceptual Framework

Diagram: The Commons, consisting of
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform
Also labeled: Open APIs; Software Encapsulation

Page 4

Diagram: The Commons
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform

Commons Federation (Infrastructure)
BD2K Centers
DDICC (Search)
Existing Resources
Labels: Indexes, Methods, Content

Page 5

Diagram:
Commons Federation (Infrastructure)
BD2K Centers
DDICC (Search)
Existing Resources
Labels: Indexes, Methods, Content
Investigator (labels: Works In, Searches)

Page 6

Diagram:
Commons Federation (Infrastructure)
Conformant Provider A
Conformant Provider B
Conformant Provider C

Page 7

The Commons: Business Model

Diagram: NIH provides credits to the Researcher, who uses them with Cloud Providers A, B, and C (together, the Commons). The Researcher provides digital objects to the Commons and retrieves/uses digital objects from it. A Discovery Index indexes the Commons, and the Researcher finds objects through it. Option: NIH funds providers directly to support NIH-directed resources.

Commons implemented as a federation of ‘conformant’ cloud providers and HPC environments

Funded primarily by providing credits to investigators

Page 8

Potential Advantages of this Model

Cost effective: only pay for IT support used
Drives competition: better services at lower cost
Supports data sharing by driving science into the Commons
Facilitates public-private partnership
Scalable to most categories of data expected in the next 5 years

Page 9

Potential Disadvantages of this Model

Novelty: Never been tried, so we don’t have data about the likelihood of success.

Cost Models: Predicated on stable or declining prices among providers. This has been true for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in the industry.

Service Providers: Predicated on service providers being willing to make the investment to become conformant. Market research suggests 3-5 providers within 2-3 months of program launch.

Persistence: The model is ‘Pay As You Go’, which means that if you stop paying, it stops going. This gives investigators an unprecedented level of control over what lives (or dies) in the Commons.

Page 10

What does it mean for a vendor to be conformant?

A minimum set of requirements for:
Business relationships (reseller, investigators)
Interfaces (upload, download, manage, compute)
Capacity (storage, compute)
Networking and connectivity
Information assurance
Authentication and authorization

Likely to be a reviewed self-certification in the pilot phase
A conformant cloud ≠ an IaaS provider

Page 11

What does it mean for a scientist to be compliant?

Likely to evolve into multiple ‘Levels of Compliance’ corresponding to increasing degrees to which data and software meet the ‘FAIR’ criteria.

Some of our current thinking for basic compliance:
Objects are physically or logically available in the Commons
Objects are indexed with a usable identifier
Objects have basic search metadata attached to index entries
Objects have clear access rules
Objects have basic semantic metadata available

Higher levels could include:
Objects indexed with standards-based identifiers (ORCID, DOI, etc.)
Objects are open to the public (or as open as reasonable given the data type)
Objects conform to agreed-upon standards (CDISC, DICOM, etc.)
Data objects are accessible via standard APIs
Software is encapsulated (containers, other technology) for easier usage

We want and need your feedback on these matters!
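The basic-compliance criteria for scientists amount to a checklist over each object's index entry. The following is a minimal Python sketch of how such a check might report which criteria an object still misses. It assumes a hypothetical `CommonsObject` record whose field names are illustrations only, not a published Commons schema. It also simplifies "usable" and "clear" to mean "present".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CommonsObject:
    """Hypothetical index entry for a digital object; all field names are illustrative."""
    identifier: Optional[str]            # usable identifier assigned when the object is indexed
    location: Optional[str]              # where the object is physically or logically available
    search_metadata: dict = field(default_factory=dict)    # basic search metadata on the index entry
    access_rules: Optional[str] = None                      # e.g. "public" or "controlled"
    semantic_metadata: dict = field(default_factory=dict)  # basic semantic metadata


def basic_compliance_gaps(obj: CommonsObject) -> list[str]:
    """Return the basic-compliance criteria the object does not yet meet.

    An empty list means the object passes every basic check in this sketch;
    presence of a value stands in for "usable" or "clear".
    """
    checks = {
        "available in the Commons": bool(obj.location),
        "indexed with a usable identifier": bool(obj.identifier),
        "basic search metadata attached": bool(obj.search_metadata),
        "clear access rules": bool(obj.access_rules),
        "basic semantic metadata available": bool(obj.semantic_metadata),
    }
    # dicts preserve insertion order, so gaps are reported in checklist order
    return [criterion for criterion, ok in checks.items() if not ok]


if __name__ == "__main__":
    obj = CommonsObject(
        identifier="example:0001",           # hypothetical identifier
        location="provider-a/objects/0001",  # hypothetical provider path
        search_metadata={"title": "RNA-seq run"},
        access_rules="controlled",
    )
    print(basic_compliance_gaps(obj))  # ['basic semantic metadata available']
```

Returning the list of unmet criteria, rather than a single pass/fail flag, would make it straightforward to extend the same pattern to the higher levels. For example, a second set of checks could cover standards-based identifiers or public availability.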

Page 12

Pilot of the Commons Business Model

Phase 0: Build the plumbing
Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
Phase 3: Open the process to all NIH grantees

Page 13

dbGaP Cloud Policy

Page 14

dbGaP: A Database of Genotypes and Phenotypes

Page 15

NIH Position Statement on the use of cloud computing services

Approved March 23, 2015

“In light of the advances made in security protocols for cloud computing in the past several years and given the expansion in the volume and complexity of genomic data generated by the research community, the National Institutes of Health (NIH) is now allowing investigators to request permission to transfer controlled-access genomic and associated phenotypic data obtained from NIH-designated data repositories under the auspices of the NIH Genomic Data Sharing (GDS) Policy to public or private cloud systems for data storage and analysis.”

Responsibility for ensuring the security and integrity of the data remains with the institution.

Page 16

What can a CIO do to support biomedical research on their campus?

Page 17

Help maintain perspective

Figure: timeline from 1960 to 2020

Page 18

Connect us with our colleagues in other disciplines

Sensor stream = 500 EB/day; stores 69 TB/day
Collection = 14 EB/day; stores 1 PB/day
Total data = 14 PB; stores an average of 3.3 TB/day for 10 years!

Page 19

But don’t lose sight of the differences associated with biological research

Page 20

Acknowledgements

NIH Office of ADDS
Vivien Bonazzi, Ph.D.
Philip Bourne, Ph.D.
Michelle Dunn, Ph.D.
Mark Guyer, Ph.D.
Jennie Larkin, Ph.D.
Leigh Finnegan
Beth Russell

NCBI
Dennis Benson, Ph.D.
Alan Graeff
David Lipman, MD
Jim Ostell, Ph.D.
Don Preuss
Steve Sherry