April 5, 2011 1-2 p.m.

VIVO Researcher Networking Update

Leslie McIntosh, VIVO National Evaluator, Washington University

Jonathan Corson-Rikert, VIVO Development Lead, Cornell University

Ellen J. Cramer, Special Projects Lead, Cornell University

VIVO Collaboration

Cornell University

Dean Krafft (Cornell PI), Manolo Bevia, Jim Blake, Nick Cappadona, Brian Caruso, Elly Cramer, Medha Devare, Elizabeth Hines, Huda Khan, Brian Lowe, Joseph McEnerney, Holly Mistlebauer, Stella Mitchell, Anup Sawant, Christopher Westling, Tim Worrall, Rebecca Younes, Jon Corson-Rikert

University of Florida

Mike Conlon (VIVO and UF PI), Beth Auten, Chris Barnes, Cecilia Botero, Kerry Britt, Erin Brooks, Amy Buhler, Ellie Bushhousen, Linda Butson, Chris Case, Christine Cogar, Valrie Davis, Mary Edwards, Nita Ferree, Rolando Garcia-Milan, George Hack, Chris Haines, Sara Henning, Rae Jesano, Margeaux Johnson, Meghan Latorre, Yang Li, Paula Markes, Hannah Norton, Narayan Raum, Alexander Rockwell, Sara Russell Gonzalez, Nancy Schaefer, Dale Scheppler, Nicholas Skaggs, Syraj Syed, Matthew Tedder, Michele R. Tennant, Alicia Turner, Stephen Williams

Indiana University

Katy Borner (IU PI), Kavitha Chandrasekar, Bin Chen, Shanshan Chen, Ryan Cobine, Jeni Coffey, Suresh Deivasigamani, Ying Ding, Russell Duhon, Jon Dunn, Poornima Gopinath, Julie Hardesty, Brian Keese, Namrata Lele, Micah Linnemeier, Nianli Ma, Robert H. McDonald, Asik Pradhan Gongaju, Mark Price, Michael Stamper, Yuyin Sun, Chintan Tank, Alan Walsh, Brian Wheeler, Feng Wu, Angela Zoss

Ponce School of Medicine

Richard J. Noel, Jr. (Ponce PI), Ricardo Espada Colon, Damaris Torres Cruz, Michael Vega Negrón

This project is funded by the National Institutes of Health, U24 RR029822, "VIVO: Enabling National Networking of Scientists"

The Scripps Research Institute

Gerald Joyce (Scripps PI), Catherine Dunn, Brant Kelley, Paula King, Angela Murrell, Barbara Noble, Cary Thomas, Michaeleen Trimarchi

Washington University School of Medicine in St. Louis

Rakesh Nagarajan (WUSTL PI), Kristi L. Holmes, Caerie Houchins, George Joseph, Sunita B. Koul, Jasmine Owens, Leslie D. McIntosh

Weill Cornell Medical College

Curtis Cole (Weill PI), Paul Albert, Victor Brodsky, Mark Bronnimann, Adam Cheriff, Oscar Cruz, Dan Dickinson, Richard Hu, Chris Huang, Itay Klaz, Kenneth Lee, Peter Michelini, Grace Migliorisi, John Ruffing, Jason Specland, Tru Tran, Vinay Varughese, Virgil Wong

An open-source semantic web application that enables the discovery of research and scholarship across disciplines in an institution.

Populated with detailed profiles of faculty and researchers, displaying items such as publications, teaching, service, and professional affiliations.

A powerful search functionality for locating people and information within or across institutions.

Participating Institutions

Institution                               Acad. Staff  Student Pop.  City Pop.  Public/Private  Med School
Cornell (Ithaca)                          1,639        20.9K         100K       Both
University of Florida                     4,534        50.7K         258K       Public          Yes
Indiana University (Bloomington)          2,973        42.4K         175K       Public
Ponce School of Medicine                  200          475           442K       Private         Yes
The Scripps Research Institute            225          ~225          43K        Private
Washington University School of Medicine  1,772        ~500          2.8M       Private         Yes
Weill-Cornell Medical College             1,235        410           8.2M       Private         Yes

Lessons Learned in VIVO Implementation

Data, Data, Data

Get the Data
• Who owns the data?
• Where are the data sources?
• What permissions do you need to use the data?

Manage the Data
• Who owns the data now?
• Do you need to create a data management system?
• How will you refresh your data? How often?

Your data are only as good as the source.

Manage Expectations

Contribute to the Community

There is more to open source than contributing code:
– Data
– Documentation
– IRC communication
– Listservs
– Lessons learned

http://vivoweb.org/

http://vivo.sourceforge.net/

VIVO Cornell: In-house to National Cloud

2003-2007 Development of research profiles using ontologies in a database-driven website to meet the needs of the Life Sciences initiative.

2007 Converted to Semantic Web standards. Expanded to include disciplines across the institution.

2007–2011+ With the NIH grant, moved to a national and international network of institutions and organizations and their faculty and researcher profiles.

VIVO Cornell: Data Sources

Repurposing and re-using data

Local Outreach

• Provost Office – institutional support
• Data providers – HR, annual faculty reporting, grants, courses, other
• Librarian VIVO liaisons – subject areas
• Web developers – repurposing of data
• Department editors – training

Networking

Other sites piloting or adopting VIVO technology
Arizona State University, Duke University, IICA, Los Alamos National Laboratory, Northwestern University, Stony Brook University, University of Arkansas, University of Buffalo, University of Colorado – Boulder, University of Delaware, University of Oregon, University of Virginia, USDA

Integration partners
APA (Digital Trust), Duke (Widgets), Harvard University (Harvard Profiles), Indiana University (HUBzero), ORCID, Stony Brook University (UMLS), University of Hong Kong (Knowledge Exchange), University of Pittsburgh (Digital Vita), Weill Cornell Medical College (Google Refine).

International efforts
• ANDS-Vitro Consortium (Griffith, QUT, University of Melbourne, VeRSI)
• Chinese Academy of Sciences
• IICA (Inter-American Institute for Cooperation on Agriculture) is considering options like VIVO for a researcher network for their SIDALC application, and there is a pilot VIVO implementation at El Colegio de Postgraduados of Mexico.

VIVO update part III

• VIVO core design principles
• Enhancements during the NIH grant
• Planned development
• VIVO at web scale
• Mini-grants and collaborations
• Building community and sustainability

First, it’s about data

• Consistent formatting, in a language of the Web
• Self-describing
  – Ontology
  – Context inherent in the data
• Distributed
• De-referenceable
• Reusable without (or with) modification
• Persistent independently of any application

VIVO is not just people or profiles

• Anything can be a type (and have individuals)
• All individuals have the same structure
  – Varying attributes & relationships
  – Inheritance
• Extend the ontology without modifying the app
  – Tradeoffs of generality vs. optimal interface
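The points above can be illustrated with a minimal sketch. It is not VIVO's actual data model or code: the class names, the `ex:` prefixes, and the helper functions are all hypothetical, chosen only to show uniform statement structure, inheritance, and extending the ontology through data rather than application changes.

```python
# Hypothetical mini-ontology: class -> parent class (illustrative names only).
SUBCLASS_OF = {
    "FacultyMember": "Person",
    "Person": "Agent",
    "Grant": "Agreement",
}

# Every individual has the same structure: (subject, predicate, object) statements.
TRIPLES = [
    ("ex:jane", "rdf:type", "FacultyMember"),
    ("ex:jane", "ex:investigatorOn", "ex:grant42"),
    ("ex:grant42", "rdf:type", "Grant"),
    ("ex:grant42", "rdfs:label", "Example grant"),
]


def ancestors(cls):
    """Return cls followed by all of its superclasses."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def individuals_of(cls):
    """Individuals whose asserted type is cls or any subclass of cls."""
    return sorted(
        s for s, p, o in TRIPLES
        if p == "rdf:type" and cls in ancestors(o)
    )


def add_class(cls, parent):
    """Extend the ontology with data only; no function above needs to change."""
    SUBCLASS_OF[cls] = parent
```

Asking for `individuals_of("Agent")` finds `ex:jane` through inheritance even though she is asserted only as a `FacultyMember`. A new type added with `add_class` is immediately queryable, at the cost of an interface that knows nothing specific about it.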

Highlights of recent improvements

[Diagram labels: Linked Open Data; Application; navigation; theming; scalability; MVC structure; VIVO Core Ontology; eagle-i research resources; self-editing; external authentication; Harvester; Visualizations; page templates; grants; HR data; PubMed; Drupal importer]

Deliverables by August, 2011

[Diagram labels: Linked Open Data; Application; navigation; theming; scalability; MVC structure; VIVO Core Ontology; self-editing; external authentication; Visualizations; page templates; Map of Science; GeoMap; role-based authorization; aggregator software; RDF to Solr indexer; local/national search UI; linking between VIVOs; search-related functionalities; BioPortal submission; Harvester; more pub formats; national grant data; Drupal importer]

“National” search

• NIH mandated no reliance on sustained centralized infrastructure
• Aggregation of RDF from multiple sources
  – Harvard Profiles, Collexis, and likely others
• Solr indexing leveraging the VIVO ontology
• Aggregator and indexing will be configurable to harvest any desired set of sources
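The aggregate-then-index flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's aggregator or indexer: the source names, URIs, and field names are hypothetical, and the resulting documents are plain dictionaries of the kind a search index such as Solr could ingest, with no Solr call made.

```python
from collections import defaultdict

# Hypothetical per-source triples; in practice these would be harvested as RDF.
SOURCES = {
    "vivo-a": [
        ("http://a.example.edu/individual/n1", "rdf:type", "FacultyMember"),
        ("http://a.example.edu/individual/n1", "rdfs:label", "Smith, Jane"),
    ],
    "profiles-b": [
        ("http://b.example.org/profile/77", "rdf:type", "FacultyMember"),
        ("http://b.example.org/profile/77", "rdfs:label", "Lee, Sam"),
    ],
}


def aggregate(sources, wanted):
    """Merge triples from the configured subset of sources, tagging each with its origin."""
    merged = []
    for name in wanted:
        for s, p, o in sources[name]:
            merged.append((s, p, o, name))
    return merged


def to_index_docs(quads):
    """Group statements by subject into flat documents, one per individual."""
    docs = defaultdict(lambda: {"type": [], "label": [], "source": set()})
    for s, p, o, src in quads:
        doc = docs[s]
        doc["source"].add(src)
        if p == "rdf:type":
            doc["type"].append(o)      # ontology type becomes a filterable field
        elif p == "rdfs:label":
            doc["label"].append(o)     # label becomes a searchable text field
    return [
        {"id": uri, "type": d["type"], "label": d["label"], "source": sorted(d["source"])}
        for uri, d in sorted(docs.items())
    ]
```

Passing a different `wanted` list to `aggregate` is the configurable part: the same code can build a local, regional, or national index depending on which sources are harvested.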

National networking & search

[Diagram labels: Ponce VIVO; WashU VIVO; IU VIVO; Cornell Ithaca VIVO; Weill Cornell VIVO; UF VIVO; Scripps VIVO; other VIVOs; other CTSA VIVOs; Harvard Profiles RDF; other RDF; VIVO aggregator triple store; future CTSA triple store; future state or regional triple store; Solr search index; future CTSA Solr index; future Solr index; Linked Open Data; VIVO national network search]

VIVO at web scale

• Connections directly between VIVOs
  – Multiple campuses of one institution
  – Multiple institutions within a consortium
  – Data resides at and is served from the home institution
• Individuals linked by URI or common identifier
• Updates via linked data harvesting or pingback
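Linking individuals by a common identifier can be sketched as below. The hostnames, identifier values, and the choice of `owl:sameAs` as the linking predicate are assumptions made for illustration; the slides do not specify how links are recorded.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (URI at the home institution, shared identifier or None).
RECORDS = [
    ("http://vivo.campus-a.example/individual/n100", "id-0001"),
    ("http://vivo.campus-b.example/individual/n7", "id-0001"),
    ("http://vivo.campus-b.example/individual/n8", "id-0002"),
    ("http://vivo.campus-c.example/individual/n3", None),
]


def same_as_links(records):
    """Emit a link for every pair of URIs that share a common identifier."""
    by_id = defaultdict(list)
    for uri, ident in records:
        if ident is not None:          # records without an identifier cannot be matched
            by_id[ident].append(uri)
    links = []
    for ident in sorted(by_id):
        for a, b in combinations(sorted(by_id[ident]), 2):
            links.append((a, "owl:sameAs", b))
    return links
```

Each profile stays at its home URI; only the link statements need to be exchanged between sites.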

As the linked data cloud grows

• Search enhanced by authoritative, structured, and updated data
  – Retrieval and filtering by type & relationship, not just text
  – Enables better data mining and analysis
  – Reduces reporting burden
• Unique semantic advantages
  – Categorization implicit in defined ontologies
  – Common references to shared terminologies
  – ORCID and other initiatives leading to common references to individuals
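The difference between text matching and retrieval by type and relationship can be shown with a small sketch. The data, prefixes, and function names are hypothetical and stand in for a real triple store or search index.

```python
# Hypothetical statements; a person named "Grant" makes plain text search ambiguous.
TRIPLES = [
    ("ex:p1", "rdf:type", "Person"),
    ("ex:p1", "rdfs:label", "Grant, Alex"),
    ("ex:p2", "rdf:type", "Person"),
    ("ex:p2", "rdfs:label", "Rivera, Kim"),
    ("ex:g1", "rdf:type", "Grant"),
    ("ex:g1", "rdfs:label", "Soil microbiology grant"),
    ("ex:p2", "ex:investigatorOn", "ex:g1"),
]


def text_search(term):
    """Match on label text only, regardless of what kind of thing is labeled."""
    needle = term.lower()
    return sorted({s for s, p, o in TRIPLES if p == "rdfs:label" and needle in o.lower()})


def typed_search(rdf_type, predicate=None, target_type=None):
    """Filter by type, optionally requiring a relationship to a target of a given type."""
    def of_type(t):
        return {s for s, p, o in TRIPLES if p == "rdf:type" and o == t}

    hits = of_type(rdf_type)
    if predicate is not None:
        targets = of_type(target_type) if target_type is not None else None
        hits = {
            s for s, p, o in TRIPLES
            if s in hits and p == predicate and (targets is None or o in targets)
        }
    return sorted(hits)
```

Here `text_search("grant")` returns both the grant and the person whose surname is Grant, while `typed_search("Person", "ex:investigatorOn", "Grant")` returns only people who are investigators on a grant.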

Community development
• VIVOweb.org
• VIVO on SourceForge
  – Fully open source (BSD license)
  – Subversion repository – download or check out
  – Active development and implementation mailing lists & forums
  – Installation and upgrade documentation
  – Wiki-based documentation effort
  – Supplemental materials
• Many ways to contribute and benefit

Mini-grants address key areas

• Controlled vocabularies (Stony Brook)
• Author IDs and disambiguation (ORCID)
• Widgets to re-use VIVO data in standard web pages (Duke)
• Direct output to biosketches and CVs (Pittsburgh)
• Connection to the HUBzero scientific simulation and grid services platform, via Joomla CMS (IU)
• Google Refine for data cleanup and export (Weill Cornell)

VIVO Ecosystem Evolution

Community collaborations

• ORCID
• Connections to institutional repositories, as other libraries implement VIVO
• Library of Congress support for Exhibit API with VIVO as one target
• Dataset metadata discovery and registry work, with Australian VIVO consortium

Questions yet to address
• What access points and services need to be provided for national (or international) research networking to succeed?
  – How will people be able to integrate this data into their daily workflow and research process?
  – How will boundaries between public and private data and services work?
• Federating group privileges as well as identities across multiple VIVOs and to other research-enabling tools

Thank you
