

DataNet: Infrastructure to Connect Data, People, and Science

Mission (TerraPop): Lower barriers to conducting interdisciplinary human-environment interactions research by making data with different formats from different scientific domains easily interoperable.

Data, Tools & Services

Source Data
• Census microdata and aggregate data
• Land use/land cover and climate data
• Other population and environmental data

Data Integration
Methods to integrate diverse data using spatial location and geographic boundaries to link data contents
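The spatial linking described above can be illustrated with a minimal sketch. It is not TerraPop's implementation: the grid cells, the cell-to-unit lookup, the household records, and the function names `summarize_by_unit` and `attach_context` are all hypothetical, chosen only to show how gridded values might be summarized within geographic boundaries and then attached to individual records through a shared unit code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raster: each cell has a grid position and a value
# (for example, annual rainfall in mm).
raster_cells = [
    {"row": 0, "col": 0, "value": 800.0},
    {"row": 0, "col": 1, "value": 900.0},
    {"row": 1, "col": 0, "value": 400.0},
    {"row": 1, "col": 1, "value": 600.0},
]

# Hypothetical lookup from cell position to the geographic unit that contains it.
cell_to_unit = {(0, 0): "A", (0, 1): "A", (1, 0): "B", (1, 1): "B"}

# Hypothetical microdata: one record per household, coded with its geographic unit.
households = [
    {"household_id": 1, "unit": "A", "size": 4},
    {"household_id": 2, "unit": "B", "size": 2},
    {"household_id": 3, "unit": "B", "size": 5},
]


def summarize_by_unit(cells, lookup):
    """Average the raster values of all cells that fall inside each unit."""
    values = defaultdict(list)
    for cell in cells:
        unit = lookup[(cell["row"], cell["col"])]
        values[unit].append(cell["value"])
    return {unit: mean(vals) for unit, vals in values.items()}


def attach_context(records, unit_summary, name):
    """Copy each record and add the summary value of its geographic unit."""
    return [{**rec, name: unit_summary.get(rec["unit"])} for rec in records]


unit_rainfall = summarize_by_unit(raster_cells, cell_to_unit)
linked = attach_context(households, unit_rainfall, "mean_rainfall_mm")
```

In this sketch the geographic unit code is the only link between the three data structures, which is why consistent boundaries matter for this kind of integration.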

Web-based Data Access System
• Explore available data and metadata
• Select variables of interest
• Merge data from different source datasets and formats

Human Networks

Development System Testing
Opportunities to explore pre-release versions and provide feedback at conferences, including AGU (contact [email protected] to participate)

Development Community
Feedback through surveys and beta testing. Sign up at www.terrapop.org

Mission (SEAD): Support the “long tail” of research through an environment with low barriers to deposit, active and social curation, and links to existing preservation infrastructure for long-term access.

Data, Tools & Services

Social Networking Environment
VIVO instance with researcher profiles, publications, and data citations for discovery of expertise, publications, and data with network visualizations

Active Content Repository
Storage for data and metadata undergoing active use, with capabilities for deposit, metadata extraction, previewing, tagging, and social curation

Virtual Archive
Distributed storage for long-term archiving and dissemination of ‘finished’ data products in institutional repositories and topical archives

Human Networks

Active and Social Data Curation
Tools for incorporating community-generated tags, annotations, assessments, and repurposing notes in metadata, and for identification and generation of archival data packages

Science Community Networking
Compiling connections among individual scientists, research teams, publications, source datasets, and derived datasets, and tools for traversing the network to discover related people and work

Mission (DFC): Enable collaborative research through policy- and standards-based federation of existing data management infrastructure.

Data, Tools & Services

iRODS Data Grids
Sharable collections of remotely located datasets managed by policies that automate administrative tasks, validation, and federation

Workflow Integration
Capture processes applied to data to support documentation, repeatability, sharing, and re-execution

Interoperability Mechanisms
Enable access to community resources using their protocols and register remote data into collaboration environments

Human Networks

Collaboration Environments
Enable groups of researchers to access common datasets, workflows, and relationships between data and workflows

Educational Access to Live Data
Support controlled access to collections of data, allowing students to build personal reference collections and perform defined data management and analysis tasks

Mission (Data Conservancy): Develop an institutional solution for the collection, preservation, and re-use of data; encourage collaboration by enabling researchers to find someone else’s data products and assess their potential for re-use and re-combination.

Data, Tools & Services

Data Conservancy Service and Reference UI
• Robust ingest framework
• Query interface
• Archival store abstraction over the Fedora Repository
• HTTP APIs supporting ingest, query, and retrieval of data
• Browser-based user interface

Integrations with External Systems
• Antarctica Dry Valley Glacier Photograph Collection at National Snow and Ice Data Center (NSIDC) – Uses search and access APIs
• ArXiv.org Pre-Print Repository – Uses search, access, and ingest APIs

Human Networks

DC Instances at JHU and NSIDC
• Technical tools and organizational services for data collection, curation, management, storage, preservation, and sharing
• JHU Data Management Services – Helps researchers develop data management plans and both preserve and share research data
• NSIDC – Facilitates curation of results from knowledge documentation projects in Arctic communities by the Exchange for Local Observations and Knowledge of the Arctic project

Education
Graduate programs, training courses, webinars, and other resources on data curation and management

Mission (DataONE): Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.

Data, Tools & Services

Distributed Data Network
• Member Nodes – Existing data collections exposed through DataONE
• Coordinating Nodes – Support indexing and replication services across member nodes
• Common Search and Discovery – ONEMercury finds data in all member nodes from a single entry point

Investigator Toolkit
• Data Management Planning Tool – Guides development of DMPs for grant proposals
• Data Citations – ONEMercury search results are tagged for import into common bibliography management tools
• DataUp – Best practice checks and metadata creation to prepare data in Excel for archives
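The division of labor between member nodes and coordinating nodes can be pictured with a toy model. This is a sketch under stated assumptions, not DataONE's API: the classes `MemberNode` and `CoordinatingNode`, their methods, and the sample identifiers and titles are hypothetical and exist only to show how a central index built from several repositories lets one query reach all of them.

```python
class MemberNode:
    """Hypothetical stand-in for a repository exposing an existing collection."""

    def __init__(self, node_id, records):
        self.node_id = node_id
        self._records = records  # identifier -> metadata dict

    def list_metadata(self):
        """Return one metadata entry per object, tagged with this node's id."""
        return [
            {"id": pid, "node": self.node_id, **meta}
            for pid, meta in self._records.items()
        ]


class CoordinatingNode:
    """Hypothetical central index built from the metadata of member nodes."""

    def __init__(self):
        self._index = []

    def synchronize(self, member_node):
        """Replace any indexed entries for this node with its current metadata."""
        self._index = [e for e in self._index if e["node"] != member_node.node_id]
        self._index.extend(member_node.list_metadata())

    def search(self, keyword):
        """Single entry point: match the keyword against titles from all nodes."""
        kw = keyword.lower()
        hits = (e for e in self._index if kw in e["title"].lower())
        return sorted(hits, key=lambda e: e["id"])


# Hypothetical sample collections held by two repositories.
node_a = MemberNode("mn-a", {
    "id-001": {"title": "Soil moisture survey"},
    "id-002": {"title": "Bird counts"},
})
node_b = MemberNode("mn-b", {"id-003": {"title": "Forest soil carbon"}})

cn = CoordinatingNode()
cn.synchronize(node_a)
cn.synchronize(node_b)
hits = cn.search("soil")
```

Each hit records which node holds the object, so a client in this sketch would know where to retrieve the data after searching the shared index.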

Human Networks

DataONE Users’ Group
Annual meetings and other opportunities for stakeholders to learn about and guide DataONE’s development

Working Groups
Identify, describe, and implement DataONE cyber-infrastructure, governance, and other projects

Education
Training sessions, education models, and graduate courses relating to various aspects of data management for students and citizen scientists

Figure labels (SEAD diagram): Institutional Repositories; Network of Data Producers; Web User Interface; Active Content Repository; Services Provided; Virtual Archives; User Network; Data Conservancy; IU; ICPSR; Content Mining; Curation Decisions; Archival data generation; Other services; RPI; UIUC; UM

For more information: www.dataone.org
Amber Budden, Director for Community Engagement and Outreach, [email protected]

For more information: www.dataconservancy.org
Shonna Clark, Project Coordinator, [email protected]

For more information: http://datafed.org
Mary Whitton, Project Manager, [email protected]

For more information: http://sead-data.net
Marietta Van Buhler, Project Manager, [email protected]

For more information: www.terrapop.org
Tracy Kugler, Project Manager, [email protected]

Figure labels: get; create; replicate; synchronize; search

Cross-DataNet Collaboration

The five DataNet projects collaborate through monthly conference calls, in-person PI meetings, and joint projects to build interoperable cyber-infrastructure and to engage with a broad network of researchers in the natural and social sciences.

Interoperable Cyber-Infrastructure

Examples of Joint Projects
• Access to TerraPop extracts in DFC collaboration environments
• Integration of Data Conservancy DCS-Lite and SEAD Active Content Repository tools
• Projects participating in DataONE as member nodes

DataNet Collaboration Areas
• Semantic integration
• Technical best practices for sustainability
• Data discovery, formats, and interoperability from the scientist’s perspective

Human Networks
• Training and education – Joint development and cross-program utilization of data management courses, sessions, and workshops
• Cross-disciplinary data awareness – Introducing scientists to data from other disciplines through cross-program conference activities and other outreach
• Long-term financial sustainability – Identifying and implementing funding and revenue models to support long-term data preservation and access
• Governance – Mechanisms for gathering stakeholder feedback and decision-making

Figure labels (data structures diagram):
• Rasters – Population and environmental data in grids
• Area-level data – Environmental and population summaries for spatial units
• Microdata – Individuals and households with their environmental and social context

Figure labels (data grid diagram): Researchers - Client; Shared Collection; Data Grid iRODS controlled workflows (shown twice); Storage (shown four times)

Minnesota Population Center