

Complexity must become Linear or Decrease

Smart data infrastructure: The sixth generation of mediation for data science

Peter Fox¹ (pfox@cs.rpi.edu)

(¹Rensselaer Polytechnic Institute, 110 8th St., Troy, NY, 12180 United States – see Acknowledgements)

Glossary:

RPI – Rensselaer Polytechnic Institute

TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute

S2S – S2S (!)

SESF – Semantic eScience Framework

BCO-DMO – Biological and Chemical Oceanography Data Management Office

Acknowledgments:

SeSF Project Team: Eric Rozell, Han Wang, Jin Zheng, Patrick West, Stephan Zednik, Jim Hendler, Deborah McGuinness

BCO-DMO Staff: Cyndy Chandler, Adam Shephard, Bob Groman

Sponsors:

National Science Foundation

Tetherless World Constellation

MOTIVATION

In the emergent “fourth paradigm” (data-driven) science, the scientific method is enhanced by the integration of significant data sources into the practice of scientific research.

To address Big Science, there are challenges in understanding the role of data in enabling researchers to attack not just disciplinary issues, but also the system-level, large-scale, and transdisciplinary global scientific challenges facing society.

Recognizing that the volume of data is only one of many dimensions to be considered, there is a clear need for improved data infrastructures to mediate data and information exchange, which we contend will need to be powered by semantic technologies.

One clear need is to provide computational approaches for researchers to discover appropriate data resources, rapidly integrate data collections from heterogeneous resources or multiple data sets, and inter-compare results to allow generation and validation of hypotheses.

Another trend is toward automated tools that allow researchers to better find and reuse data that they currently don’t know they need, let alone know how to find. Again, semantic technologies will be required.

Finally, to turn data analytics from "art to science", technical solutions are needed for cross-dataset validation, reproducibility studies on data-driven results, and the concomitant citation of data products allowing recognition for those who curate and share important data resources.

Semantic eScience Framework

Five Generations of Mediation – Borgman et al. (2008) CyberLearning Report

Cognitive Computing

Realizing the 6th Generation and the Integration of the Other 5!

Schematic of a Cognitive Computing Architecture (courtesy Jim Hendler)

Smart data agents are part of the next generation of computing infrastructure mediating research

These agents are a fundamental part of the new cognitive computing platforms being developed

Open-world (versus Closed-world) is essential

Linked data will be a fundamental enabler

Smart applications!

AGUFM14 – IN23C-3737 (MS Hall A-C)

Framework and relation to external sources

Needed evolution of cognitive systems where humans, many humans, are in the loop – bringing generations 1, 2 and 3 together with generations 3, 4, 5 and now 6.

All these generations of mediation are in effect as we conduct research!!

NOTE: INCREASING COMPLEXITY

Smart agents. Open world, semantic agents, with rules… it is notable that these capabilities NEVER made it into the top row of capabilities… in main figure.

Data agents. Ones that can find data for you, and perhaps even convert it to the right format, find contextual information, etc.

Illustration by Roy Pea and Jillian C. Wallis, from C. L. Borgman, H. Abelson, L. Dirks, R. Johnson, K. R. Koedinger, M. C. Linn, C. A. Lynch, D. G. Oblinger, R. D. Pea, K. Salen, M. S. Smith, and A. Szalay, “Fostering Learning in the Networked World: The Cyberlearning Opportunity and Challenge. A 21st Century Agenda for the National Science Foundation. Report of the NSF Task Force on Cyberlearning,” Office of Cyberinfrastructure and Directorate for Education and Human Resources. National Science Foundation, Washington, D.C., 2008.

Mediation Example: S2S Framework and Application

Application Integration

Smart Faceted Browse Dashboards

Linked vocabularies (not Brokering)

Smart Text Agents

Smart Data Agents

Relationship and Association Rules

Cognitive Collaboration

Linked Vocabulary

S2S Application Ontology

Figure: While initially developed for CyberLearning, these mediation modes apply more generally

(courtesy Jim Hendler)

Use of linked vocabularies for categories of variables, instruments, and other terms enables discovery.
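As a rough illustration of how linked vocabularies can enable discovery, the sketch below walks "broader than" links so that a search on a category also finds datasets tagged with narrower terms. It is not the S2S implementation: the vocabulary terms, the dataset names, and the helper functions `ancestors` and `discover` are all hypothetical and chosen only for illustration.

```python
# Hypothetical linked vocabulary: each term points to one broader category.
BROADER = {
    "nitrate": "nutrient",
    "phosphate": "nutrient",
    "nutrient": "chemical variable",
    "CTD": "profiler",
    "profiler": "instrument",
}

# Hypothetical dataset records, each annotated with vocabulary terms.
DATASETS = {
    "cruise-A": {"nitrate", "CTD"},
    "cruise-B": {"phosphate"},
    "cruise-C": {"CTD"},
}


def ancestors(term):
    """Return the term plus every broader category reachable from it."""
    seen = []
    while term is not None and term not in seen:
        seen.append(term)
        term = BROADER.get(term)
    return set(seen)


def discover(category):
    """Find datasets tagged with the category or with any narrower term."""
    return sorted(
        name
        for name, tags in DATASETS.items()
        if any(category in ancestors(tag) for tag in tags)
    )


nutrient_hits = discover("nutrient")
instrument_hits = discover("instrument")
```

Here a search for "nutrient" would pick up datasets tagged only with "nitrate" or "phosphate", which is the kind of category-level discovery that linking terms makes possible. A real system would more likely hold the vocabulary as linked data than as an in-memory dictionary.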

Memory

Reasoning

Decision Making

Watson, Cogito, and Clarion