linkitup: link discovery for research data
DESCRIPTION
Linkitup is a Web-based dashboard for enrichment of research output published via industry-grade data repository services. It takes metadata entered through Figshare.com and tries to find equivalent terms, categories, persons or entities on the Linked Data cloud and several Web 2.0 services. It extracts references from publications, and tries to find the corresponding Digital Object Identifier (DOI). Linkitup feeds the enriched metadata back as links to the original article in the repository, but also builds an RDF representation of the metadata that can be downloaded separately, or published as research output in its own right. In this paper, we compare Linkitup to the standard workflow of publishing linked data, and show that it significantly lowers the threshold for publishing linked research data.
TRANSCRIPT
linkitup: Link Discovery for Research Data
Rinke Hoekstra★ and Paul Groth
Network Institute, VU University Amsterdam
★ Law Faculty, University of Amsterdam
Linkitup - Link Discovery for Research Data by Rinke Hoekstra. Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Data2Semantics: From Data to Semantics for Scientific Data Publishers
How to share, publish, access, analyse, interpret and reuse data?
DATA
DATA.. the fallacies (Kayur Patel)
DATA: Silver Bullet?
http://on.wsj.com/XCajtB
Data’s shameful neglect
Research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly.
More and more often these days, a research project’s success is measured not just by the publications it produces, but also by the data it makes available to the wider community. Pioneering archives such as GenBank have demonstrated just how powerful such legacy data sets can be for generating new discoveries, especially when data are combined from many laboratories and analysed in ways that the original researchers could not have anticipated.
All but a handful of disciplines still lack the technical, institutional and cultural frameworks required to support such open data access (see pages 168 and 171) — leading to a scandalous shortfall in the sharing of data by researchers (see page 160). This deficiency urgently needs to be addressed by funders, universities and the researchers themselves.
Research funding agencies need to recognize that preservation of and access to digital data are central to their mission, and need to be supported accordingly. Organizations in the United Kingdom, for instance, have made a good start. The Joint Information Systems Committee, established by the seven UK research councils in 1993, has made data-sharing a priority, and has helped to establish a Digital Curation Centre, headquartered at the University of Edinburgh, to be a national focus for research and development into data issues. Other European agencies have also pursued initiatives.
The United States, by contrast, is playing catch-up. Since 2005, a 29-member Interagency Working Group on Digital Data has been trying to get US funding agencies to develop plans for how they will support data archiving and, just as importantly, to develop policies on what data should and should not be preserved, and what exceptions should be made for reasons such as patient privacy. Some agencies have taken the lead in doing so; many more are hanging back. They should all be moving forwards vigorously.
What is more, funding agencies and researchers alike must ensure that they support not only the hardware needed to store the data, but also the software that will help investigators to do this. One important facet is metadata management software: tools that streamline the tedious process of annotating data with a description of what the bits mean, which instrument collected them, which algorithms have been used to process them and so on. This information is essential if other scientists are to reuse the data effectively.
Also necessary, especially in an era when data can be mixed and combined in unanticipated ways, is software that can keep track of which pieces of data came from whom. Such systems are essential if tenure and promotion committees are ever to give credit — as they should — to candidates’ track-record of data contribution.
Who should host these data? Agencies and the research community together need to create the digital equivalent of libraries: institutions that can take responsibility for preserving digital data and making them accessible over the long term. The university research libraries themselves are obvious candidates to assume this role. But whoever takes it on, data preservation will require robust, long-term funding. One potentially helpful initiative is the US National Science Foundation’s DataNet programme, in which researchers are exploring financial mechanisms such as subscription services and membership fees.
Finally, universities and individual disciplines need to undertake a vigorous programme of education and outreach about data. Consider, for example, that most university science students get a reasonably good grounding in statistics. But their studies rarely include anything about information management — a discipline that encompasses the entire life cycle of data, from how they are acquired and stored to how they are organized, retrieved and maintained over time. That needs to change: data management should be woven into every course in science, as one of the foundations of knowledge. ■
A step too far?
The Obama administration must fund human space flight adequately, or stop speaking of ‘exploration’.
After the space shuttle Columbia burned up during re-entry into Earth’s atmosphere in 2003, the board that was convened to investigate the disaster looked beyond its technical causes to NASA’s organizational malaise. For decades, the board pointed out, the shuttle programme had been trying to do too much with too little money. NASA desperately needed a clearer vision and a better-defined mission for human space flight.
The next year, then-President George W. Bush attempted to supply that vision with a new long-term goal: first send astronauts to build a base on the Moon, then send them to Mars. This idea immediately set off a debate that is still continuing, in which sceptics ask whether there is any point in returning to the Moon nearly half a century after the first landings. Why not go to Mars directly, or visit near-Earth asteroids, or send people to service telescopes in the deep space beyond Earth?
Yet that debate is both counter-productive (a new set of rockets could go to all of these places) and moot, because Bush’s vision never attracted the hoped-for budget increases. Indeed, a blue-riband commission reporting to US President Barack Obama this week (see page 153) finds the organizational malaise unchanged: NASA is still doing too much with too little. Without more money, the agency won’t be sending people anywhere beyond the International Space Station, which resides in low Earth orbit only 350 kilometres up. And even the ability to do that is in question: Ares I, the US rocket that would return
“Data management should be woven into every course in science.”
www.nature.com/nature Vol 461 | Issue no. 7261 | 10 September 2009
Repository Services
• Data is easy to upload
• Landing page for data
• Citable reference for data
• Default licensing options
• Guarantees for long-term archival
Standard Metadata
• Provenance metadata: authors, title, publication date
• Content metadata: free-text tags, categories, links
• Metadata is locked in
• Hard to interpret the data itself
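To make the "locked in" point concrete, the following is a minimal sketch of how the provenance and content metadata listed above could be exposed as RDF triples. It is an illustration under stated assumptions, not Linkitup's actual output: the `RepositoryRecord` class, the `literal` and `to_ntriples` helpers, and the choice of Dublin Core Terms properties are all hypothetical, and only plain strings without line breaks are handled.

```python
from dataclasses import dataclass, field
from typing import List

# Dublin Core Terms namespace (a real vocabulary; choosing it here is an assumption).
DCT = "http://purl.org/dc/terms/"


@dataclass
class RepositoryRecord:
    """Hypothetical container for the metadata a repository keeps per item."""
    uri: str
    # Provenance metadata
    title: str
    authors: List[str]
    publication_date: str  # ISO 8601 date string
    # Content metadata
    tags: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)


def literal(value: str) -> str:
    """Quote a single-line string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(record: RepositoryRecord) -> List[str]:
    """Emit one N-Triples line per metadata value of the record."""
    subject = f"<{record.uri}>"
    pairs = [
        (DCT + "title", literal(record.title)),
        (DCT + "issued", literal(record.publication_date)),
    ]
    pairs += [(DCT + "creator", literal(a)) for a in record.authors]
    pairs += [(DCT + "subject", literal(t))
              for t in record.tags + record.categories]
    pairs += [(DCT + "references", f"<{link}>") for link in record.links]
    return [f"{subject} <{pred}> {obj} ." for pred, obj in pairs]
```

In this sketch tags and categories remain plain literals; replacing them with URIs of equivalent entities is the link discovery step that the talk is about.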
Data is the Bottleneck
Common Motifs in Scientific Workflows: An Empirical Analysis
Daniel Garijo*, Pinar Alper†, Khalid Belhajjame†, Oscar Corcho*, Yolanda Gil‡, Carole Goble†
* Ontology Engineering Group, Universidad Politecnica de Madrid. {dgarijo, ocorcho}@fi.upm.es
† School of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk
‡ Information Sciences Institute, Department of Computer Science, University of Southern California. [email protected]
Abstract: While workflow technology has gained momentum in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing existing workflows to build new scientific experiments is still a daunting task. This is partly due to the difficulty that scientists experience when attempting to understand existing workflows, which contain several data preparation and adaptation steps in addition to the scientifically significant analysis steps. One way to tackle the understandability problem is through providing abstractions that give a high-level view of activities undertaken within workflows. As a first step towards abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna and Wings systems. Our analysis has resulted in a set of scientific workflow motifs that outline i) the kinds of data intensive activities that are observed in workflows (data oriented motifs), and ii) the different manners in which activities are implemented within workflows (workflow oriented motifs). These motifs can be useful to inform workflow designers on the good and bad practices for workflow development, to inform the design of automated tools for the generation of workflow abstractions, etc.
I. INTRODUCTION
Scientific workflows have been increasingly used in the last decade as an instrument for data intensive scientific analysis. In these settings, workflows serve a dual function: first as detailed documentation of the method (i.e. the input sources and processing steps taken for the derivation of a certain data item) and second as re-usable, executable artifacts for data-intensive analysis. Workflows stitch together a variety of data manipulation activities such as data movement, data transformation or data visualization to serve the goals of the scientific study. The stitching is realized by the constructs made available by the workflow system used and is largely shaped by the environment in which the system operates and the function undertaken by the workflow.
A variety of workflow systems are in use [10] [3] [7] [2] serving several scientific disciplines. A workflow is a software artifact, and as such once developed and tested, it can be shared and exchanged between scientists. Other scientists can then reuse existing workflows in their experiments, e.g., as sub-workflows [17]. Workflow reuse presents several advantages [4]. For example, it enables proper data citation and improves quality through shared workflow development by leveraging the expertise of previous users. Users can also re-purpose existing workflows to adapt them to their needs [4]. Emerging workflow repositories such as myExperiment [14] and CrowdLabs [8] have made publishing and finding workflows easier, but scientists still face the challenges of re-use, which amounts to fully understanding and exploiting the available workflows/fragments. One difficulty in understanding workflows is their complex nature. A workflow may contain several scientifically-significant analysis steps, combined with various other data preparation activities, and in different implementation styles depending on the environment and context in which the workflow is executed. The difficulty in understanding causes workflow developers to revert to starting from scratch rather than re-using existing fragments.
Through an analysis of the current practices in scientificworkflow development, we could gain insights on the creationof understandable and more effectively re-usable workflows.Specifically, we propose an analysis with the following objec-tives:
1) To reverse-engineer the set of current practices in work-flow development through an analysis of empirical evi-dence.
2) To identify workflow abstractions that would facilitateunderstandability and therefore effective re-use.
3) To detect potential information sources and heuristicsthat can be used to inform the development of tools forcreating workflow abstractions.
In this paper we present the result of an empirical analysisperformed over 177 workflow descriptions from Taverna [10]and Wings [3]. Based on this analysis, we propose a catalogueof scientific workflow motifs. Motifs are provided through i)a characterization of the kinds of data-oriented activities thatare carried out within workflows, which we refer to as data-oriented motifs, and ii) a characterization of the different man-ners in which those activity motifs are realized/implementedwithin workflows, which we refer to as workflow-orientedmotifs. It is worth mentioning that, although important, motifsthat have to do with scheduling and mapping of workflowsonto distributed resources [12] are out the scope of this paper.
The paper is structured as follows. We begin by providingrelated work in Section II, which is followed in Section III bybrief background information on Scientific Workflows, and thetwo systems that were subject to our analysis. Afterwards wedescribe the dataset and the general approach of our analysis.We present the detected scientific workflow motifs in SectionIV and we highlight the main features of their distribution
Data is the Bottleneck
Common Motifs in Scientific Workflows: An Empirical Analysis
Daniel Garijo*, Pinar Alper†, Khalid Belhajjame†, Oscar Corcho*, Yolanda Gil‡, Carole Goble†
*Ontology Engineering Group, Universidad Politecnica de Madrid. {dgarijo, ocorcho}@fi.upm.es
†School of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk
‡Information Sciences Institute, Department of Computer Science, University of Southern California. [email protected]
Abstract. While workflow technology has gained momentum in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing existing workflows to build new scientific experiments is still a daunting task. This is partly due to the difficulty that scientists experience when attempting to understand existing workflows, which contain several data preparation and adaptation steps in addition to the scientifically significant analysis steps. One way to tackle the understandability problem is through providing abstractions that give a high-level view of activities undertaken within workflows. As a first step towards abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna and Wings systems. Our analysis has resulted in a set of scientific workflow motifs that outline i) the kinds of data intensive activities that are observed in workflows (data oriented motifs), and ii) the different manners in which activities are implemented within workflows (workflow oriented motifs). These motifs can be useful to inform workflow designers on the good and bad practices for workflow development, to inform the design of automated tools for the generation of workflow abstractions, etc.
I. INTRODUCTION
Scientific workflows have been increasingly used in the last decade as an instrument for data intensive scientific analysis. In these settings, workflows serve a dual function: first as detailed documentation of the method (i.e. the input sources and processing steps taken for the derivation of a certain data item) and second as re-usable, executable artifacts for data-intensive analysis. Workflows stitch together a variety of data manipulation activities such as data movement, data transformation or data visualization to serve the goals of the scientific study. The stitching is realized by the constructs made available by the workflow system used and is largely shaped by the environment in which the system operates and the function undertaken by the workflow.
A variety of workflow systems are in use [10] [3] [7] [2] serving several scientific disciplines. A workflow is a software artifact, and as such once developed and tested, it can be shared and exchanged between scientists. Other scientists can then reuse existing workflows in their experiments, e.g., as sub-workflows [17]. Workflow reuse presents several advantages [4]. For example, it enables proper data citation and improves quality through shared workflow development by leveraging the expertise of previous users. Users can also re-purpose existing workflows to adapt them to their needs [4]. Emerging workflow repositories such as myExperiment [14] and CrowdLabs [8] have made publishing and finding workflows easier, but scientists still face the challenges of re-use, which amounts to fully understanding and exploiting the available workflows/fragments. One difficulty in understanding workflows is their complex nature. A workflow may contain several scientifically-significant analysis steps, combined with various other data preparation activities, and in different implementation styles depending on the environment and context in which the workflow is executed. The difficulty in understanding causes workflow developers to revert to starting from scratch rather than re-using existing fragments.
Through an analysis of the current practices in scientific workflow development, we could gain insights on the creation of understandable and more effectively re-usable workflows. Specifically, we propose an analysis with the following objectives:
1) To reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence.
2) To identify workflow abstractions that would facilitate understandability and therefore effective re-use.
3) To detect potential information sources and heuristics that can be used to inform the development of tools for creating workflow abstractions.
In this paper we present the result of an empirical analysis performed over 177 workflow descriptions from Taverna [10] and Wings [3]. Based on this analysis, we propose a catalogue of scientific workflow motifs. Motifs are provided through i) a characterization of the kinds of data-oriented activities that are carried out within workflows, which we refer to as data-oriented motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented within workflows, which we refer to as workflow-oriented motifs. It is worth mentioning that, although important, motifs that have to do with scheduling and mapping of workflows onto distributed resources [12] are out of the scope of this paper.
The paper is structured as follows. We begin by providing related work in Section II, which is followed in Section III by brief background information on Scientific Workflows, and the two systems that were subject to our analysis. Afterwards we describe the dataset and the general approach of our analysis. We present the detected scientific workflow motifs in Section IV and we highlight the main features of their distribution
Fig. 3. Distribution of Data-Oriented Motifs per domain
Fig. 4. Distribution of Data Preparation motifs per domain
databases and shipping data to necessary locations for analysis.
The impact of the environmental difference of Wings and Taverna on the workflows is also observed in the workflow-oriented motifs (Figure 7). Stateful invocations motifs are not present in Wings workflows, as all steps are handled by a dedicated workflow scheduling framework and the details are hidden from the workflow developers. In Taverna, the workflow developer is responsible for catering for various different invocation requirements of 3rd party services, which may include stateful invocations requiring execution of multiple consecutive steps in order to undertake a single function.
Regarding workflow-oriented motifs, Figure 8 shows that Human-interaction steps are increasingly used in scientific workflows, especially in the Biodiversity and Cheminformatics domains. Human interactions in Taverna workflows are handled either through external tools (e.g., Google Refine), facilitated via a human-interaction plug-in, or through simple local scripts (e.g., selection of configuration values from multi-choice lists). We have observed that non-trivial human interactions involving external tooling require a large number of workflow steps dedicated to deploying or configuring the external tools, resulting in very large and complex workflows. Wings workflows do not support human interaction steps.
Finally, the large proportion of the combination of Composite Workflows and Atomic Workflows motif in Figure 8 shows that the use of sub-workflows is an established best practice for modularizing functionality.
Fig. 5. Data Preparation Motifs in the Genomics Workflows
Fig. 6. Data-Oriented Motifs in the Genomics Workflows
VI. DISCUSSION
Our analysis shows that the nature of the environment in which a workflow system operates can bring about obstacles against the re-usability of workflows.
A. Obfuscation of Scientific Workflows
Data-intensive scientific analysis could be large and complex with several processing steps corresponding to different phases of data analysis performed over various kinds of data. This complexity is exacerbated when the workflow operates in an open environment, like Taverna’s, and composes multiple third party services supporting different data formats and protocols. In such cases the workflow contains additional steps for coping with different format and protocol requirements. This obfuscation of the workflow burdens the documentation function and creates difficulty for the workflow re-user scientists, who seek to have a complete understanding of the function and the details of the workflow that they are re-using in order to be able to make scientific claims with their workflow based studies.
Obfuscation is caused by the abundance of data preparation steps, data movement operations and multi-step stateful invocations. One way to overcome obfuscation is to encapsulate
Data-Oriented Motifs per Domain
Data-Preparation Motifs per Domain
Make Data Flourish
From data to information to knowledge
Papers explicitly link to data
Track and publish explicit provenance information
Capture the processes by which data is manipulated
Global identification of data sets and data items
Metadata expressed using shared vocabularies
Data uses a common syntax
"Someone who is not the person who collected the data can understand the experiment and data" - Shreejoy Tripathy
Linked Data
• Use existing Web infrastructure
• Everything gets a URI and usually a category
• Express typed relations between things (triples)
• Express sameness or difference
• Reuse identifiers as much as possible
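The principles above can be illustrated with a minimal sketch in plain Python, where each triple is a tuple of URI strings. The `example.org` and `example.com` identifiers are hypothetical and serve only as illustration; the vocabulary terms (`rdf:type`, `owl:sameAs`, `dcterms:creator`, `dcat:Dataset`) are existing, widely reused ones. A real application would likely use an RDF library rather than tuples.

```python
# Well-known vocabulary terms (RDF, OWL, Dublin Core Terms, DCAT).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"

# Hypothetical identifiers, for illustration only.
DATASET = "http://example.org/dataset/42"
AUTHOR_LOCAL = "http://example.org/person/jane"
AUTHOR_ELSEWHERE = "http://example.com/people/jane-doe"

# Each triple is a typed relation: (subject, predicate, object).
TRIPLES = {
    (DATASET, RDF_TYPE, DCAT_DATASET),              # the thing gets a category
    (DATASET, DCT_CREATOR, AUTHOR_LOCAL),           # typed relation between things
    (AUTHOR_LOCAL, OWL_SAME_AS, AUTHOR_ELSEWHERE),  # sameness across sources
}


def objects(subject, predicate, triples):
    """Return all objects linked from `subject` through `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def same_as(uri, triples):
    """Return every URI stated to denote the same thing as `uri`.

    owl:sameAs is treated as symmetric and transitive.
    """
    found = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            # Follow the link in both directions.
            for a, b in ((s, o), (o, s)):
                if a == current and b not in found:
                    found.add(b)
                    frontier.append(b)
    return found


creators = objects(DATASET, DCT_CREATOR, TRIPLES)
aliases = same_as(AUTHOR_ELSEWHERE, TRIPLES)
```

Because both sources reuse the same property URIs, a consumer can follow the `owl:sameAs` link from the external author identifier back to the local one without any prior agreement on a schema.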
Salah, Alkim Almila Akdag, Cheng Gao, Krzysztof Suchecki, and Andrea Scharnhorst. 2012. “Need to Categorize: A Comparative Look at the Categories of Universal Decimal Classification System and Wikipedia.” Leonardo 45 (1) (February): 84-85. doi:10.1162/LEON_a_00344. (Preprint http://arxiv.org/abs/1105.5912v1)
Linked Data for Science
Neuroscience Information Framework (Ontologies, Semantic Wiki, Catalog)
BioPortal (ontologies)
Workflow Systems (WINGS, Taverna, …)
Rightfield (systems biology)
Organic Data Publishing (Semantic Wiki)
Linked Science (tools)
Nanopublications (small scientific assertions)
Bio2RDF (big linked data)
…
Claire Monteleoni
Linking Open Data cloud diagram, as of September 2011, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Dataset categories in the diagram legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.
[Chart: time series, vertical axis from 0 to 400. Snapshot dates: 1 May 2007, 8 Oct. 2007, 7 Nov. 2007, 10 Nov. 2007, 28 Feb. 2008, 31 Mar. 2008, 18 Sep. 2008, 5 Mar. 2009, 27 Mar. 2009, 14 Jul. 2009, 22 Sep. 2010, 19 Sep. 2011, 23 Feb. 2012.]
62.224.812.703 Triples! (1.75 Billion)
LODStats Analysis
[Chart: distribution of errors by type, vertical axis from 0 to 140]
Not RDF: 134 (45%)
Connection reset: 84 (28%)
Unknown response: 30 (10%)
XML: 22 (7%)
No URL provided: 12 (4%)
Other: 11 (4%)
HTTP: 6 (2%)
Hoekstra, Rinke; Groth, Paul (2013): Distribution of Errors Reported by LOD2 LODStats Project. figshare. http://dx.doi.org/10.6084/m9.figshare.695949
http://stats.lod2.eu
299 out of 639 datasets have errors
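The percentages in the chart can be checked against the bar values with a few lines of Python. This is only a sketch: reading the bar values as 134, 84, 30, 22, 12, 11 and 6 is an interpretation of the chart labels, and the variable and function names are made up for illustration.

```python
# Bar values as read from the LODStats error chart. The individual values
# 30, 22, 12, 11 and 6 are an interpretation of the chart labels.
error_counts = {
    "Not RDF": 134,
    "Connection reset": 84,
    "Unknown response": 30,
    "XML": 22,
    "No URL provided": 12,
    "Other": 11,
    "HTTP": 6,
}
TOTAL_DATASETS = 639  # from the slide: "299 out of 639 datasets have errors"


def error_shares(counts):
    """Return each error type's share of all errors as a rounded percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


total_errors = sum(error_counts.values())
shares = error_shares(error_counts)
failing_fraction = total_errors / TOTAL_DATASETS

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {error_counts[name]} ({pct}%)")
    print(f"{total_errors} of {TOTAL_DATASETS} datasets "
          f"({failing_fraction:.0%}) have errors")
```

Under this reading the counts add up to the 299 failing datasets mentioned on the slide, which is just under half of the 639 datasets analysed.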
An Ambient Agent Model for Monitoring and Analysing Dynamics of Complex Human Behaviour
Tibor Bosse*, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur
Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
Abstract. In ambient intelligent systems, monitoring of a human could consist of more complex tasks than merely identifying whether a certain value of a sensor is above a certain threshold. Instead, such tasks may involve monitoring of complex dynamic interactions between human and environment. In order to enable such more complex types of monitoring, this paper presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1) the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents, and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations, respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have shown that the framework is easy to use and applicable in a wide variety of domains.
Keywords: ambient agent model, human behaviour, dynamics
*Corresponding author. E-mail: [email protected].
Journal of Ambient Intelligence and Smart Environments
“Whoah! Cool, you should publish that stuff as Linked Data”
“Um, but doesn’t TTL have incompatible semantics?”
“Nah, silly, who cares? We’ll just start a new W3C WG!”
“Uh, ok, if we must. But even then, we can’t just publish the model as is!”
“No worries, just add the provenance using PROV-O, annotate the PDF with OA, and link to other research using CITO.”
“And that’s it?”
“Noo! You’ll need persistent Cool URIs and publish your endpoint for eternity of course. Duh.”
“Eh?”
“Oh... and don’t forget all data collected by the agents, in all runs, including the first experiments. Now THAT would be ultra cool.”
“Ngh!?”
Creating Linked Data
• Decide on resources to describe
• Mint cool URIs
• Decide on triples to include
• Describe the dataset
• Choose vocabularies
• Define terms
• Make links
• Publish to triple store/annotations/dump
http://linkeddatabook.com
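To give a feel for how much manual work the steps above involve, here is a minimal sketch that walks through them by hand for a single dataset and its author, using only the Python standard library. The `http://example.org/resource/` namespace, the function names and the choice of properties are assumptions made for illustration; they are not taken from Linkitup or from the Linked Data book.

```python
from urllib.parse import quote

BASE = "http://example.org/resource/"   # hypothetical namespace for minted URIs
DCT = "http://purl.org/dc/terms/"       # chosen vocabulary: Dublin Core Terms
FOAF = "http://xmlns.com/foaf/0.1/"     # chosen vocabulary: FOAF


def mint_uri(label):
    """Mint a URI by turning a human-readable label into a slug."""
    slug = "-".join(label.lower().split())
    return BASE + quote(slug, safe="")


def iri(value):
    """Serialise an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(text):
    """Serialise a plain string literal as an N-Triples term."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def build_dump(title, author_name, subject_uri):
    """Describe one dataset and its author, and link it to an external subject."""
    dataset = mint_uri(title)
    author = mint_uri(author_name)
    triples = [
        (iri(dataset), iri(DCT + "title"), literal(title)),
        (iri(dataset), iri(DCT + "creator"), iri(author)),
        (iri(author), iri(FOAF + "name"), literal(author_name)),
        # "Make links": point at a resource somebody else published.
        (iri(dataset), iri(DCT + "subject"), iri(subject_uri)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples) + "\n"


if __name__ == "__main__":
    print(build_dump(
        "Distribution of Errors Reported by LOD2 LODStats Project",
        "Rinke Hoekstra",
        "http://dbpedia.org/resource/Linked_data",
    ))
```

Even this toy version leaves out describing the dataset as a whole, defining new terms and hosting the result, which is part of why the full workflow is tedious.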
If this already is tedious...
... can you expect researchers to publish Linked Research Data?
Conclusion?
Linked Data is sóóóóó 2005
We need to make publishing Linked Research Data...
... a lot easier ...
... more persistent ...
... and more rewarding ...
“People as frontier in computing” - Haym Hirsh, Pietro Michelucci
http://linkitup.data2semantics.org
• Lightweight web application
• Interface to API of existing data repositories
• Enrich metadata by linking to (linked) data resources
• Human in the Loop
• Track provenance
• Publish rich metadata as new data publication
http://linkitup.data2semantics.org
Nanopublication + OA + PROV-O + DCTerms + FOAF
Use tags & categories to query the DBpedia endpoint
Use authors to query the DBLP endpoint
Use tags & categories to query the NeuroLex endpoint
Use author names to query the ORCID API
Extract references to resolve to CrossRef DOIs
Every operation is tracked automatically
Review selected links, and publish to Figshare
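As an illustration of the first step, the sketch below builds a SPARQL request that looks up a tag by its English label on the public DBpedia endpoint, and reads candidate links from a response in the standard SPARQL JSON results format. The query shape, the function names and the `format` request parameter are assumptions; the queries Linkitup actually sends are not shown on the slides. No network call is made here, so the candidates would still go to the user for review.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def label_query(tag, limit=5):
    """Build a SPARQL query for resources whose English label equals the tag."""
    escaped = tag.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        f'  ?resource rdfs:label "{escaped}"@en .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(tag, limit=5):
    """Return the GET URL for the query (assumed request parameters)."""
    params = {
        "query": label_query(tag, limit),
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


def candidate_links(results):
    """Extract candidate URIs from a SPARQL JSON results document."""
    bindings = results.get("results", {}).get("bindings", [])
    return [
        b["resource"]["value"]
        for b in bindings
        if b.get("resource", {}).get("type") == "uri"
    ]


if __name__ == "__main__":
    print(request_url("Linked data"))
    sample = {"results": {"bindings": [
        {"resource": {"type": "uri",
                      "value": "http://dbpedia.org/resource/Linked_data"}}
    ]}}
    print(candidate_links(sample))
```

Returning a list of candidates rather than a single best match keeps the human in the loop: the user picks which links are correct before anything is published.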
Plugins
Name | Service | Source | Links to
DBLP | SPARQL | Authors | Author Identifiers
ORCID | REST | Authors | Author Identifiers
LinkedLifeData | REST | Tags & Categories | Biomedical Entities
Crossref | Custom | Citations | DOIs
Elsevier LDR | REST | Tags & Categories | Funding agencies
DANS EASY | Custom | Tags & Categories | General Datasets
SameAs | REST | Links | General Entities
DBpedia Spotlight | REST | Description, Tags & Categories | General Entities
DBpedia/Wikipedia | SPARQL | Tags & Categories | General Entities
NeuroLex | SPARQL | Tags & Categories | Neuroscience Concepts
NIF Registry | REST | Tags & Categories | Neuroscience Datasets
your data set here
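The Crossref plugin above links citations to DOIs. One small part of that task, spotting DOIs that already appear in a reference string, could be sketched as follows. The regular expression is a commonly used approximation of modern DOI syntax, and the function name is hypothetical; resolving references that carry no DOI would need a lookup service and is not covered here.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit registrant code,
# a slash, then a suffix made of common DOI characters.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text):
    """Return the distinct DOIs found in text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Sentence punctuation directly after a DOI is not part of it.
        doi = match.group(0).rstrip(".,;:")
        if doi not in found:
            found.append(doi)
    return found


if __name__ == "__main__":
    reference = (
        "Hoekstra, Rinke; Groth, Paul (2013): Distribution of Errors "
        "Reported by LOD2 LODStats Project. figshare. "
        "http://dx.doi.org/10.6084/m9.figshare.695949"
    )
    print(find_dois(reference))
```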
What does this solve?
• You decide on resources to describe
• We mint cool URIs
• We decide on triples to include
• We describe the dataset
• We choose vocabularies
• We define terms
• Together we make links
• We publish the dataset to a reliable repository
http://linkeddatabook.com
Coming up…
• Publish directly from Dropbox, GitHub, …
• Reconstruct provenance information (http://git2prov.org)
• Analyze, convert and enrich on the fly
• Generate a data report for advertisement purposes
• Measure for information content of datasets (“D-Index”)
• Integrate a data dashboard
linkitup … enhancing the data publication…
… increasing findability …
… boosting reusability …
… result is stored persistently
http://linkitup.data2semantics.org
http://www.data2semantics.org
http://yasgui.data2semantics.org
http://semweb.cs.vu.nl/provoviz
http://git2prov.org