semiotics in spreadsheets

Post on 14-Jul-2015

Semiotics in Spreadsheets: Enhancing Semantic Interoperability

Ivelize Rocha Bernardo

André Santanchè

Outline

• Motivation
• Research Problems
• Related Work
• What I did in my Master's Degree
• Limitations of the Master's Degree Proposal
• What are the plans for the PhD

Motivation

Large amount of information in spreadsheets [Syed et al., 2010]

Why?

• They are intuitive
• They have high flexibility -> diverse needs

However, they were designed for:
• Isolated use
• Human reading

Research Goal

The main goal of our research is to promote a richer semantic interoperability among spreadsheets

Interoperability

(Ouksel & Sheth 1999)
• system interoperability
• syntactic interoperability
• structural interoperability
• semantic interoperability

(Tolk 2006)
• no interoperability
• technical interoperability
• syntactic interoperability
• semantic interoperability
• pragmatic interoperability
• dynamic interoperability
• conceptual interoperability

Interoperability

semantic interoperability

• semantic interoperability
• pragmatic interoperability
• dynamic interoperability
• conceptual interoperability

Data Interpretation

Which elements must be considered in this interpretation process?

Unity Interpretation

Related Work

isolated label

(Han et al., 2008) - RDF123: From Spreadsheets to RDF, The Semantic Web. Lecture Notes in Computer Science, vol. 5318. Springer

(Langegger & Wöß, 2009) - XLWrap: Querying and Integrating Arbitrary Spreadsheets with SPARQL, The Semantic Web. Lecture Notes in Computer Science, vol. 5823. Springer

Related Work

template

(Abraham & Erwig, 2006) - Inferring Templates from Spreadsheets, Proceedings of the International Conference on Software Engineering

Related Work

instances

(Zhao et al., 2010) - A spreadsheet system based on data semantic object, IEEE International Conference on Information Management and Engineering

Related Work

isolated label associated to linked data

(Syed et al., 2010) - Exploiting a Web of Semantic Data for Interpreting Tables, Proceedings of the Web Science Conference

Related Work

correlation of labels associated to linked data

(Venetis et al., 2011) - Recovering Semantics of Tables on the Web, Proceedings of the VLDB Endowment

(Mulwad et al., 2010) - Using linked data to interpret tables, Proceedings of the International Workshop on Consuming Linked Data

Related Work

correlation between several spreadsheet elements associated to linked data

(Limaye et al., 2010) - Annotating and Searching Web Tables Using Entities, Types and Relationships, Proceedings of the VLDB Endowment

How far can the system interpret, considering labels and their correlations?

How different are they in fact?

What I did in my Master's Degree

Research Strategy

1. To identify construction patterns followed by biologists during the creation of these spreadsheets

2. To verify whether these construction patterns could lead us to recognize the spreadsheet purpose

3. To achieve semantic interoperability among these spreadsheets

How to identify Construction Patterns

[Figure, built up over several slides: spreadsheet fields annotated with the questions they answer: what, when, where]

Construction Patterns

[Figure, built up over several slides: two construction patterns, catalogue and collection]

SciSpread System

Architecture Evaluation

Automatic analysis of 11,150 spreadsheets

• the system recognized 1,151 spreadsheets
  – 806 spreadsheets were classified as catalogue
  – 345 spreadsheets were classified as collection

Total: 748,459 records analyzed

Architecture Evaluation - Results

• Random subset of 1,203 spreadsheets was selected to evaluate precision/recall
  – Precision: 0.84
  – Recall: 0.76
  – Specificity: 0.95
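The three measures reported above are the usual confusion-matrix ratios. As a minimal sketch of how they are computed, the Python below uses a hypothetical `ConfusionMatrix` class and made-up counts; the presentation does not give the underlying counts, so these numbers are illustrative only and do not reproduce the reported 0.84, 0.76 and 0.95.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary decision: 'recognized as a pattern' vs. 'not recognized'."""
    tp: int  # recognized, and the spreadsheet truly follows the pattern
    fp: int  # recognized, but the spreadsheet does not follow the pattern
    fn: int  # not recognized, although the spreadsheet follows the pattern
    tn: int  # not recognized, and the spreadsheet does not follow the pattern

    def precision(self) -> float:
        # Share of recognized spreadsheets that were recognized correctly.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of pattern-following spreadsheets that were recognized.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-pattern spreadsheets that were correctly rejected.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, NOT taken from the presentation.
example = ConfusionMatrix(tp=80, fp=20, fn=20, tn=380)
print(round(example.precision(), 2),
      round(example.recall(), 2),
      round(example.specificity(), 2))
```

A high specificity together with a lower recall, as reported on the slide, would mean that the system rarely accepts a spreadsheet wrongly but misses some that do follow a pattern.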

Limitations of the Master's Degree Proposal

Main Limitations

● Single Domain
  ○ Specific spreadsheets (catalogue and collection)

● Lack of a Model to represent construction patterns
  ○ later, a model for construction patterns, but isolated from each other

● Linking labels to ontologies
  ○ not able to aggregate different labels belonging to the same concept
  ○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data

● Multiple Domains

● Model as an association network
  ○ relates elements and concepts of several spreadsheets

● Linking spreadsheet structure to ontologies
  ○ the link is made between concepts
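One possible reading of "model as an association network" is a graph whose nodes are spreadsheet labels and whose edges record which label follows which across several spreadsheets. The sketch below is an assumption, not the authors' model: the function name `build_association_network`, the use of edge counts as weights, and the sample label sequences are all hypothetical.

```python
from collections import defaultdict


def build_association_network(label_sequences):
    """Return {label: {next_label: count}} accumulated over all spreadsheets.

    Each spreadsheet is given as the ordered list of its labels. Artificial
    'Start' and 'End' nodes mark where a spreadsheet begins and finishes.
    """
    network = defaultdict(lambda: defaultdict(int))
    for labels in label_sequences:
        path = ["Start", *labels, "End"]
        # Count every pair of consecutive labels as one association.
        for current, following in zip(path, path[1:]):
            network[current][following] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {node: dict(edges) for node, edges in network.items()}


# Hypothetical label sequences, for illustration only.
sheets = [
    ["title", "org.", "stra."],
    ["ID", "org.", "stra.", "genotype"],
]
network = build_association_network(sheets)
print(network["org."])  # {'stra.': 2}
```

In this sketch, edges shared by many spreadsheets accumulate higher counts, which is one simple way to relate elements of several spreadsheets in a single structure.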

What are the plans for my PhD

[Figure, built up over several slides: the labels of a spreadsheet arranged from Start to End: SEEK, proj., title, nam., org., NCBIID, stra., genenam., Mod.type, phe., com., tre1.ph, tre2.tem.; each of the two treatments carries tre.val., SD and Unit. Example values shown: MOSES, M_MZ_sample1, ura, Saccharomyces_cerevisiae, 4932, CEN.PK-113-7D, ura3, 6,5, 0,1, 37, 0,5, oC]

Semantic Interoperability among Spreadsheets

[Figure, built up over several slides: the same arrangement of labels from Start to End (SEEK, proj., title, nam., org., NCBIID, stra., genenam., Mod.type, phe., com., tre1.ph, tre2.tem., with tre.val., SD and Unit), extended with the labels ID, time, rel.glu., genotype and trea., and annotated with SpreadsheetPurpose and SpreadsheetDomain]

Data Model

Spreadsheets Semiotic Sign

signifier: structural form

signified: spreadsheet purpose + semantic

spreadsheet data
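The sign above pairs a signifier (the structural form) with a signified (the spreadsheet purpose plus semantics). A minimal Python sketch of that pairing follows; the class `SpreadsheetSign`, the choice to represent the structural form as an ordered tuple of labels, and the reduction of the signified to a plain purpose name are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpreadsheetSign:
    """A sign pairing a structural form (signifier) with a purpose (signified)."""
    signifier: Tuple[str, ...]  # structural form: ordered labels (assumption)
    signified: str              # spreadsheet purpose as a plain name (assumption)


def same_purpose(a: SpreadsheetSign, b: SpreadsheetSign) -> bool:
    """Two signs may differ in form yet point to the same purpose."""
    return a.signified == b.signified


# Hypothetical signs; the labels and purposes are illustrative only.
catalogue_a = SpreadsheetSign(("ID", "org.", "stra."), "catalogue")
catalogue_b = SpreadsheetSign(("org.", "genotype"), "catalogue")
collection = SpreadsheetSign(("ID", "time"), "collection")
print(same_purpose(catalogue_a, catalogue_b), same_purpose(catalogue_a, collection))
```

Keeping form and purpose as separate fields makes it explicit that different structural forms can stand for the same purpose.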

Architecture

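[Figure: the arrangement of labels from Start to End (SEEK, proj., title, nam., org., NCBIID, stra., genenam., Mod.type, phe., com., tre1.ph, tre2.tem., with tre.val., SD and Unit, plus ID, time, rel.glu., genotype and trea.), annotated with SpreadsheetPurpose and SpreadsheetDomain, alongside a second arrangement labelled Start, X, Y, Z]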

Research Challenge

How to distinguish different domains when the networks are interconnected?

SpreadsheetDomain

SpreadsheetPurpose

Research Questions

• When can spreadsheets be considered to have the same purpose?

• Is there a canonical representation among spreadsheets of the same purpose?

• Is it possible to define a canonical representation for a spreadsheet group?

• Can this representation be used to predict spreadsheets of a given purpose?

Acknowledgements

● Laboratory of Information Systems (LIS)
● UNICAMP
● FAPESP
● Microsoft Research FAPESP Virtual Institute (NavScales project)
● CNPq (MuZOO Project and PRONEX-FAPESP)
● INCT in Web Science (CNPq 557.128/2009-9)
● CAPES

Thank you for your attention!
