Sophie Aubin - INRA OpenMinTeD · 2020-01-30 · data - text interoperability experimental data...

twitter.com/openminted_eu · Sophie Aubin - INRA · Text mining services for e-infrastructures · CAPSELLA Open Data Workshop, Chania, 2 June 2017 · OpenMinTeD

Post on 24-Jun-2020

TRANSCRIPT

Page 1

twitter.com/openminted_eu

Sophie Aubin - INRA

Text mining services for e-infrastructures

CAPSELLA Open Data Workshop, Chania, 2 June 2017

OpenMinTeD

Page 2

Amounts of scientific texts...

120,000 papers published on a single taxon “Zea mays”

90% of papers never cited*

1 paper/sec

50% of papers never read by anyone other than their authors, referees and journal editors*

Publications

+ reports, patents, books, surveys, news, etc.

*Lokman I. Meho, The rise and rise of citation analysis, 2007

Page 3

TDM services

Indexing documents and datasets

Entity identification and normalisation against reference data

Information extraction: from semi-structured to structured

Semantic/lexical resource acquisition from texts
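
To make "entity identification and normalisation against reference data" concrete, here is a minimal dictionary-lookup sketch. It is only an illustration, not how any OpenMinTeD component is implemented: the `REFERENCE` vocabulary, the `TAXON:` identifiers and the `normalise_entities` helper are all hypothetical.

```python
import re

# Hypothetical reference data: surface forms mapped to made-up identifiers.
# A real service would load these from a shared vocabulary or database.
REFERENCE = {
    "zea mays": "TAXON:0001",
    "maize": "TAXON:0001",
    "corn": "TAXON:0001",
    "triticum aestivum": "TAXON:0002",
    "bread wheat": "TAXON:0002",
}


def normalise_entities(text, reference=REFERENCE):
    """Return (matched text, identifier, start offset) for each dictionary hit."""
    # Longer names go first so a multi-word name wins over a shorter overlap.
    names = sorted(reference, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (match.group(0), reference[match.group(0).lower()], match.start())
        for match in pattern.finditer(text)
    ]


hits = normalise_entities("Maize (Zea mays) and bread wheat were sampled.")
for surface, identifier, offset in hits:
    print(surface, identifier, offset)
```

Different names for the same object ("Maize", "Zea mays") resolve to one identifier, which is what lets text-mining results be linked to other data.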

Page 4

However, …

Multitude of solutions catering for different

Text types: newswire, scientific literature, tweets/blogs, patents, clinical/medical records, textbooks, monographs, online forums, …

Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …

Tasks: translation, knowledge acquisition, semantic search, question answering, sentiment analysis, summarization, knowledge discovery, …

Domains: agriculture, health, biology, social sciences, environment, …

Creating a fragmented landscape

Page 5

Then comes OpenMinTeD

- research groups in text mining
- content providers
- a data center
- a library association
- legal experts
- community-related partners
- SMEs

Duration: 3 years (2015-2018), 16 Partners

Page 6

Partners of OpenMinTeD

Page 7

What is OpenMinTeD?

OpenMinTeD is a platform that works as an infrastructural service of the wider research ecosystem

Page 8

Our Services

Discover TDM Services and tools

Feed with your own content or easily get texts from open access (OA) content hubs

Pick adequate knowledge resources

Share and Re-Use

Build your own service/applications – Combine components into a Workflow

Page 9

How does all this bind together?

Page 10

OpenMinTeD

Scientific/Business Applications

- Data & content repositories/registries: for data analysts, scientists
- Curation tools: for data providers
- Analytical tools: for scientists, policy makers
- Decision tools: for farmers, SMEs
- Knowledge acquisition tools: for scientists, ontologists

Page 11

Example: WheatIS / gnpIS

Application: federated search of genomic and genetic data for wheat

Added value: direct access from data to related scientific articles

Objects: taxa, genes, markers, phenotypes and varieties

Challenges: naming heterogeneity & scale variety between textual and experimental data

Page 12

data - text interoperability

[Figure labels: experimental data, textual data, Knowledge, semantic interop., web services, pheno, bib]

>18,000 articles

300,000 genes, 30,000 markers, 220 taxa, 208 phenotypes

Page 13

Example: living conditions of food-related micro-organisms

Application: study and characterize the microbial biodiversity of food ecosystems (dairy or meat products, fish, wine, etc.)

- risk management
- food quality improvement

Added value: completion of knowledge databases with info from the literature

Objects: bacteria, habitats and phenotypes

Challenges: heterogeneous data integration, object identity

Page 14

OpenMinTeD for agriculture

[Figure labels: Data/Content, Semantics, My decision app, End-user]

GACS, landvoc, Agrovoc, PO...

Page 15

Text mining

- creates value from texts
- creates even more value if the results are linked to other data

This relies on shared semantics, standards & protocols

This requires fewer technical competencies & resources thanks to the common e-infrastructure
