Multilingual Ontology for Plant Health Threats Media Monitoring


Upload: rogargon

Post on 12-Apr-2017


Page 1: Multilingual Ontology for Plant Health Threats Media Monitoring

GRIHO Research Group, INSPIRES Research Centre, Universitat de Lleida

Roberto García, Josep Maria Brunetti*, Rosa Gil, Jordi Virgili, Toni Granollers

Multilingual Ontology for Plant Health Threats Media Monitoring (A Smart Data Approach)

Page 2: Multilingual Ontology for Plant Health Threats Media Monitoring

Media Monitoring for New and (Re)Emerging Plant Health Threats

• Project: development and testing of the media monitoring tool MedISys for the early identification and reporting of existing and emerging plant health threats
• Timing (duration): January 2014 – June 2016 (2.5 years)
• Funding: EFSA
• Coordination: Universitat de Lleida (UdL)
• Partners: IRTA and UdL
• Other participants: Joint Research Centre (European Commission)
• Objectives:
  • Collate new and appropriate media information sources
  • Multilingual ontology for the global identification of emerging new plant health threats, to be appended to MedISys
    • English, Spanish, Italian, French, Dutch, German, Portuguese, Russian, Chinese and Arabic
  • Develop and test strategies to monitor re-emerging plant health threats on a global and regional scale
  • Analyse and test approaches to report identified signals to EFSA Units and experts through MedISys

Page 3: Multilingual Ontology for Plant Health Threats Media Monitoring

Approach

• Ontology: key component of the developed system that structures and provides knowledge about plant health threats
• Knowledge captured from existing sources and experts
• Guides applications for
  • Knowledge capture
  • Indirect sources search
  • Terms translation
  • Media monitoring categories generation

Page 4: Multilingual Ontology for Plant Health Threats Media Monitoring

Ontology Skeleton

• Collected 140 pests/diseases from EPPO Alerts, 2000/29-1-A-1 and EU Emergency Control Measures
• 117 linked to UniProt Taxonomy:
  • Taxonomical information, scientific/common/other names,…
• 47 linked also to Wikipedia
  • Common names in multiple languages

Page 5: Multilingual Ontology for Plant Health Threats Media Monitoring

Plant Health Threats Ontology

• Enrich the ontology with affected crops, hosts, vectors, symptom expressions…

Page 6: Multilingual Ontology for Plant Health Threats Media Monitoring

Plant Health Threats Ontology

• All concepts linked to labels in different languages
• Extract as keywords for MedISys or Web search filters,…
• Example: “Maladie de Pierce” OR ( “grapevine” AND “sharpshooter” )

[Diagram: Xylella fastidiosa and its related concepts]
• Xylella fastidiosa
  • Labels: “Pierce's disease”, “Citrus variegated chlorosis” (en); “Maladie de Pierce” (fr); “葉緣焦枯病菌” (zh)
• subClassOf: Gammaproteobacteria
• host: Nerium oleander, Prunus salicina, Medicago sp., Sorghum halepense,…
• vector: Homalodisca coagulata, Graphocephala sp., Oncometopia sp., Draeculacephala sp.,…
  • Labels: “Glassy-winged sharpshooter”, “Spittlebugs”, “Froghoppers”, “Planthoppers”,… (en)
• crop: Grapevine, Citrus, Olive, Almond, Peach, Coffee,…
  • Labels: “vite” (it),…
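The keyword filter shown in the example can be assembled mechanically from the labels attached to each concept. The following is a minimal sketch, assuming the labels have already been pulled out of the ontology into plain lists; the function names `any_of` and `build_filter` are hypothetical, and the query syntax simply mirrors the slide's example rather than any documented MedISys format.

```python
def quote(term):
    """Wrap a label in double quotes so it is treated as an exact phrase."""
    return f'"{term}"'


def any_of(terms):
    """Join alternative labels with OR; a single label needs no parentheses."""
    quoted = [quote(t) for t in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "( " + " OR ".join(quoted) + " )"


def build_filter(threat_names, crop_labels, vector_labels):
    """Threat name, OR a crop label co-occurring with a vector label."""
    context = "( " + any_of(crop_labels) + " AND " + any_of(vector_labels) + " )"
    return any_of(threat_names) + " OR " + context


# Labels taken from the slide; the list layout itself is an assumption.
query = build_filter(["Maladie de Pierce"], ["grapevine"], ["sharpshooter"])
print(query)
```

Passing several labels per concept (for example the French and English threat names together) yields a wider OR group in the first position, which is one way the multilingual labels could feed a single filter.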

Page 7: Multilingual Ontology for Plant Health Threats Media Monitoring

Ontology Editor

• Assist experts during the knowledge capture process

http://indagus.udl.cat/medisys/editor/

Page 8: Multilingual Ontology for Plant Health Threats Media Monitoring

Ontology Editor – forms with assistance


Page 9: Multilingual Ontology for Plant Health Threats Media Monitoring

Ontology Editor - autocomplete


Page 10: Multilingual Ontology for Plant Health Threats Media Monitoring

Ontology Editor - symptoms form


Page 11: Multilingual Ontology for Plant Health Threats Media Monitoring

Semi-automatic Translation

Page 12: Multilingual Ontology for Plant Health Threats Media Monitoring

Multilingual Ontology

• Threat names
• 1609 terms
• 27 languages

Language         Terms   Share
Not available    617     38%
Latin            375     23%
English          262     16%
French           81      5%
German           68      4%
Spanish          65      4%
Japanese         21      1%
Dutch            17      1%
Italian          16      1%
Portuguese       15      1%
Finnish          8       0%
Chinese          7       0%
Russian          6       0%
Other            51      3%
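As a quick consistency check on the chart figures, the per-language counts can be summed and converted to rounded percentages. This is only a sketch that re-enters the numbers from the slide by hand; the dictionary name `counts` is an assumption for illustration.

```python
# Term counts per language, copied from the slide's chart.
counts = {
    "Not available": 617, "Latin": 375, "English": 262, "French": 81,
    "German": 68, "Spanish": 65, "Japanese": 21, "Dutch": 17,
    "Italian": 16, "Portuguese": 15, "Finnish": 8, "Chinese": 7,
    "Russian": 6, "Other": 51,
}

total = sum(counts.values())

# Share of each language, rounded to whole percent as in the chart.
shares = {lang: round(100 * n / total) for lang, n in counts.items()}

for lang, n in counts.items():
    print(f"{lang:<15}{n:>5}{shares[lang]:>5}%")
print(f"{'Total':<15}{total:>5}")
```

The counts add up to the 1609 terms quoted on the slide, and the rounded shares reproduce the chart's percentages.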

Page 13: Multilingual Ontology for Plant Health Threats Media Monitoring

Ontology - symptom expression

• Symptom Expression = symptom + plant part
• Set of symptoms and plant parts from CABI form and Plant Ontology
• 37 symptoms:
  – abnormal fall, premature fall
  – abnormal patterns, chlorotic rings
  – abnormal shape, malformation, distortion
  – boring, drilling, internal feeding, mining, tunnelling
  – canker
  – chlorosis
  – colour inversion, colour inversion
  – curling, curl
  – dieback
  – discoloration, discolouration
  – dwarfing
  – early senescence, premature senescence
  – empty
  – feeding
  – frass
  – gummosis
  – lesion, lesions
  – mottled, mottle
  – mummification, wrinkled, hard skin
  – dead, death, necrosis
  – odour
  – premature drop
  – premature ripening
  – reddening
  – reduced size, smaller
  – resinosis
  – roll, rolling
  – rosetting
  – rot, rotting
  – burn, scorch
  – splitting
  – stunting
  – thicker
  – fallen, toppled, falling
  – rooted out, uprooted
  – wilt, wilting
  – yellowing

356 terms for symptoms

Page 14: Multilingual Ontology for Plant Health Threats Media Monitoring

Ontology - symptom expression

• Symptom Expression = Symptom + Plant Part
• 6 Plant Parts:
  – fruit
  – plant, tree, whole plant
  – bud, sprout
  – stem
  – seed, seeds
  – leaf, leaves
• Examples:
  – Whole Plant Dwarfing
  – Leaf Scorch
  – Stems Stunting
  – Leaf Reddening
  – Fruit Premature Drop
  – Seeds Discoloration
  – Leaf Mottle

96 plant part terms
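The rule "Symptom Expression = Symptom + Plant Part" amounts to pairing every term of a symptom with every term of a plant part. The sketch below illustrates this with a small hand-picked subset of the terms listed on the slides; the dictionary layout and the function name `symptom_expressions` are assumptions, not the project's actual data model.

```python
from itertools import product

# Subset of the slide's plant parts and symptoms, keyed by a concept name
# chosen here for illustration; values are the term variants from the slides.
plant_parts = {
    "leaf": ["leaf", "leaves"],
    "fruit": ["fruit"],
}
symptoms = {
    "scorch": ["burn", "scorch"],
    "premature drop": ["premature drop"],
}


def symptom_expressions(parts, symptom_terms):
    """Return every (plant part term, symptom term) pair."""
    pairs = []
    for part_variants, symptom_variants in product(parts.values(), symptom_terms.values()):
        pairs.extend(product(part_variants, symptom_variants))
    return pairs


expressions = symptom_expressions(plant_parts, symptoms)
for part, symptom in expressions:
    print(f"{part} {symptom}")
```

At the concept level, 37 symptoms and 6 plant parts give at most 37 × 6 = 222 symptom expressions, assuming every symptom could apply to every plant part; in practice an expert would presumably keep only the meaningful pairs.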

Page 15: Multilingual Ontology for Plant Health Threats Media Monitoring

Ontology Browser

• Complex queries
• Example: “all threats with symptoms affecting the leaves”

http://indagus.udl.cat/plantHealthThreats/
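The example query can be pictured as a filter over threats and their symptom expressions. The following is a minimal in-memory sketch, not the browser's actual query mechanism, which the slides do not describe; the threat names are placeholders and the `SymptomExpression` type and `threats_affecting` function are hypothetical.

```python
from typing import NamedTuple


class SymptomExpression(NamedTuple):
    symptom: str
    plant_part: str


# Placeholder data: the threat names and their symptoms are invented
# purely to exercise the query and carry no domain meaning.
threats = {
    "threat A": [SymptomExpression("scorch", "leaf"),
                 SymptomExpression("dieback", "stem")],
    "threat B": [SymptomExpression("premature drop", "fruit")],
    "threat C": [SymptomExpression("mottle", "leaf")],
}


def threats_affecting(plant_part, threat_index):
    """Names of all threats with at least one symptom on the given plant part."""
    return sorted(
        name
        for name, expressions in threat_index.items()
        if any(e.plant_part == plant_part for e in expressions)
    )


print(threats_affecting("leaf", threats))
```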

Page 16: Multilingual Ontology for Plant Health Threats Media Monitoring

Identification of Information Sources to Monitor

• Objective: collect relevant information sources to be monitored by MedISys
• Methodology
  • Direct Sources: identify information sources already known from experts, previous research projects, official sources like EPPO, journals,…
  • Indirect Sources: identify previously unknown web information sources (newspapers, blogs, websites, etc.), discovered using search engines and ontology terms
  • Analyse and evaluate all collected sources using an Information Quality measure
    • First, filter out duplicate, irrelevant, non-monitorable sources, etc.

Page 17: Multilingual Ontology for Plant Health Threats Media Monitoring

Methodology: Plant Health Threats Sources Inventory

[Flow diagram]
• Known Sources: reference resources (expert knowledge); existing projects related to pest and food/feed risks (EFSA); MedISys sources (JRC)
• Web Search: search mechanisms (query process) fed by the Ontology
• Each branch yields a list of relevant sources through a filtering and evaluation process
• Filtering process (avoid duplicates & evaluation) produces the final list
• 1956 sources (72 known + 1884 web search)

Page 18: Multilingual Ontology for Plant Health Threats Media Monitoring

Monitor Known Threats

• Known threats: explicit mention of the threat name
• Generate automatically from ontology
• MedISys category for each threat with a list of keywords (terms) with threshold
• 117 categories for known threats:
  • Bacteria: Xylella fastidiosa, Acidovorax citrulli,… (6)
  • Fungi: Ceratocystis fagacearum, Diplocarpon mali,… (18)
  • Insects: Agrilus coxalis auroguttatus, Agrilus planipennis,… (54)
  • Mollusks: Pomacea (1)
  • Nematodes: Bursaphelenchus xylophilus, Nacobbus aberrans,… (7)
  • Oomycetes: Phytophthora ramorum (1)
  • Phytoplasma: Elm yellows phytoplasma, Candidatus Phytoplasma pruni,… (7)
  • Viroid: Tomato apical stunt viroid, Potato spindle tuber viroid (2)
  • Virus: Andean potato latent virus, Andean potato mottle virus,… (21)

http://medisys.newsbrief.eu/medisys/groupedition/en/PlantHealthAll.html

Keyword sources                 Threshold
Scientific names                100
Common names (all languages)    100
Other names                     100
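Generating a category per known threat can be sketched as collecting the three kinds of names and attaching the value 100 from the table to each keyword. This is an illustrative sketch only: the record layout, the field names and the `build_category` function are assumptions, and how MedISys itself stores or interprets the threshold is not covered by the slides.

```python
THRESHOLD = 100  # value given for every keyword source in the table

# Hypothetical ontology record; the names come from the Xylella fastidiosa slide.
xylella = {
    "id": "XylellaFastidiosa",
    "scientific_names": ["Xylella fastidiosa"],
    "common_names": ["Pierce's disease", "Citrus variegated chlorosis",
                     "Maladie de Pierce"],
    "other_names": [],
}


def build_category(threat):
    """Collect all names of a threat as keywords, skipping case-insensitive duplicates."""
    keywords = []
    seen = set()
    for source in ("scientific_names", "common_names", "other_names"):
        for term in threat.get(source, []):
            key = term.casefold()
            if key in seen:
                continue
            seen.add(key)
            keywords.append({"term": term, "source": source, "threshold": THRESHOLD})
    return {"category": threat["id"], "keywords": keywords}


category = build_category(xylella)
for keyword in category["keywords"]:
    print(keyword["term"], keyword["threshold"])
```

Running the same function over every threat in the ontology would give one category per threat, which is consistent with the 117 categories reported above.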

Page 19: Multilingual Ontology for Plant Health Threats Media Monitoring

Monitor Unknown Threats

• Unknown threats: name not explicitly mentioned
• Approach 1: manual generation of MedISys categories by experts

http://medisys.newsbrief.eu/medisys/filteredition/en/EFSAUnknownPestFilteredEmailAlert.html

A combination of Combinations (Proximity: 15)
  at least one of alien, danger, dangerous, deadly…
  and at least one of agricultural, agriculture, almond…
  and at least one of bacteria, bacterial, crop+failure,…
  but none of allergies, allergy, animal+abuse,…
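The combination above can be read as a matching rule over the words of a news item. The sketch below is one possible interpretation, not MedISys's implementation: it assumes "Proximity: 15" means one hit from each required group must fall within a span of 15 word positions, that "+" joins the words of a phrase, and that an excluded term anywhere in the text rejects the item. Only the terms visible on the slide are used; the elided ones are left out.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def positions(tokens, term):
    """Start indices where a term occurs; '+' joins the words of a phrase."""
    words = term.lower().split("+")
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def matches(text, required_groups, excluded, proximity=15):
    """True if every group has a hit, the hits lie close together, and no excluded term occurs."""
    tokens = tokenize(text)
    if any(positions(tokens, term) for term in excluded):
        return False
    group_hits = []
    for group in required_groups:
        hits = [p for term in group for p in positions(tokens, term)]
        if not hits:
            return False
        group_hits.append(hits)
    # One hit per group, all within the proximity window.
    return any(max(combo) - min(combo) <= proximity for combo in product(*group_hits))


required = [
    ["alien", "danger", "dangerous", "deadly"],
    ["agricultural", "agriculture", "almond"],
    ["bacteria", "bacterial", "crop+failure"],
]
excluded = ["allergies", "allergy", "animal+abuse"]

print(matches("A deadly bacterial disease threatens almond orchards", required, excluded))
```

The exhaustive `product` over hit positions is fine for short news items; a sliding-window scan would be the natural replacement for long texts.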

Page 20: Multilingual Ontology for Plant Health Threats Media Monitoring

Monitor Unknown Threats

• Approach 2: automatic generation from ontology (multilingual)
  • Concepts associated with the threats (but not their names)
  • Affected crops, vectors, hosts, symptoms, plant parts,...
• Currently, the ontology models the symptoms for just 7 threats:
  • Phytophthora ramorum, Anoplophora glabripennis, Bactrocera tryoni, Agrilus planipennis, Xylella fastidiosa, Candidatus Liberibacter and Rhynchophorus ferrugineus
  • http://medisys.newsbrief.eu/medisys/alertedition/en/AgrilusPlanipennis-PHT-Symptoms.html
  • http://medisys.newsbrief.eu/medisys/alertedition/en/AnoplophoraGlabripennis-PHT-Symptoms.html
  • http://medisys.newsbrief.eu/medisys/alertedition/en/BactroceraTryoni-PHT-Symptoms.html
  • http://medisys.newsbrief.eu/medisys/alertedition/en/CandidatusLiberibacter-PHT-Symptoms.html
  • http://medisys.newsbrief.eu/medisys/alertedition/en/PhytophthoraRamorum-PHT-Symptoms.html
  • http://medisys.newsbrief.eu/medisys/alertedition/en/RhynchophorusFerrugineus-PHT-Symptoms.html
  • http://medisys.newsbrief.eu/medisys/alertedition/en/XylellaFastidiosa-PHT-Symptoms.html

Combinations tree (Proximity 10)             Example
Affected crop AND Symptom AND Plant Part     “walnut” AND “necrosis” AND “tree”
OR
Affected crop AND Vectors                    “lime” AND “asian citrus psyllid”
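The combinations tree can be generated by crossing the concept lists attached to a threat. The following is a rough sketch under stated assumptions: the record layout and the functions `combinations_for` and `to_query` are hypothetical, the two records only reuse the example terms from the slide, and the proximity constraint is left to the monitoring tool.

```python
from itertools import product


def combinations_for(threat):
    """Crop AND symptom AND plant part, plus crop AND vector, as term lists."""
    combos = [list(c) for c in product(threat["crops"], threat["symptoms"],
                                       threat["plant_parts"])]
    combos += [list(c) for c in product(threat["crops"], threat["vectors"])]
    return combos


def to_query(combos):
    """Render the combinations as OR-joined groups of AND-joined quoted terms."""
    groups = ("( " + " AND ".join(f'"{term}"' for term in combo) + " )"
              for combo in combos)
    return " OR ".join(groups)


# Illustrative records built from the slide's two examples.
example_a = {"crops": ["walnut"], "symptoms": ["necrosis"],
             "plant_parts": ["tree"], "vectors": []}
example_b = {"crops": ["lime"], "symptoms": [], "plant_parts": [],
             "vectors": ["asian citrus psyllid"]}

all_combos = combinations_for(example_a) + combinations_for(example_b)
print(to_query(all_combos))
```

Because the threat name never appears in the output, a category built this way can in principle catch reports that describe an outbreak without naming it, at the cost of the ambiguity discussed in the results.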

Page 21: Multilingual Ontology for Plant Health Threats Media Monitoring

Results

• Known threats
  • MedISys categories using threat names as keywords are very effective
  • Example Xylella fastidiosa:
    • 5078 relevant news items selected from February 2015 to May 2016 (16 months)
  • However, they miss items not explicitly mentioning the threat
• Unknown threats
  • Manually defined categories by experts
    • 80% items relevant
    • 10 items per day
  • Categories generated automatically using symptoms, crops, vectors…
    • 60% items relevant
    • Just 7 per week
    • A lot of noise, term ambiguity
    • Added negative words to filter false positives, but this increased false negatives
    • Anyway, just preliminary work (just 7 threats modelled)…

Page 22: Multilingual Ontology for Plant Health Threats Media Monitoring

Future work

Build Disease-Symptom network like for human health?

Zhou, X., Menche, J., Barabási, A. L., & Sharma, A. (2014). Human symptoms–disease network. Nature Communications, 5.

Page 23: Multilingual Ontology for Plant Health Threats Media Monitoring

Thank you very much for your attention

Questions?

Roberto García
[email protected]
http://rhizomik.net/~roberto/