Knowledge issues for automatic identification of CO2 storage sites by means of Semantic Web Technology

Michel Perrin (ENSMP), Priscille Durville (INRIA), Sandrine Grataloup (BRGM), Laura Mastella (ENSMP), Julie Lions (BRGM), Olivier Morel (BRGM), Jean-François Rainaud (IFP)


Post on 12-Jan-2016


Goal of the study

Propose a new methodology for identifying and exploiting through the internet various documentary sources that may be relevant for CO2 studies (site selection, site assessment…)

Summary

• Considered issues (e_Wok Hub project)
• Proposed methodology
• Geographical vocabulary and geographical localization
• Ontologies for geological vocabulary
• Ontologies for geological ages
• Other geological ontologies
• How is all this going to work?
• Conclusion

Considered issues

• Automatic management of documentary resources of various types: paper documents, seismic or drilling data, geological maps or cross-sections
• Producing a system able to answer practical questions about CO2 storage sites
• The answer of the system should be a list of relevant documents for facilitating the technical work on site selection or assessment

Example:

Which diageneses have affected the Bathonian formations from the Paris basin?

Proposed solution

• The proposed solution consists of a software platform allowing various types of web services to intercommunicate and cooperate for answering user requests
• This solution is being studied within a French national research project (ANR project e_Wok Hub)
• Only knowledge issues will be examined here: which knowledge should be considered? How will this knowledge be retrieved and exchanged within the system?

Proposed methodology for knowledge identification and storage

Knowledge identification and storage within the e_Wok Platform

Associate a semantic annotation to each document by extracting a well-defined and pertinent vocabulary resting on adequate ontologies

By means of dedicated software, vocabulary is extracted either from texts or, possibly, from maps or images

Geographic annotation of documents can be performed:

• by using a geographic ontology (INSEE data)
• by spatial area definition

Knowledge identification

• Vocabulary identification was carried out starting from a set of reference documents (research papers, public reports) referring to a limited geographical zone (PICOREF zone, Paris Basin)
• Vocabulary was extracted from texts either automatically by means of dedicated software (FAST-R, ACABIT) or “manually” by experts (at the initial stage)
• Preliminary results were also obtained from extracting geographical vocabulary from maps or images

Automatic vocabulary extraction

ABSTRACT: High-resolution sequence stratigraphy of the Keuper, Paris Basin, is used to establish correlations between the basin-centre evaporite series and the basin-margin clastics series. The high-resolution correlations show stratigraphic cycle geometries. The Keuper consists of five minor base-level cycles which occur in the upper portion of the Scythian-Carnian major base-level cycle and the lower part of the Carnian-Liassic major base-level cycle.

Significant vocabulary: high-resolution sequence, basin-centre evaporite, evaporite series, basin-margin clastic, high-resolution correlation, stratigraphic cycle, cycle geometries, etc.

Non-significant vocabulary: used, establish, show, consists, occur

Text: article by Sylvie Bourquin, Francois Guillocheau (1996), Keuper stratigraphic cycles in the Paris Basin and comparison with cycles in other Peritethyan basins (German Basin and Bresse-Jura Basin)
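The split between significant and non-significant vocabulary can be illustrated with a very small sketch. It is not how FAST-R or ACABIT work (those tools rely on linguistic patterns and statistics); it only assumes a hypothetical stop list, `NON_SIGNIFICANT`, and a hypothetical helper, `candidate_terms`, that keeps pairs of adjacent significant words as candidate terms.

```python
import re

# Hypothetical stop list: function words plus the general-purpose verbs
# that the slide classes as non-significant vocabulary.
NON_SIGNIFICANT = {
    "the", "of", "is", "to", "and", "between", "which", "in", "a",
    "used", "establish", "show", "consists", "occur",
}

def candidate_terms(text):
    """Return two-word candidate terms built from adjacent significant words."""
    terms = []
    # Split into clauses first so that candidates never span punctuation.
    for clause in re.split(r"[.,;:]", text.lower()):
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", clause)
        for left, right in zip(words, words[1:]):
            if left not in NON_SIGNIFICANT and right not in NON_SIGNIFICANT:
                terms.append(f"{left} {right}")
    return terms

sentence = "The high-resolution correlations show stratigraphic cycle geometries."
terms = candidate_terms(sentence)
print(terms)
```

In a real pipeline the candidates would still have to be normalised (singular forms, variants) and validated by experts before being used for annotation.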

Knowledge identification

Ontologies and Semantic annotation

An ontology is a formal description of the knowledge concerning a domain. It usually comprises:

• Concepts (what is the definition of “Fault”?)
• Attributes (“Fault” hasDisplacement of 10 miles)
• Relationships (“Horizon” isIntersectedBy “Fault”)

Semantic Annotation is a way of describing the meaning of resources by means of ontologies
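As a minimal sketch of these three building blocks, the slide's own example can be written down in plain Python. The `Concept` class, the `related` helper and the definition string are assumptions made for illustration; the e_Wok ontologies themselves are expressed with Semantic Web languages, not with this structure.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A concept with a textual definition and a set of attributes."""
    name: str
    definition: str = ""
    attributes: dict = field(default_factory=dict)

# Concept and attribute taken from the slide; the definition is a placeholder.
fault = Concept("Fault", "placeholder definition", {"hasDisplacement": "10 miles"})
horizon = Concept("Horizon")

# Relationships are stored as (subject, predicate, object) triples.
relationships = {("Horizon", "isIntersectedBy", "Fault")}

def related(subject, predicate):
    """Return the names of all concepts linked to `subject` by `predicate`."""
    return sorted(o for s, p, o in relationships if s == subject and p == predicate)

print(fault.attributes["hasDisplacement"])
print(related("Horizon", "isIntersectedBy"))
```

Annotating a document then amounts to attaching concept names such as "Fault" to it, so that a query expressed with the same concepts can find it.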

Geographical vocabulary and geographical localization

Example of extracted geographical vocabulary

Vocabulary: Category
Alpes: Mountain chain
French South-East Basin: Non administrative geographic entity
Bourgogne: French administrative entity (“region”)
Chaunoy: French administrative entity (“commune”)
Germanic sea: Palaeogeographic entity (a Mesozoic sea)
Loire estuary: Non administrative geographic entity
Mediterranean: A sea
Pays de Caux: Non administrative geographic entity
Dijon region: Non administrative geographic entity (fuzzily defined around a city)

Annotation of geographical terms

It rests on the following tools:

• an ontology of the French administrative divisions provided by INSEE (French Official Geographic Code),
• an interactive tool allowing polygonal area definition, which must be used for annotating geographical entities that do not have an administrative definition (ex: Alpes, Loire estuary, Germanic sea)
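The choice between the two tools can be sketched as a simple lookup. Everything here is hypothetical: `ADMIN_DIVISIONS` stands in for the INSEE-based ontology (with two entries taken from the vocabulary example), `drawn` stands in for polygons produced with the interactive tool, and the vertices are placeholders in an arbitrary coordinate system, not real geography.

```python
# Hypothetical excerpt of the administrative ontology: term -> division type.
ADMIN_DIVISIONS = {"Bourgogne": "region", "Chaunoy": "commune"}

def annotate_place(term, polygons):
    """Annotate a geographical term with an administrative entry or a polygon."""
    if term in ADMIN_DIVISIONS:
        return {"term": term, "kind": "administrative",
                "division": ADMIN_DIVISIONS[term]}
    if term in polygons:
        return {"term": term, "kind": "polygon", "vertices": polygons[term]}
    # No administrative definition and no polygon drawn yet.
    return {"term": term, "kind": "unresolved"}

# Placeholder vertices only; a user would draw the real outline.
drawn = {"Loire estuary": [(0, 0), (2, 0), (2, 1), (0, 1)]}

for place in ("Bourgogne", "Loire estuary", "Alpes"):
    print(annotate_place(place, drawn))
```

A term that comes back as "unresolved" is one for which a polygon still has to be defined before documents mentioning it can be localized.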

Ontologies for geological vocabulary

• The “manual vocabulary extraction” performed by experts on the reference documents has enabled us to define the following categories of geological terms, which are relevant in our case: geological age; Earth state and paleogeography concepts; basic geology (units and boundaries); geological processes; lithology and mineralogy; hydrogeology and reservoirs
• These categories do not totally fit with the knowledge models presently available for geology (NADM, Geoscience ML)
• We have thus defined specific domain ontologies well focused on our needs.

Overview of the e_Wok hub ontologies

Geological Dating (Geological time model)

Geological Time Scale Ontology

Geological process ontology

Concepts shown: Matter Transformation Process, Cement, Dolomitisation, Dedolomitisation, Diagenesis, Hydrothermal, Metamorphic, Pedogenesis, Porogenesis, Poronecrosis, Precipitation, Recrystallisation, Mineralogical Transformation

Reservoir

Substance

How is all this going to work?

Dealing with the question:

Which diageneses have affected the Bathonian formations from the Paris basin?

• Diagenesis is a concept described in the EWOK ontology for geological process. So the documents dealing with diagenesis are referenced as such in the EWOK database. Their references can be gathered by the EWOK system in REFERENCE SET 1

• Bathonian is a geological age registered in the International Geological Time Scale and in the EWOK geological time scale ontology. Thanks to semantic annotation, the EWOK system can retrieve references corresponding to documents annotated with the concept Bathonian or with related concepts of higher rank such as Jurassic, Secondary, Mesozoic.
• These references will be gathered in REFERENCE SET 2a
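The expansion towards concepts of higher rank can be sketched as a walk up a parent table. The `BROADER` and `SYNONYMS` tables are a simplified, hand-written excerpt (intermediate ranks such as Middle Jurassic are left out, and Secondary is treated as an older name for the Mesozoic); the document identifiers in `ANNOTATIONS` are purely hypothetical.

```python
# Simplified excerpt of a geological time scale: concept -> higher-rank concept.
BROADER = {"Bathonian": "Jurassic", "Jurassic": "Mesozoic"}
# Alternative labels for the same concept.
SYNONYMS = {"Mesozoic": ["Secondary"]}

def expand_age(concept):
    """Return the concept, its higher-rank concepts, and their synonyms."""
    labels = []
    current = concept
    while current is not None:
        labels.append(current)
        labels.extend(SYNONYMS.get(current, []))
        current = BROADER.get(current)
    return labels

# Hypothetical annotation index: document id -> annotated age concepts.
ANNOTATIONS = {
    "doc-A": {"Bathonian"},
    "doc-B": {"Jurassic"},
    "doc-C": {"Triassic"},
}

def reference_set_2a(concept):
    """Collect documents annotated with the concept or a higher-rank one."""
    wanted = set(expand_age(concept))
    return {doc for doc, ages in ANNOTATIONS.items() if ages & wanted}

print(expand_age("Bathonian"))
print(sorted(reference_set_2a("Bathonian")))
```

Documents annotated only with a higher-rank concept are less specific than those annotated with Bathonian itself, so a real system might rank them lower rather than treat all matches alike.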

• Another possible option is to extract from the question the expression Bathonian formation rather than the word Bathonian.
• This expression links two concepts respectively described in the GeologicalTime and in the GeologicalUnit ontologies. They are linked together by means of the ontology for Basic Geology.
• It is thus possible to annotate documents containing the expression Bathonian formation.

• In this case, the system will retrieve references corresponding to documents annotated with the expression Bathonian formation or with any other term that has been stored as an instance of this expression (for instance Comblanchien, which corresponds to a Bathonian formation of the Burgundy region)
• These references will be gathered in REFERENCE SET 2b

• Paris basin is a non administrative geographic term, which has the synonyms bassin de Paris and bassin parisien and which can be described by a polygon. The EWOK system can also identify the various administrative divisions lying inside this polygon. It can thus retrieve references to documents annotated with the terms Paris basin, bassin de Paris, bassin parisien but also with terms like Ile de France or département du Loiret (or many others corresponding to administrative divisions within the Paris basin). The corresponding references will be gathered in REFERENCE SET 3
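How administrative divisions lying inside a polygon might be found can be sketched with a standard ray-casting test. This is an assumption about one possible implementation, not a description of the EWOK system: each division is reduced to a single representative point, and the polygon, the points and the division names are toy placeholders in an arbitrary coordinate system.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # Count edges crossed by a horizontal ray going right from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

# Toy polygon standing in for the outline of a basin (not real coordinates).
basin = [(0, 0), (4, 0), (4, 3), (0, 3)]
# Hypothetical divisions, each reduced to one representative point.
divisions = {"division-1": (1, 1), "division-2": (3, 2), "division-3": (5, 1)}

synonyms = ["Paris basin", "bassin de Paris", "bassin parisien"]
inside = sorted(n for n, p in divisions.items() if point_in_polygon(p, basin))
query_terms = synonyms + inside
print(query_terms)
```

Reducing a division to one point is a strong simplification; divisions that straddle the polygon boundary would need a proper polygon intersection test.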

The answer to the question will be a set of references S corresponding to the intersection of REFERENCE SETS 1, 2 and 3.

S = S1 ∩ (S2a ∪ S2b) ∩ S3
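With the reference sets held as Python sets, this combination is a one-line expression. The function name `answer_set` and the document identifiers are hypothetical and serve only to show the set operations.

```python
def answer_set(s1, s2a, s2b, s3):
    """Combine the reference sets: S1 and (S2a or S2b) and S3."""
    return s1 & (s2a | s2b) & s3

s1 = {"doc-A", "doc-B", "doc-D"}   # documents about diagenesis
s2a = {"doc-A", "doc-C"}           # Bathonian or higher-rank concepts
s2b = {"doc-D"}                    # Bathonian formation or its instances
s3 = {"doc-A", "doc-D", "doc-E"}   # Paris basin and divisions inside it

result = answer_set(s1, s2a, s2b, s3)
print(sorted(result))
```

Here doc-B drops out because it matches neither age set, and doc-C and doc-E drop out because they are not about diagenesis.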

Conclusion

Advancement of the e_Wok Hub project

• e_Wok Hub is a 3-year project that is entering its last year
• Domain ontologies have already been defined (1st version)
• The global architecture of the system has also been defined

There remains:

• to design user interfaces
• to finalize a demonstrator corresponding to a significant use case to be shown at the end of the project.

• The e_Wok system is a possible solution for enabling users to identify and retrieve adequate documentation through the internet, in order to solve practical issues such as identifying potential CO2 storage sites.
• The system aims at matching the semantic contents of questions asked by users with those of various types of documents. It relies on various intercommunicating and cooperating web services.
• Specific goal-oriented ontologies have been developed for formalizing the geological and geographical vocabulary that must be considered in the case of CO2 storage issues. They will be used to supplement documents found on the internet with semantic annotations, allowing their identification, their storage in the system database and their later retrieval.
• Compared with other search methodologies, our approach has the advantage of being goal-oriented and of allowing largely automated document search.

Thank you!

Questions?

Supplementary slides

[Diagram. Stages: Data extraction, Knowledge extraction, Knowledge completion, Updating & persistence. Elements: Models (meta data & data files), Data bases, Updated representations, Updated data bases, Reports, Data base, Knowledge base, Previous Information System, Updated Information System. Labels: Knowledge addition, Knowledge extraction, Knowledge exploitation in applications]


[Diagram labels: Geological unit, Geological boundary]