MESMUSES broad vision
As in several other projects, the Semantic Web (SW) is all about semantic interoperability
Sharing machine-readable terminologies and classification schemes
Science and culture are collective and international
Semantic Web methodology should be highly relevant for managing and sharing scientific and cultural information
Some key S&T issues in the Project
Model: is RDFS / OWL-Lite adequate?
Schema authoring: method and tools needed!
Metadata: where does it come from?
Automatic indexing: experiments with a categorizer
The basic SW model
[Figure, schema level. Classes: Dwelling, Person, Artefact; House, Artist, Artwork. Properties: Lives-in, Owner, Produces, Create]
Type: printed text, monograph
Author(s): Zola, Émile (1840-1902)
Title(s): L'assommoir [Printed text] / by Emile Zola
Edition: 50th ed.
Publication: Paris: G. Charpentier, 1878
Physical description: 111-569 p.
Record no.: FRBNF35963044
[Figure, instance level. Property instances: Creates, Lives-in. Layers: Surrogates, Schema, Real-world entities]
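The three layers of the figure can be sketched in a few lines of Python. This is only an illustration: the subclass links, the domain and range of each property, and the identifiers `zola` and `assommoir` are assumptions read into the figure labels, not something the slide states.

```python
# Schema layer: class hierarchy (child -> parent).
# Assumption: each class on the second row specialises the one above it.
SUBCLASS_OF = {
    "House": "Dwelling",
    "Artist": "Person",
    "Artwork": "Artefact",
}

# Schema layer: property -> (domain class, range class). Assumed for illustration.
PROPERTIES = {
    "lives-in": ("Person", "Dwelling"),
    "creates": ("Artist", "Artwork"),
}

# Surrogate layer: typed descriptions standing in for real-world entities.
# The identifiers are hypothetical.
TYPES = {
    "zola": "Artist",
    "assommoir": "Artwork",
}
STATEMENTS = [
    ("zola", "creates", "assommoir"),
]


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def conforms(statement):
    """Check a surrogate statement against the schema's domain and range."""
    subject, prop, obj = statement
    if prop not in PROPERTIES:
        return False
    domain, range_ = PROPERTIES[prop]
    return is_a(TYPES.get(subject), domain) and is_a(TYPES.get(obj), range_)
```

Calling `conforms(STATEMENTS[0])` checks the surrogate statement against the schema, while the real-world entities themselves never appear in the data.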
Model and Schema Language
Typed attributes are needed: XML-Schema types; derived types (e.g. Celsius temperature, Gregorian date, etc.); enumerated types, thesauri
Time-stamping
Cardinality constraints
Explicit transitivity of properties (e.g. geographic inclusion)
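Two of these requirements, explicit transitivity and cardinality constraints, can be sketched as follows. This is a minimal illustration in plain Python, not a feature of RDFS or OWL-Lite; the property name `located-in` and the sample facts are hypothetical.

```python
from collections import defaultdict

# Hypothetical geographic inclusion facts: (part, property, whole).
FACTS = [
    ("Montmartre", "located-in", "Paris"),
    ("Paris", "located-in", "France"),
    ("France", "located-in", "Europe"),
]


def transitive_closure(facts, prop):
    """Return every (a, b) pair implied by treating `prop` as transitive."""
    direct = defaultdict(set)
    for subject, predicate, obj in facts:
        if predicate == prop:
            direct[subject].add(obj)

    closure = set()
    for start in list(direct):
        stack = list(direct[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(direct.get(node, ()))
    return closure


def check_max_cardinality(facts, prop, max_count):
    """Return the subjects that have more than `max_count` values for `prop`."""
    counts = defaultdict(int)
    for subject, predicate, _ in facts:
        if predicate == prop:
            counts[subject] += 1
    return sorted(s for s, n in counts.items() if n > max_count)
```

With transitivity declared explicitly, a query for everything located in Europe can return Montmartre even though no single fact says so.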
Schema authoring issues (1)
Find the right level of abstraction
Is "Glucid" a class or an instance? Or is it sometimes a class and sometimes an instance?
Avoid the "KR" attitude and practices!
It’s all about indexing resources with shared terminologies, not about representing human knowledge!
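The class-or-instance question can be checked mechanically once a schema draft exists. The sketch below flags terms that are used in both roles; the triples, the class names `Molecule` and `NutrientFamily`, and the predicate names `subClassOf` and `type` are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical triples, as two schema authors might write them.
TRIPLES = [
    ("Glucid", "subClassOf", "Molecule"),   # Glucid used as a class
    ("Glucose", "type", "Glucid"),          # Glucid used as a class again
    ("Glucid", "type", "NutrientFamily"),   # Glucid used as an instance
    ("Lipid", "subClassOf", "Molecule"),
]


def ambiguous_terms(triples):
    """Return the terms that appear both as a class and as an instance."""
    classes, instances = set(), set()
    for subject, predicate, obj in triples:
        if predicate == "subClassOf":
            classes.update((subject, obj))
        elif predicate == "type":
            instances.add(subject)
            classes.add(obj)
    return sorted(classes & instances)
```

A non-empty result does not say which reading is right; it only points at the terms where the level of abstraction still has to be decided.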
Schema authoring issues (2)
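[Figure: example schema, with labels in French as in the original]
Classes: Processus (process), Processus élémentaire (elementary process), Processus complexe (complex process), Structure (structure), Cellule (cell), Molécule (molecule), Organisme (organism), Appareil (apparatus), Organe (organ), Tissus (tissue), Système (system), GTANS Grande Thématique (major theme)
Properties: est-régulé-par (is regulated by), est-expliquée-par (is explained by), est-réalisé-par (is carried out by), nécessite (requires), déclenche (triggers), est-documentée-par (is documented by), est-constitué-de (is made up of), consomme (consumes), transforme (transforms), produit (produces), implique (involves), élimine (eliminates)
ISA links relate subclasses to their superclasses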
Schema authoring issues (4)
Authoring tools are badly needed: graphical representation of the schema; zooming on sub-graphs (hierarchies); versioning
Consider using a UML authoring environment?
Established methodology and tutorials are needed
Creating Surrogates
Data extraction and fusion from structured sources: R-DB, XML-DB, LDAP
Updating: when? Should not create duplicates!
Detect cross-references: authority lists, thesauri, lexical distance?
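One way to avoid duplicates is to compare each incoming name with an authority list using a lexical distance. The sketch below uses the similarity ratio from Python's standard `difflib` module as a stand-in distance; the authority list, the `normalise` helper and the 0.8 threshold are assumptions for illustration, not choices made in the project.

```python
from difflib import SequenceMatcher

# Hypothetical authority list of already-known author names.
AUTHORITY_LIST = ["Zola, Émile (1840-1902)", "Hugo, Victor (1802-1885)"]


def normalise(name):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def best_match(candidate, authority_list, threshold=0.8):
    """Return the closest authority entry, or None if nothing is close enough."""
    best_entry, best_score = None, 0.0
    for entry in authority_list:
        score = SequenceMatcher(None, normalise(candidate), normalise(entry)).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    return best_entry if best_score >= threshold else None
```

A match means the incoming record should be linked to the existing surrogate; `None` means a new surrogate may be created. The threshold trades missed duplicates against false merges and would need tuning on real data.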
Automatic Categorization
Automatic indexing: by extracting metadata from resources; by automatic categorization
Define hierarchies of "concepts" inside the schema
Seeding with representative documents
Machine learning to create categorizers
Pros: enriched search functionality
Cons: hierarchies of categories are static
Adding a category may change the categorizers of the others
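The seeding approach, and the reason a new category disturbs the existing ones, can be illustrated with a toy nearest-centroid categorizer. This is a simple stand-in technique, not the categorizer used in the experiments; the category names and seed documents are hypothetical.

```python
from collections import Counter
import math


def vectorise(text):
    """Bag-of-words term counts for a lower-cased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(seeds):
    """Build one centroid (summed term counts) per category from its seeds."""
    centroids = {}
    for category, documents in seeds.items():
        total = Counter()
        for document in documents:
            total.update(vectorise(document))
        centroids[category] = total
    return centroids


def categorise(text, centroids):
    """Return the category whose centroid is most similar to the text."""
    vector = vectorise(text)
    return max(centroids, key=lambda category: cosine(vector, centroids[category]))


# Hypothetical seed documents for two categories.
SEEDS = {
    "painting": ["oil painting on canvas", "portrait painting in oil"],
    "sculpture": ["bronze sculpture of a horse", "marble sculpture"],
}
```

Training on `SEEDS` and then again on `SEEDS` plus a third category, for example `"numismatics": ["bronze medal", "silver coin and medal"]`, can move a document such as "bronze medal portrait" out of the category it was assigned before. Each category's effective boundary depends on all the others, which is why extending the hierarchy means retraining.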
Bottom-line…
RDFS schema authoring may be more difficult than E-R modelling
Debates on syntactic features are irrelevant: discussion should be grounded in real-world implementations and testbeds
A new query language (e.g. RQL) is not a high priority
We have not addressed the "logical rules" layer
Semantic Web vs. Community Webs