Using OLIF, The Open Lexicon Interchange Format. Susan McCormick, OLIF2 Consortium. October 1, 2004


Page 1

Using OLIF, The Open Lexicon Interchange Format

Susan McCormick

OLIF2 Consortium

October 1, 2004

Page 2

The OLIF Format

The Open Lexicon Interchange Format
- XML-compliant standard
- Supports exchange of lexical and terminological data for language technology applications
- Handles basic exchange as well as more complex applications such as MT lexicons

Page 3

The OLIF2 Consortium

- OLIF v.2 was developed by the OLIF2 Consortium, a group of language technology companies and organizations interested in issues of MT data/term data exchange
- Led by SAP
- Members include Xerox, Microsoft, Trados, IBM, Systran, IAI, DFKI and Comprendium

Page 4

Developing OLIF v.2

- Based on OLIF prototype
  - Developed in EC-funded OTELO project, proposing standards for users of disparate language tools
  - Original purpose of OLIF was to facilitate terminology exchange for industrial users of MT

Page 5

Developing OLIF v.2

- Version 2 adapted from OLIF prototype using input from
  - Developers/users of 3+ MT systems
  - Developers/users of terminology management systems
  - Other language standards projects:
    - EAGLES
    - SALT
    - ISLE
    - MARTIF, TBX

Page 6

OLIF Version 2

- Released as open standard in 2002
- XML-compliant
- Covers 6 European languages
  - English, German, French, Spanish, Danish, Portuguese
- Includes options for modeling administrative, morphological, syntactic and semantic data

Page 7

Available to Users

- XML implementation of OLIF specification in a DTD
- Available from OLIF2 Consortium web site:
  www.olif.net

Page 8

The OLIF File

- Follows Terminology Markup Framework (TMF) structure:
  - Header
  - Body
  - Shared resources

Page 9

The OLIF Entry

- Collection of monolingual data on a specified sense of a word or phrase
- Optional links for cross-reference and transfer
  - Transfer is bilingual and unidirectional
  - Multiple transfers in multiple languages possible for single word sense

Page 10

Key Data Categories

- The OLIF entry is uniquely identified by 5 key data categories:
  - Canonical form
  - Language
  - Part of speech
  - Subject field
  - Semantic reading

Page 11

Basic Well-Formed OLIF Entry

<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
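
The five key data categories can be read out of an entry as a single tuple, which is a convenient way to treat them as the entry's identity. The following is a minimal Python sketch, assuming the element names shown in the sample entry above; the helper name `entry_key` is made up for illustration and is not part of any OLIF tooling.

```python
import xml.etree.ElementTree as ET

# Element names of the five key data categories, as they appear
# in the sample entry on this slide.
KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

ENTRY_XML = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
"""


def entry_key(entry):
    """Return the five key values of an <entry> element as a tuple.

    Hypothetical helper: raises ValueError if any key category is missing.
    """
    key_dc = entry.find("mono/keyDC")
    if key_dc is None:
        raise ValueError("entry has no mono/keyDC element")
    values = []
    for name in KEY_FIELDS:
        text = key_dc.findtext(name)
        if text is None:
            raise ValueError(f"missing key data category: {name}")
        values.append(text.strip())
    return tuple(values)


entry = ET.fromstring(ENTRY_XML)
key = entry_key(entry)
print(key)
```

Because the tuple is hashable, it can serve directly as a dictionary key when indexing a lexicon.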

Page 12

<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <monoDC>
      <monoAdmin>
        <originator>Weber</originator>
        <adminStatus>ver</adminStatus>
      </monoAdmin>
      <monoMorph>
        <inflection>like book,books</inflection>
      </monoMorph>
      <monoSyn>
        <synType>cnt</synType>
        <synFrame>[gencomp-opt]</synFrame>
      </monoSyn>
      <monoSem>
        <semType>inform</semType>
      </monoSem>
    </monoDC>
  </mono>
</entry>
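
To show how the optional monolingual data might be consumed, here is a small Python sketch that flattens the `monoDC` block into a nested dictionary. It assumes that the administrative, morphological, syntactic and semantic groups sit inside `monoDC`, as the element names on this slide suggest; `mono_data` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ENTRY_XML = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <monoDC>
      <monoAdmin>
        <originator>Weber</originator>
        <adminStatus>ver</adminStatus>
      </monoAdmin>
      <monoMorph>
        <inflection>like book,books</inflection>
      </monoMorph>
      <monoSyn>
        <synType>cnt</synType>
        <synFrame>[gencomp-opt]</synFrame>
      </monoSyn>
      <monoSem>
        <semType>inform</semType>
      </monoSem>
    </monoDC>
  </mono>
</entry>
"""


def mono_data(entry):
    """Map each group under monoDC to a dict of its child elements."""
    mono_dc = entry.find("mono/monoDC")
    result = {}
    if mono_dc is None:
        # monoDC is optional; a basic entry carries only keyDC.
        return result
    for group in mono_dc:
        result[group.tag] = {
            child.tag: (child.text or "").strip() for child in group
        }
    return result


data = mono_data(ET.fromstring(ENTRY_XML))
for group_name, fields in data.items():
    print(group_name, fields)
```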

Page 13

OLIF Entry with Cross-Reference

<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <crossRefer>
    <keyDC>
      <canForm>row</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>69</semReading>
    </keyDC>
    <crLinkType>has-meronym</crLinkType>
  </crossRefer>
</entry>
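
A cross-reference names its target by the same five key data categories, so it can be resolved against an index of entries. The Python sketch below illustrates this under two assumptions that are not in the slides: the `<entries>` wrapper element is invented so that two entries can live in one string, and a second entry for "row" is added so that the reference has something to resolve to. The `crossRefer` element is assumed to be a child of `entry`.

```python
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# <entries> is a hypothetical wrapper, not an OLIF element.
LEXICON_XML = """
<entries>
  <entry>
    <mono>
      <keyDC>
        <canForm>table</canForm>
        <language>en</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
    <crossRefer>
      <keyDC>
        <canForm>row</canForm>
        <language>en</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>69</semReading>
      </keyDC>
      <crLinkType>has-meronym</crLinkType>
    </crossRefer>
  </entry>
  <entry>
    <mono>
      <keyDC>
        <canForm>row</canForm>
        <language>en</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>69</semReading>
      </keyDC>
    </mono>
  </entry>
</entries>
"""


def key_of(parent):
    """Read the five key values from the keyDC child of `parent`."""
    key_dc = parent.find("keyDC")
    return tuple(key_dc.findtext(name, default="").strip() for name in KEY_FIELDS)


root = ET.fromstring(LEXICON_XML)
# Index every entry by its five-part key.
index = {key_of(e.find("mono")): e for e in root.findall("entry")}


def cross_references(entry):
    """Return (link type, target key, target found in index) per crossRefer."""
    refs = []
    for ref in entry.findall("crossRefer"):
        target_key = key_of(ref)
        link_type = ref.findtext("crLinkType", default="").strip()
        refs.append((link_type, target_key, target_key in index))
    return refs


table_entry = index[("table", "en", "noun", "general", "86")]
refs = cross_references(table_entry)
print(refs)
```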

Page 14

OLIF Entry with Transfer

<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>
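
Since an entry may carry several transfers into several languages, one plausible way to read them is to group the target forms by language. This Python sketch assumes `transfer` is a child of `entry`; the function name `transfers_by_language` is hypothetical. Note that the mapping runs only from the source entry to its targets, which mirrors the unidirectional nature of a transfer.

```python
import xml.etree.ElementTree as ET

ENTRY_XML = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>
"""


def transfers_by_language(entry):
    """Group the canonical forms of all transfer targets by target language."""
    targets = {}
    for transfer in entry.findall("transfer"):
        key_dc = transfer.find("keyDC")
        language = key_dc.findtext("language", default="").strip()
        can_form = key_dc.findtext("canForm", default="").strip()
        targets.setdefault(language, []).append(can_form)
    return targets


entry = ET.fromstring(ENTRY_XML)
targets = transfers_by_language(entry)
print(targets)
```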

Page 15

Data Category Values

- Allowed values specified by OLIF
- Administrative, terminological, linguistic values based on
  - General industry standards
    - E.g., allowed values for date derived from recommendations from ISO 8601:1988
  - MT/Terminology standards
    - E.g., suggested values for subject field adapted from EC
  - Widely recognized linguistic standards
    - E.g., allowed values for gender based on longstanding gender description for European languages
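
As an illustration of checking a value against an external standard, the Python sketch below tests whether a string is a valid ISO 8601 calendar date in the extended form YYYY-MM-DD. The slides do not say which ISO 8601 representations OLIF actually allows, so the choice of this one form is an assumption, and `is_iso_calendar_date` is a hypothetical helper.

```python
import re
from datetime import date

# Assumption: only the extended calendar form YYYY-MM-DD is accepted here.
_ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})", re.ASCII)


def is_iso_calendar_date(value):
    """Return True if `value` is a real calendar date written as YYYY-MM-DD."""
    match = _ISO_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as month 13
    except ValueError:
        return False
    return True


for candidate in ("2004-10-01", "2004-13-01", "1 October 2004"):
    print(candidate, is_iso_calendar_date(candidate))
```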

Page 16

User Extensions: The OLIF Data Category Registry

- Users may declare and use their own values for certain data categories:
  - Subject field
  - Semantic reading
  - Morphological structure
  - Part of speech
  - Inflection
  - Aspect
  - Syntactic type
  - Syntactic frame
  - Semantic type
  - Concept hierarchy

Page 17

Organizing Based on Concept

- Users may link monolingual entries via a concept identifier
- These IDs can be used to organize entries as equivalent word senses associated with the same concepts rather than source word senses associated with transfers.

Page 18

Entries Linked by Concept

<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>

<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
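
Grouping entries by their concept identifier gives the concept-oriented view described on the previous slide. The following Python sketch shows one way to do that; the `<entries>` wrapper element is invented for the example and is not an OLIF element, while `ConceptUserId` and the key elements are taken from the sample above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# <entries> is a hypothetical wrapper so both sample entries parse as one document.
LEXICON_XML = """
<entries>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono>
      <keyDC>
        <canForm>table</canForm>
        <language>en</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
  </entry>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono>
      <keyDC>
        <canForm>Tabelle</canForm>
        <language>de</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
  </entry>
</entries>
"""


def group_by_concept(root):
    """Map each ConceptUserId to the (language, canonical form) pairs sharing it."""
    groups = defaultdict(list)
    for entry in root.findall("entry"):
        concept_id = entry.get("ConceptUserId")
        if concept_id is None:
            continue  # entries without a concept link are skipped
        key_dc = entry.find("mono/keyDC")
        language = key_dc.findtext("language", default="").strip()
        can_form = key_dc.findtext("canForm", default="").strip()
        groups[concept_id].append((language, can_form))
    return dict(groups)


concepts = group_by_concept(ET.fromstring(LEXICON_XML))
print(concepts)
```

In this view neither entry is the "source"; both are equivalent word senses attached to the same concept.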

Page 19

What’s Available to the OLIF User?

- On www.olif.net
  - Complete XML DTD for download
  - Hyperlinked DTD for viewing
  - Graphical view of structure of DTD
  - Current specification for OLIF v.2
  - Formalization of OLIF data categories
  - Alphabetic list of XML elements and attributes
  - Fixed and recommended values for elements and attributes
  - Guidelines for formulating canonical forms
  - Sample OLIF entries

Page 20

Page 21

Using OLIF

- Some applications:
  - SAP has implemented an OLIF converter to exchange terminological data from its central termbase SAPterm
  - MT developers in OLIF2 Consortium currently developing OLIF converters (Comprendium, Systran)
- OLIF User Forum = 60+ members

Page 22

What’s New: XML Schema

- OLIF XSD offers
  - 40+ built-in data types
  - Creation of user-defined data types
  - Support for inheritance

Page 23

What’s New: The OLIF API

- Based on OLIF XSD, Java classes created
- Supports:
  - Converting .csv files to OLIF
  - Converting from XML format to OLIF
  - Creating OLIF documents from scratch
  - Modifying OLIF documents
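
To give a feel for what a .csv-to-OLIF conversion involves, here is a rough Python sketch. It is not the OLIF API (which the slide describes as a set of Java classes) and makes several assumptions: the CSV column names are chosen to match the key element names, the `<entries>` wrapper is invented, and only the five key data categories are written.

```python
import csv
import io
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# Hypothetical CSV layout: one column per key data category.
CSV_TEXT = """canForm,language,ptOfSpeech,subjField,semReading
table,en,noun,general,86
Tabelle,de,noun,general,86
"""


def csv_to_entries(csv_text):
    """Build one <entry><mono><keyDC> subtree per CSV row."""
    root = ET.Element("entries")  # hypothetical wrapper, not an OLIF element
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")
        mono = ET.SubElement(entry, "mono")
        key_dc = ET.SubElement(mono, "keyDC")
        for name in KEY_FIELDS:
            ET.SubElement(key_dc, name).text = row[name]
    return root


root = csv_to_entries(CSV_TEXT)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A real converter would also need to emit the file-level structure and validate the result against the DTD or XSD, which this sketch leaves out.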

Page 24

What to Expect this Year from OLIF

- OLIF XSD and API are available to the user from www.olif.net
- OLIF web site upgraded, updated
- Requirements for modeling Japanese entries integrated

Page 25

OLIF User Forum

- Users of OLIF can access and post questions, messages and sample data from the OLIF group site:
  http://groups.yahoo.com/group/olifConsortium/