A Multilingual Semantic Wiki based on Attempto Controlled English and Grammatical Framework

Tobias Kuhn

Chair of Sociology, in particular of Modeling and Simulation, ETH Zurich, Switzerland

Computational Linguistics Colloquium, University of Zurich, 29 October 2013

About This Talk

This talk is mainly based on the following papers:

Kaarel Kaljurand and Tobias Kuhn. A Multilingual Semantic Wiki Based on Attempto Controlled English and Grammatical Framework. In Proceedings of the 10th Extended Semantic Web Conference (ESWC). 2013. http://purl.org/tkuhn/eswc2013acewikigf

Kaarel Kaljurand, Tobias Kuhn, and Laura Canedo. Collaborative multilingual knowledge management based on controlled natural language. Under review. http://www.semantic-web-journal.net/system/files/swj524.pdf

Imagine ...

... that Wikipedia can check consistency and answer questions about the contained knowledge, and

... that all content is instantly available in all languages!

• AceWiki is a semantic wiki

• Articles are written in Attempto Controlled English (ACE)

• These sentences are internally translated into the Semantic Web language OWL

• An OWL reasoner is built in to answer questions and detect inconsistencies

• Special editor for writing ACE statements

• Has been extended to support multilinguality

Monolingual AceWiki: Screenshot

Attempto Controlled English (ACE)

Subset of natural English:

• Conjunction, disjunction, negation, if-then, ...

• Anaphoric references: pronouns, definite noun phrases, variables

• Quantifiers: every, no, at least 3, ...

• Content words: proper names, nouns, verbs, adjectives, ...

Grammar is fixed, but users can change content words.

Deterministic ambiguity handling:

• Anaphora resolution (France borders Spain and it borders Portugal.)

• Quantifier scope (Every country borders a country.)

• Attachment (Every EU-country borders a country that is an EU-country and is a NATO-country.)

Well-defined translations to and from first-order logic, OWL, ...
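
As a worked illustration of the quantifier-scope example above: assuming that scope follows the surface order of the quantifiers (the predicate names here are illustrative and not taken from the ACE tools), the sentence "Every country borders a country." would receive the first-order reading

```latex
\forall x\,\bigl(\mathit{country}(x) \rightarrow \exists y\,(\mathit{country}(y) \land \mathit{border}(x, y))\bigr)
```

rather than the reading in which one particular country is bordered by every country.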

Predictive Editor

Consistency Checking

AceWiki ensures consistency by checking every new statement:

Question Answering

AceWiki supports simple wh-questions:

Monolingual AceWiki: Demo

http://attempto.ifi.uzh.ch/acewiki/

ACE Reasoning via Translation to OWL

Every country that does not border a sea is a landlocked-country.

    SubClassOf(
      ObjectIntersectionOf(
        :country
        ObjectComplementOf(
          ObjectSomeValuesFrom(
            :border
            :sea
          )
        )
      )
      :landlocked-country
    )

Which country is a landlocked-country?

    ObjectIntersectionOf(
      :country
      :landlocked-country
    )
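
The two OWL expressions above can be assembled programmatically. The following is a minimal sketch that builds the same functional-syntax strings from small helper functions; the helper names (`cls`, `some_values_from`, and so on) are hypothetical and are not part of AceWiki or of any OWL library.

```python
# Toy string builders for OWL functional-style syntax.
# All helper names are assumptions made for this illustration.

def cls(name: str) -> str:
    """Abbreviated IRI in the default namespace, e.g. ':country'."""
    return f":{name}"

def some_values_from(prop: str, filler: str) -> str:
    return f"ObjectSomeValuesFrom({prop} {filler})"

def complement_of(expr: str) -> str:
    return f"ObjectComplementOf({expr})"

def intersection_of(*exprs: str) -> str:
    return f"ObjectIntersectionOf({' '.join(exprs)})"

def sub_class_of(sub: str, sup: str) -> str:
    return f"SubClassOf({sub} {sup})"

# "Every country that does not border a sea is a landlocked-country."
axiom = sub_class_of(
    intersection_of(
        cls("country"),
        complement_of(some_values_from(cls("border"), cls("sea"))),
    ),
    cls("landlocked-country"),
)

# "Which country is a landlocked-country?" becomes a class expression
# whose instances are the answers.
query = intersection_of(cls("country"), cls("landlocked-country"))

print(axiom)
print(query)
```

A reasoner would then be asked for the individuals that belong to the query expression; that step is outside the scope of this sketch.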

Evaluation

Two small usability experiments with earlier versions of AceWiki:

• Altogether 26 untrained participants

• Task: Collaborative creation of a knowledge base

Results:

• 78%-81% of the sentences were correct and sensible

• 61%-70% of them were complex (containing negations, implications, disjunctions or number restrictions)

• Creation of a correct sentence every 5–6 minutes

• Definition of a new word every 5–7 minutes

→ Even untrained users can effectively use AceWiki

Multilingual AceWiki: AceWiki-GF

General ideas:

• Make wiki content available in different languages

• Automatically translated content using rule-based machine translation: Grammatical Framework (GF)

• Language switching like in Wikipedia

• Localization of the user interface

Grammatical Framework (GF)

GF is a framework for multilingual grammar engineering:

• Rule-based

• Functional programming language (based on Haskell) optimized to handle natural language

• Resource Grammar Library implementing common morphological and syntactic structures

• Mildly context sensitive

• Bidirectional translations: concrete language ⇔ abstract syntax

GF grammars and translations

GF grammars consist of:

• One language-neutral abstract syntax

• Concrete syntaxes specify words, agreement, word order, etc. by implementing the abstract categories and functions

Example

border : Country -> Country -> Relation

English: border x y = x!Nom + "borders" + y!Nom

Estonian: border x y = x!Gen + "naaber on" + y!Nom

GF translations consist of:

• First, parse a string in the original language to a tree (or trees) in the abstract syntax

• Then, linearize these trees as strings in the target language
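
The parse-then-linearize pipeline above can be sketched in a few lines of Python. This is a toy illustration and not GF itself: the `Border` tree type, the `LEXICON` table, and the brute-force `parse` function are assumptions made for the example, and the Estonian word forms are illustrative.

```python
from typing import List, NamedTuple

class Border(NamedTuple):
    """Abstract syntax tree for: border : Country -> Country -> Relation."""
    x: str
    y: str

# Hypothetical mini-lexicons: abstract country id -> case forms per language.
LEXICON = {
    "eng": {
        "France": {"Nom": "France"},
        "Spain": {"Nom": "Spain"},
    },
    "est": {
        "France": {"Nom": "Prantsusmaa", "Gen": "Prantsusmaa"},
        "Spain": {"Nom": "Hispaania", "Gen": "Hispaania"},
    },
}

def linearize(tree: Border, lang: str) -> str:
    """Concrete syntax: turn an abstract tree into a string."""
    lex = LEXICON[lang]
    if lang == "eng":
        # English: border x y = x!Nom + "borders" + y!Nom
        return f'{lex[tree.x]["Nom"]} borders {lex[tree.y]["Nom"]}'
    if lang == "est":
        # Estonian: border x y = x!Gen + "naaber on" + y!Nom
        return f'{lex[tree.x]["Gen"]} naaber on {lex[tree.y]["Nom"]}'
    raise ValueError(f"unsupported language: {lang}")

def parse(sentence: str, lang: str) -> List[Border]:
    """Return every tree whose linearization equals the sentence.

    Brute-force generate-and-match; a real parser works very differently.
    """
    ids = LEXICON[lang]
    return [
        Border(x, y)
        for x in ids
        for y in ids
        if linearize(Border(x, y), lang) == sentence
    ]

def translate(sentence: str, src: str, dst: str) -> List[str]:
    """Parse in the source language, then linearize in the target language."""
    return [linearize(tree, dst) for tree in parse(sentence, src)]

print(translate("France borders Spain", "eng", "est"))
```

Because `parse` returns a list, an ambiguous sentence would yield several trees and therefore several candidate translations, which mirrors the "tree (or trees)" wording above.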

Multilingual AceWiki: Screenshot

GF Resource Grammar Library (RGL)

• Morphology and syntax for ∼30 languages via language-neutral API

• Developers do not need detailed knowledge of the languages that they want to support in their application

Implementation of AceWiki-GF

Integration of ACE with GF (ACE-in-GF):

• Implementation of a multilingual grammar of ACE in the GF framework

• Coverage of the languages supported by the GF resource grammar

• No fine-tuning to any particular language (apart from ACE)

Integration of AceWiki with GF (AceWiki-GF):

• Implementation of connections to GF tools (GF Webservice / Cloud Service)

• Support for the management of multilinguality, ambiguity, and grammar/lexicon editing

Multilingual AceWiki: Demo

http://attempto.ifi.uzh.ch/acewiki-gf/

ACE-in-GF

• Multiple controlled versions of natural languages that map to ACE (and to each other)

• As a result, they can be bidirectionally mapped to various formal languages already supported by ACE

ACE-in-GF: Example

German: Jedes Land, das nicht an ein Meer grenzt, ist ein Binnenland.

ACE-in-GF tree:

    baseText (sText (s (vpS (everyNP (relCN (cn_as_VarCN country_CN)
      (neg_predRS which_RP (v2VP border_V2 (thereNP_as_NP
      (aNP (cn_as_VarCN sea_CN))))))) (npVP (thereNP_as_NP
      (aNP (cn_as_VarCN landlocked_country_CN)))))))

ACE: Every country that does not border a sea is a landlocked-country.

OWL:

    SubClassOf(
      ObjectIntersectionOf(
        :country
        ObjectComplementOf(
          ObjectSomeValuesFrom( :border :sea )
        )
      )
      :landlocked-country
    )

ACE-in-GF: Implementation

Implementation of the ACE syntax:

• Targeting the subset of ACE that can be mapped to OWL

• Almost 100% coverage at almost 0% ambiguity

Support of most RGL languages:

• Bulgarian, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Italian, Latvian, Norwegian, Polish, Romanian, Russian, Spanish, Swedish, Thai, Urdu

• RGL-based design provides automatic increase in quality and language-coverage over time

Status

• Some precision problems, e.g. with anaphoric references

• Ambiguity and coverage problems in some languages

Ambiguity Resolution

Evaluation of ACE-in-GF

Design

• Generation of ∼100 ACE sentences/questions and automatic translation to all supported languages

    • Full coverage of grammar functions

    • Large coverage of OWL axiom structures (subclass, range, domain, transitivity, ...)

• Measuring translation accuracy from ACE to other languages

• Using Google Translate as the baseline

• 20 human evaluators (2 per language) as the gold standard

Results

• Participants preferred ACE-in-GF translations to Google translations and post-edited them less

• Many edits were stylistic

Evaluation of ACE-in-GF: Results

Evaluation of AceWiki-GF

Hypothesis: A group of users reaches almost the same level of agreement on the content of an article presented to them in different languages as when the article is presented to all of them in the same language.

Design

• Based on a 500-word lexicon on European geography in three languages: English, German and Spanish

• 30 participants accessed AceWiki-GF and wrote sentences in their language (10 participants for each language)

• They had to enter true and false sentences and tag them as such

• In a post-editing task, each participant checked the output of two other participants: one translated from another language and one written in the same language (true/false tags were removed and sentences shuffled); they were asked to remove all false sentences

Evaluation of AceWiki-GF: Results

30 participants spent on average 37 minutes using AceWiki-GF, creating 316 sentences in total.

Definition of agreement level: $(T_k + F_d)/S$, where $S$ is the total number of sentences, $T_k$ the number of sentences marked as true and kept, and $F_d$ the number of sentences marked as false and deleted
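
The agreement level defined above, (Tk + Fd) / S, is straightforward to compute. The sketch below is a minimal illustration; the function name and the example counts are hypothetical and are not data from the study.

```python
def agreement_level(true_kept: int, false_deleted: int, total: int) -> float:
    """Agreement level (Tk + Fd) / S.

    true_kept     -- Tk: sentences tagged true by the author and kept by the checker
    false_deleted -- Fd: sentences tagged false by the author and deleted by the checker
    total         -- S: total number of sentences checked
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if true_kept < 0 or false_deleted < 0 or true_kept + false_deleted > total:
        raise ValueError("counts are inconsistent with total")
    return (true_kept + false_deleted) / total

# Hypothetical counts, for illustration only.
example = agreement_level(true_kept=60, false_deleted=22, total=100)
print(f"{example:.1%}")
```

Sentences on which author and checker disagree (true but deleted, or false but kept) count only toward S, so they lower the level.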

Agreement level (difference is not significant):

• Without translation: 82.2%

• With translation: 84.0%

Evaluation of AceWiki-GF: Results

Assumption: translation introduces a constant translation error rate $r$, with the effect that the agreement level is $(1 - r) \times a$ instead of $a$

New hypothesis: The translation error rate is less than 5%.

Agreement level:

• With hypothetical translation (r = 5%): 78.1%

• With translation: 84.0%

p-value with one-tailed Wilcoxon signed-rank test: 0.046

→ With AceWiki-GF, translation error rate is less than 5%
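
The 78.1% reference value follows from the stated assumption: applying a constant error rate r = 5% to the 82.2% agreement level measured without translation gives (1 − 0.05) × 82.2% ≈ 78.1%. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def agreement_with_error(a: float, r: float) -> float:
    """Agreement level expected if translation adds a constant error rate r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    return (1.0 - r) * a

baseline = 0.822   # agreement level without translation
observed = 0.840   # agreement level with translation
hypothetical = agreement_with_error(baseline, r=0.05)

print(f"hypothetical: {hypothetical:.1%}")
print(f"observed exceeds hypothetical: {observed > hypothetical}")
```

The sketch only reproduces the threshold; whether the observed level exceeds it significantly is what the signed-rank test on the per-participant data decides.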

Evaluation of AceWiki-GF: Feedback

Questionnaire for the participants contained these questions:

1 Was AceWiki Geography easy or difficult to use in general?

2 Was the sentence editor easy or difficult to use?

3 Was creating true and false statements easy or difficult to perform?

Possible answers: “very difficult” (0), “difficult” (1), “medium” (2), “easy” (3), and “very easy” (4)

Results:

1 Average: 2.93 (∼“easy”)

2 Average: 2.77 (∼“easy”)

3 Average: 2.70 (∼“easy”)

The Future...?

Can we make a truly multilingual Wikipedia?

• Store main content in a semantic representation

• Verbalization in different languages

• All content is instantly available in all languages (once the required vocabulary is defined)

• Breaking the current dominance of English and putting an end to the lock-out of users speaking less widespread or underrepresented languages

• Contributing to the Semantic Web

Related:

• http://www.wikidata.org

• http://meta.wikimedia.org/wiki/A_proposal_towards_a_multilingual_Wikipedia

Links

ACE parser (APE) source code: https://github.com/Attempto/APE

ACE-in-GF source code: http://github.com/Attempto/ACE-in-GF

AceWiki and AceWiki-GF

• Source code: http://github.com/AceWiki/AceWiki

• Demos (non-GF): http://attempto.ifi.uzh.ch/acewiki/

• Demos (GF): http://attempto.ifi.uzh.ch/acewiki-gf/

MOLTO project web site: http://www.molto-project.eu

Attempto web site: http://attempto.ifi.uzh.ch

GF: http://www.grammaticalframework.org

Thank you for your Attention!

Questions?
