Page 1: The European Resources Landscape

LKR2004, Tokyo March 8+9 2004

[email protected] 1

The European Resources Landscape

Steven Krauwer

ELSNET / Utrecht University

The Netherlands

Page 2: The European Resources Landscape

Overview

• About ELSNET

• Main characteristics of the European scene

• Impact of EU funding policies

• Bottom-up resources infrastructure actions

• Concluding remarks

Page 3: The European Resources Landscape

What is ELSNET

• European Network in Human Language Technologies (ca 145 academic and industrial member organisations)

• Funded by the European Commission

• Created in 1991 as one network out of (eventually) ca 25, covering all subfields of ICT

• Objectives

– bringing together the language and speech communities

– bringing together academia and industry

– facilitating R&D in language and speech technology

• Info: [email protected] http://www.elsnet.org

Page 4: The European Resources Landscape

What we do

• Spreading knowledge, e.g.:

– Training (e.g. annual summer schools, curriculum development)

– Information dissemination (newsletter, website, etc.)

– Knowledge transfer (directories, workshops)

• Creating common foundations:

– language resources

– common standards and evaluation methods

• Roadmapping:

– Establishing a broadly supported common vision of where the language and speech field is going

Page 5: The European Resources Landscape

Main characteristics of the European Landscape

• Multilinguality: coping with many languages and crossing language boundaries

• Fragmentation of all R&D efforts over national funding schemes and policies

• Unbalanced efforts over languages, even though all languages are equally hard

Page 6: The European Resources Landscape

Languages in Europe

• European Union has

– 15 member states, with 11 official languages (plus quite a few ‘unofficial languages’)

– 10 new member states with (at least) 10 new official languages joining May 1st 2004

– 3 applicant countries in the waiting room with at least 3 extra languages

• Europe has

– 17 other countries, with quite a few additional languages (think of Russia!)

Page 7: The European Resources Landscape

Languages in the world

The Ethnologue (http://www.ethnologue.org):

• Europe: 230 languages

• The Americas: 1013 languages

• The Pacific: 1311 languages

• Africa: 2058 languages

• Asia: 2197 languages
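
To put the regional counts in proportion, the short sketch below sums the five figures quoted from the Ethnologue and computes each region's share of the total. The counts are taken from the slide; the variable names are illustrative only, and the total is simply the sum of the quoted figures, not an independent Ethnologue statistic.

```python
# Language counts per region, as quoted on the slide from the Ethnologue.
counts = {
    "Europe": 230,
    "The Americas": 1013,
    "The Pacific": 1311,
    "Africa": 2058,
    "Asia": 2197,
}

# Sum of the five quoted figures.
total = sum(counts.values())

# Share of each region in that sum.
shares = {region: n / total for region, n in counts.items()}

print(f"Total: {total} languages")
for region, share in shares.items():
    print(f"{region}: {share:.1%}")
```

On these figures Europe accounts for only a few percent of the listed languages, which is worth keeping in mind when reading the rest of the talk: the multilinguality problem in Europe is one of policy and funding, not of sheer numbers.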

Page 8: The European Resources Landscape

Languages in Japan

• Just one language: Japanese ….

• But even in Japan multilinguality is a factor, e.g.:

– Export market requires localized products (e.g. user interfaces)

– Users require documentation in their own language

– Business to business communication crosses language boundaries

– Immigrants

Page 9: The European Resources Landscape

Resources in Europe

• Language resources collection started in most countries as a cultural or political activity

• Most activities in larger countries with bigger funding programmes

• Adoption or creation of resources for industrial application started much later

• Most of them addressing commercially interesting languages

• Result: very uneven coverage

Page 10: The European Resources Landscape

Impact of the EU

• During the 70s and 80s the EU became a major funder of technology programmes

• For smaller languages the EU became the main funding source

• The political requirement of multinational consortia and balanced participation over member states gave a strong boost to resources development for smaller languages

Page 11: The European Resources Landscape

Recent EU policies

• EU focus shifting to activities with a more direct commercial impact

• EU focus shifting from spreading excellence to boosting excellence: only invest in sectors where Europe can maintain or strengthen world leadership (over e.g. US and Japan)

• EU moves from many small projects (up to 5 million euro) to few big projects (up to 50 million)

• Language and speech technology have disappeared from the agenda, and Interfaces and Knowledge Systems have taken their place

Page 12: The European Resources Landscape

Result of new policies

• Strong emphasis on the commercially interesting languages

• Language and speech will only appear as embedded technologies

• Creation of language resources in EU projects only if needed for the main objectives of the project, i.e. never as a goal per se

• Fragmentation of language and speech technology activities over many projects

Page 13: The European Resources Landscape

Impact on infrastructures

• Creation and distribution of resources, standards, and evaluation are infrastructural in nature (as opposed to research and development)

• They require continuity and active industrial involvement

• Very hard to accomplish in EU funding context because of short duration of projects and requirement that industries contribute 50% of their costs themselves

• Resources actions now mostly at national level

Page 14: The European Resources Landscape

Overall picture …

• … not very good: very little to expect from EU as far as improvement of the language resources situation is concerned for the duration of the present Framework Programme (2003-2007)

• But there are some signs that the situation will improve in the next Framework Programme,

• And there are still a number of bottom up activities (emerging from the community, with or without EU support)

Page 15: The European Resources Landscape

Ongoing resources infrastructure actions

• ELSNET: still running (since 1991, hopefully secured until summer 2005; funded by the EU as a series of independent 2-3 year projects), still supporting resources and evaluation, now focusing on the roadmap for language and speech technology and for language and speech resources

• ELRA/ELDA: Resources Association and Agency; European counterpart (although not twin sister) of LDC

Page 16: The European Resources Landscape

Ongoing actions, continued

• ENABLER:

– Network aiming at coordination of national resources activities; EU funding has ended, but it remains active.

– Surveys and other useful material on website (www.enabler-network.org)

– Involved in resources roadmap and landscape (see later)

– Asian and US participation

Page 17: The European Resources Landscape

Cocosda

• International committee for the coordination and standardisation of speech databases and assessment techniques

• International, not just European – also active Asian involvement

• Not funded, but alive

Page 18: The European Resources Landscape

ICCWLRE

• International coordination committee for written language resources and evaluation

• Written language counterpart of Cocosda

• Goal is to join forces with Cocosda

• To be launched at LREC 2004 in Lisbon

• International, active Asian participation

Page 19: The European Resources Landscape

LREC

• Biennial international conference on resources and evaluation

• Initiated in 1998, very successful, and truly international

• Only conference on this topic and only conference bringing together language and speech communities

Page 20: The European Resources Landscape

Ongoing actions, continued

• The Language Resources Roadmap:

– Joint activity of ELSNET/ENABLER/ELRA

– Aimed at creating a broadly supported common vision of where the field is going, and what the implications are for language resources

– Workshops (www.elsnet.org/roadmap.html)

– Graphical representation at elsnet.dfki.de

Page 21: The European Resources Landscape

Ongoing actions, continued

• The Resources Landscape:

– Joint project by ELSNET/ENABLER

– Aimed at creation and continued maintenance of a full landscape of the world of language resources (actors, actions, projects, events, resources, etc.)

– Still under construction

– See www.enabler-network.org

Page 22: The European Resources Landscape

EAGLES/ISLE/Wordnet

• EAGLES (and its successor ISLE) were EU-funded projects aimed at standards in language and speech processing

• Projects have ended, but there are still some ongoing activities, such as MILE (the Multilingual ISLE Lexical Entry)

• WordNet has had a number of European spin-offs, such as EuroWordNet, BalkaNet and local instantiations for other languages

Page 23: The European Resources Landscape

Ongoing actions: BLARK

• Define (in a language-independent way) the minimal set of language resources that is necessary to do any precompetitive R&D and education at all for a language (the Basic Language Resource Kit or BLARK)

• Determine for each language which components are already available (survey)

• Make for each language a priority plan to complete the BLARK (and to get funding)
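
The three BLARK steps above (define the kit, survey what exists, plan what is missing) amount to a gap analysis, which the sketch below illustrates. Everything in it is an assumption made for illustration: the slides do not list the BLARK components, so the component names, the two placeholder languages, the survey data, and the "largest gap first" ordering are all hypothetical.

```python
# Hypothetical component list; the slides do not enumerate the BLARK contents.
BLARK_COMPONENTS = [
    "monolingual lexicon",
    "written corpus",
    "speech corpus",
    "part-of-speech tagger",
]


def missing_components(required, available):
    """Return the required components not yet available, in kit order."""
    have = set(available)
    return [component for component in required if component not in have]


def priority_plan(required, survey):
    """Map each language to its missing components.

    Languages with the largest gap come first (an assumed ordering, not one
    prescribed by the slides); ties are broken alphabetically.
    """
    gaps = {
        language: missing_components(required, available)
        for language, available in survey.items()
    }
    return dict(sorted(gaps.items(), key=lambda item: (-len(item[1]), item[0])))


# Invented survey results for two placeholder languages.
survey = {
    "language_a": ["monolingual lexicon", "written corpus", "speech corpus"],
    "language_b": ["written corpus"],
}

plan = priority_plan(BLARK_COMPONENTS, survey)
for language, gap in plan.items():
    print(f"{language}: missing {gap}")
```

Keeping the component list language-independent, as the first bullet requires, is what makes the per-language gaps comparable; a real plan would also weigh cost and funding prospects, which this sketch ignores.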

Page 24: The European Resources Landscape

New initiatives

• Proposal to create BLARKnet: rejected by EU because language and speech are no core objectives

• In France the successful launch of the new national programme TechnoLangue, explicitly addressing resources and evaluation

• In Europe the initiative towards LangNet, a network aimed at coordination of national language and speech technology programmes (including resources and evaluation)

• Some of the new EU projects will address resources problems, but project info has not been released yet

Page 25: The European Resources Landscape

Concluding remarks

• We have seen some problems that are inherent to the situation in Europe and that will not go away: linguistic fragmentation and uneven balance in distribution of R&D efforts over languages

• We have seen self-imposed problems (EU funding schemes and policies); they may go away if and when the funders change their minds

• But we have also seen that there is still room for a variety of resources-related initiatives in Europe, many of which could benefit from collaboration with e.g. Japan