The CESAR Project: Challenges and Achievements


Page 1: The CESAR Project:  Challenges and Achievements

Co-funded by the 7th Framework Programme of the European Commission through the contract T4ME, grant agreement no.: 249119.

Co-funded by the ICT PSP Programme of the European Commission through the contract CESAR, grant agreement no.: 271022.

The CESAR Project: Challenges and Achievements

Tamás Váradi, coordinator

Research Institute for Linguistics, Hungarian Academy of Sciences Budapest, Hungary

[email protected]

CESAR META-NET Roadshow, Budapest, 18th January, 2013

Page 2: The CESAR Project:  Challenges and Achievements

Outline

The CESAR consortium

Project objectives

CESAR in META-SHARE

Survey of results

Gaps and Challenges

Conclusions

http://www.cesar-project.net


Page 3: The CESAR Project:  Challenges and Achievements

META-NET & CESAR


Page 4: The CESAR Project:  Challenges and Achievements

Geo-linguistic position

CESAR stands for CEntral and Southeast EuropeAn Resources

operates as an integral part of META-NET

geo-linguistic spread: Central and Southeast Europe, three inner seas (Baltic, Adriatic, Black Sea)

CESAR covers the following languages:
Polish: EU, 38M (40-48M)
Slovak: EU, 5.4M (7M)
Hungarian: EU, 10M (16M)
Croatian: EU in 2013, 4.4M (5.5M)
Serbian: candidate soon, 7.3M (9M)
Bulgarian: EU, 7.5M (9M)

all languages are Slavic, except Hungarian


Page 5: The CESAR Project:  Challenges and Achievements

Who is CESAR?

Participant no. | Participant organisation name | Participant short name | Country
1 (CO) | Nyelvtudományi Intézet, Magyar Tudományos Akadémia | HASRIL | Hungary
2 | Budapesti Műszaki és Gazdaságtudományi Egyetem | BME-TMIT | Hungary
3 | Sveučilište u Zagrebu, Filozofski Fakultet – University of Zagreb, Faculty of Humanities and Social Sciences | FFZG | Croatia
4 | Instytut Podstaw Informatyki Polskiej Akademii Nauk | IPIPAN | Poland
5 | Uniwersytet Łódzki | Ulodz | Poland
6 | Faculty of Mathematics, University of Belgrade | UBG | Serbia
7 | Institut Mihajlo Pupin | IPUP | Serbia
8 | The Institute for Bulgarian Language Prof. Lyubomir Andreychin | IBL | Bulgaria
9 | Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied | LSIL | Slovakia


Page 6: The CESAR Project:  Challenges and Achievements

The Faces behind CESAR


Page 7: The CESAR Project:  Challenges and Achievements

Project objectives

provide a description of the national landscape in terms of language use, language-savvy products and services, language technologies and resources

contribute to a pan-European digital language resources exchange (META-SHARE)

enhance, extend, document, standardize, cross-link, cross-align resources and tools

mobilise national and regional stakeholders, public bodies and funding

reinvigorate cooperation between key technology partners in the region

collaborate with other partner projects

bridge the technological gap between this region and the other parts of Europe

Page 8: The CESAR Project:  Challenges and Achievements

Timeline

Project runs between 1st February 2011 and 31st January 2013

Three major deliverables of resources and tools

BATCH 1: M10, 30th November 2011

BATCH 2: M18, 31st July 2012

BATCH 3: M24, 31st January 2013
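
The batch deadlines above follow from counting project months from the start date. As a minimal sketch (the function name `end_of_project_month` is hypothetical and not part of any CESAR tooling, and it assumes a project that starts on the first day of a month), the last day of project month N can be computed like this:

```python
from datetime import date, timedelta

# Project start date as stated on the Timeline slide.
PROJECT_START = date(2011, 2, 1)


def end_of_project_month(n: int, start: date = PROJECT_START) -> date:
    """Return the last day of project month n, where M1 is the first month.

    Assumption: `start` falls on the first day of a calendar month,
    so the day after month n ends is again a first-of-month date.
    """
    months = (start.month - 1) + n          # zero-based month offset from January
    year = start.year + months // 12
    month = months % 12 + 1
    first_day_after = date(year, month, start.day)
    return first_day_after - timedelta(days=1)


if __name__ == "__main__":
    for batch, month_no in (("BATCH 1", 10), ("BATCH 2", 18), ("BATCH 3", 24)):
        print(batch, f"M{month_no}", end_of_project_month(month_no).isoformat())
```

Under these assumptions, M10, M18 and M24 map to the three delivery dates listed on the slide, and M24 coincides with the end of the project.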


Page 9: The CESAR Project:  Challenges and Achievements

Where to find CESAR

www.meta-net.eu


Page 10: The CESAR Project:  Challenges and Achievements

www.cesar-project.net


Page 11: The CESAR Project:  Challenges and Achievements

CESAR in META-SHARE


Page 12: The CESAR Project:  Challenges and Achievements

www.meta-share.org


Page 13: The CESAR Project:  Challenges and Achievements

www.cesar-project.net/metashare


Page 14: The CESAR Project:  Challenges and Achievements


Page 15: The CESAR Project:  Challenges and Achievements


Page 16: The CESAR Project:  Challenges and Achievements


Page 17: The CESAR Project:  Challenges and Achievements


Page 18: The CESAR Project:  Challenges and Achievements

Results – M24


Page 19: The CESAR Project:  Challenges and Achievements

CESAR First Batch of Resources


Statistics of resources:

Page 20: The CESAR Project:  Challenges and Achievements

CESAR Second Batch of Resources


Statistics of resources:

Page 21: The CESAR Project:  Challenges and Achievements

CESAR Third Batch of Resources


Statistics of resources available for 3rd batch:

Page 22: The CESAR Project:  Challenges and Achievements

Total resources


Page 23: The CESAR Project:  Challenges and Achievements

‘In other words – 1st and 2nd batch’

Quick statistics of already submitted LRs:

monolingual corpus (token) = 1 702 565 806

parallel corpus (token) = 41 810 000

record/entry/lexicon = 1 640 579

divided between 32 corpora, 12 lexical resources, 20 tools/services


Page 24: The CESAR Project:  Challenges and Achievements

Distribution of META-SHARE Licence types

Page 25: The CESAR Project:  Challenges and Achievements

Hungarian resources in the 1st batch


Page 26: The CESAR Project:  Challenges and Achievements

Hungarian resources in the 2nd batch


Page 27: The CESAR Project:  Challenges and Achievements

Hungarian resources in the 3rd batch


Page 28: The CESAR Project:  Challenges and Achievements

NooJ

A linguistic development environment combining fast and robust finite state technology and computational power with ease of use and

Many CESAR partners had already developed a lot of valuable resources

Objective: produce open-source and multi-platform version

Institut Mihajlo Pupin in close collaboration with Max Silberztein, developer of NooJ

First phase: a version for the Mono platform

Currently, an open-source Java version is in development

Page 29: The CESAR Project:  Challenges and Achievements

NooJ – Mono version


Page 30: The CESAR Project:  Challenges and Achievements

NooJ – Java version

Page 31: The CESAR Project:  Challenges and Achievements

Gaps and Challenges*


* Presented at LTC’11, 25-27 November, 2011, Poznan

Page 32: The CESAR Project:  Challenges and Achievements

Where does CESAR stand?


Page 33: The CESAR Project:  Challenges and Achievements

Results for language resources

Legend: below 1.000 on average; below 2.000 on average; equals 0.000 in cells

Page 34: The CESAR Project:  Challenges and Achievements

Results for language resources


Page 35: The CESAR Project:  Challenges and Achievements

Results for language tools

Legend: below 1.000 on average; below 2.000 on average; equals 0.000 in cells

Page 36: The CESAR Project:  Challenges and Achievements

Results for language tools


Page 37: The CESAR Project:  Challenges and Achievements

Conclusions

META-NET is an excellent opportunity
to promote LT in Europe
to mobilize all stakeholders around a Strategic Research Agenda
to create an invaluable stock of resources and tools

CESAR project actively contributing to these aims

CESAR META-SHARE node

Language Whitepaper series is a unique instrument to gain a horizontal perspective of the state of the art in various languages

Hungarian resources and tools are valuable components

There is major work ahead to bridge the technological gap


Page 38: The CESAR Project:  Challenges and Achievements

Thank you for your attention.

http://www.cesar-project.net

[email protected]

http://www.meta-net.eu

http://www.facebook.com/META.Alliance