De conferentie 2012 - CLARIN

Upload: stichting-den

Post on 09-Dec-2014

Page 1: De conferentie 2012 - CLARIN

CLARIN-NL: Reaching out to the users

Arjan van Hessen

Language Resources and Technology Infrastructure for the Humanities and the Social Sciences in the Netherlands

Page 2: De conferentie 2012 - CLARIN

State of the Technology

Language and Speech Technology is (nearly) mature

Many applications are available

Most of it is usable (although not perfect), but...

Page 3: De conferentie 2012 - CLARIN

Unused Technology & Resources

Many scholars are not aware of the HLT & Resources

A priori technical knowledge is still necessary

Use depends too much on “friends” in the field

Lack of standardization is killing

It is less used than expected

Page 4: De conferentie 2012 - CLARIN

Research Life cycle

Cultural Heritage Institution(s)

New Idea

Research

Building

Tuning

Publications

?

Page 5: De conferentie 2012 - CLARIN

Unused Technology & Resources

CAR

Page 6: De conferentie 2012 - CLARIN

HLT & CHI paths

Language processing

Machine learning

Humanities

CATCH

Cultural Heritage Institutions

Page 7: De conferentie 2012 - CLARIN

After the project

Lack of standardization

Bad interfaces

Page 8: De conferentie 2012 - CLARIN

CLARIN-EU (2007-2012)

CLARIN-NL (2009-2015)

CLARIN-ERIC (2012-xxxx)

CLARIAH (2015-…)

Infrastructure program for the Humanities

Page 9: De conferentie 2012 - CLARIN

Issues to address

1. Finding the users

2. Identification of their needs/problems

3. Do our solutions correspond to their problems?

4. Usability of tools: can they use them?

5. Visualisation

6. Tutorials and web material (movies, courses)

7. Sustainability of tools and resources

Page 10: De conferentie 2012 - CLARIN

1. FINDING THE USERS

How to identify and convince potential users

Page 11: De conferentie 2012 - CLARIN

Humanities enter a New Era

Huge amounts of digital data are becoming available

The traditional model, Spitzweg’s “lonely scholar”, no longer suffices

Big data, supported by automated methods

Hardware allows this, and many tools are available or under development

Page 12: De conferentie 2012 - CLARIN

User Surveys

Go out and ask potential users

User survey in the Netherlands (2010)

Page 13: De conferentie 2012 - CLARIN

2. IDENTIFICATION OF THEIR NEEDS/PROBLEMS

What do they need?

Page 14: De conferentie 2012 - CLARIN

User attraction cycle

Finding new users

Convincing these users to participate

Training these users in the use of all those wonderful tools

Supporting the users

Listening to the users

Page 15: De conferentie 2012 - CLARIN

3. DO OUR SOLUTIONS CORRESPOND TO THEIR PROBLEMS?

What to avoid so as NOT to scare off (potential) users

Page 16: De conferentie 2012 - CLARIN

The CLARIN dream

Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)

Give me all negative articles about Catholics in the Fryske Courant (1868-1924)

Find European TV news interviews that involve discussions about Geert Wilders

Page 17: De conferentie 2012 - CLARIN

The CLARIN nightmare in 6 sleepless nights – night 1

Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)

“All” means from all countries and all archives, not just some archives in some (9) countries that happen to be in CLARIN

If contemporary docs exist in digital form at all, they are probably pictures – how do we get access to the content?

Can we rely on standardized metadata to find them?

Many of the docs may be in Latin – can we handle that, and what about the other languages?

How would a scholar know how to formulate this query?

How to present results?

Page 18: De conferentie 2012 - CLARIN

4. USABILITY OF TOOLS

The gearbox syndrome

Page 19: De conferentie 2012 - CLARIN

The gearbox syndrome explained

Humanities scholar with a problem, waiting for a solution

First HLT researcher offering help

Page 20: De conferentie 2012 - CLARIN

The gearbox syndrome explained

Humanities scholar with a problem, waiting for a solution

First generation named entity recognizer (rule based)

Page 21: De conferentie 2012 - CLARIN

The gearbox syndrome explained

Humanities scholar with a problem, waiting for a solution

Second HLT researcher offering help

Page 22: De conferentie 2012 - CLARIN

The gearbox syndrome explained

Humanities scholar with a problem, waiting for a solution

Second generation named entity recognizer (statistics based)

Page 23: De conferentie 2012 - CLARIN

The gearbox syndrome explained

Humanities scholar with a problem, waiting for a solution

Third HLT researcher offering help

Page 24: De conferentie 2012 - CLARIN

The gearbox syndrome explained

Humanities scholar with a problem, waiting for a solution

LREC 2012 paper about next generation named entity recognizer

Page 25: De conferentie 2012 - CLARIN

The gearbox syndrome explained

Page 26: De conferentie 2012 - CLARIN

Making understandable interfaces

Page 27: De conferentie 2012 - CLARIN

5. VISUALIZATION

A picture says more than 1000 words

Easy visualization fosters data analysis

Nice visualization eases use of analysis tools

Nice-to-look-at tools help to reach out to the community

Page 28: De conferentie 2012 - CLARIN

Who answered which words: visualizing word frequency information in letters

C. Culy. 2012. "Some challenges of language and linguistic data for information visualization." Invited keynote presentation at Advanced Visual Methods for Linguistics. University of York, September 7, 2012.

Page 29: De conferentie 2012 - CLARIN

Page 30: De conferentie 2012 - CLARIN

Page 31: De conferentie 2012 - CLARIN

Parliamentary Debate

Which party interrupted which other party, and how often?
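The question on this slide boils down to counting (interrupter, interrupted) pairs. A minimal sketch of that counting step follows, assuming the debate transcript has already been reduced to such pairs; the party names and the sample data are hypothetical placeholders and do not come from the talk or from any CLARIN tool.

```python
from collections import Counter

# Hypothetical sample data: each tuple is (interrupting party, interrupted party).
# Real input would have to be extracted from annotated debate transcripts.
interruptions = [
    ("Party A", "Party B"),
    ("Party A", "Party B"),
    ("Party B", "Party C"),
    ("Party C", "Party A"),
    ("Party A", "Party C"),
]


def count_interruptions(pairs):
    """Count how often each party interrupted each other party."""
    return Counter(pairs)


counts = count_interruptions(interruptions)

# Print one line per (interrupter, interrupted) pair, sorted alphabetically.
for (interrupter, interrupted), n in sorted(counts.items()):
    print(f"{interrupter} interrupted {interrupted}: {n}x")
```

A table of counts like this is the kind of input a matrix or chord-style visualization could be drawn from.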

Page 32: De conferentie 2012 - CLARIN

6. TUTORIALS AND WEB MATERIAL

Create and publish web tutorials

Publish recorded lectures about CLARIN-specific topics

Make and publish showcases

Page 33: De conferentie 2012 - CLARIN

Web videos

Page 35: De conferentie 2012 - CLARIN

7. SUSTAINABILITY OF TOOLS AND RESOURCES

Resources and tools must be accessible after a project finishes

Data and tools must use internationally accepted standards

Easy access via federated login

Page 36: De conferentie 2012 - CLARIN

CLARIN Centres

Page 37: De conferentie 2012 - CLARIN

Conclusion

CLARIN offers a good and sustainable infrastructure for long-term use of both Resources and Tools

Participating in CLARIN gives you access to enclosure tools, standardized metadata, tools for metadata, the CLARIN community

Give other groups/institutions access to your data... if you want

Page 38: De conferentie 2012 - CLARIN

THANK YOU!

So join us!

www.clarin.nl
