Content analysis and CERN Roman Chyla


Post on 03-Jan-2016


Content analysis and CERN

Roman Chyla

Artificial intelligence

Natural language processing

Web of data

Content analysis

Semantic Web

Information extraction

?

A lot to do…

Semantic dictionary

• Link between infinite and finite domains

• Must be prepared (or at least revised) by humans
– Purposeful
– Incomplete
– Constantly changing

• Very expensive to create/maintain
– Solution? Use existing data!

Basic principles

• Keep it simple, stupid (I didn't want to believe it could work, it was too simple!)

• You can't get it 100% right
• Dictionary ~ Universal semantic language
– Not really a language, but a taxonomy (not even an ontology)
– Lacks expressiveness
– Still very vague (but that is a feature, not a bug!)
– Cannot infer from facts

BUT it is:
– Simple to maintain
– Ready to change and evolve, ready to accommodate other resources
– Language independent
– Problem of research question
– Problem of universal and domain-specific taxonomy
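To make the "taxonomy, not ontology" point concrete, here is a minimal sketch of such a dictionary. All concept ids, labels and function names are hypothetical illustrations, not the actual Seman data model: terms in any language map to language-independent concept ids, and the only relation is "broader concept".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical taxonomy: concept id -> broader concept id (None for the root).
BROADER: Dict[str, Optional[str]] = {
    "C0": None,   # physics
    "C1": "C0",   # particle physics
    "C2": "C1",   # boson
}

# Hypothetical labels: (language, lowercased term) -> concept ids.
# A term may point to several concepts; that vagueness is allowed.
LABELS: Dict[Tuple[str, str], List[str]] = {
    ("en", "particle physics"): ["C1"],
    ("de", "teilchenphysik"): ["C1"],
    ("en", "boson"): ["C2"],
    ("de", "boson"): ["C2"],
}


def lookup(term: str, lang: str) -> List[str]:
    """Return the concept ids registered for a term in a given language."""
    return LABELS.get((lang, term.lower()), [])


def ancestors(concept_id: str) -> List[str]:
    """Walk the broader-concept chain up to the root."""
    chain = []
    parent = BROADER[concept_id]
    while parent is not None:
        chain.append(parent)
        parent = BROADER[parent]
    return chain
```

Because there are no typed relations or rules, nothing can be inferred from such a structure, but adding a new language or a new resource only means adding rows.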

Word sense disambiguation

• Homonyms are an obvious problem
• … and Seman can work with many definitions at the same time (think of 3 people and their definitions of one word)
• Possible solutions:
– Disambiguation by harvested definitions
– Rules
– Neural network (supervised learning)
– If problems are few, humans can decide
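One way to picture "disambiguation by harvested definitions" is a simplified Lesk-style word overlap: pick the sense whose definition shares the most words with the surrounding text. The sketch below is only an illustration under that assumption; the sense ids, definitions and stopword list are made up, and it is not necessarily how Seman does it.

```python
import re

# Small illustrative stopword list (an assumption, not a real resource).
STOPWORDS = {"a", "an", "the", "of", "in", "and", "or", "to", "for", "by", "is"}


def tokens(text):
    """Lowercased set of words, without stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context, definitions):
    """Return the sense whose definition overlaps most with the context.

    Returns None when nothing overlaps or when several senses tie,
    i.e. the cases that would be left for a human to decide.
    """
    ctx = tokens(context)
    scores = {sense: len(ctx & tokens(text)) for sense, text in definitions.items()}
    best = max(scores.values(), default=0)
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


# Hypothetical harvested definitions for the homonym "charm".
DEFINITIONS = {
    "charm/physics": "quantum number carried by the charm quark, "
                     "a flavour of elementary particle",
    "charm/general": "the power of pleasing or attracting people; "
                     "an amulet worn for luck",
}

sense = disambiguate(
    "The detector measured the decay of a particle containing a charm quark",
    DEFINITIONS,
)
```

Returning None on ties keeps the last option on the slide open: when the problems are few, humans can decide.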

So what I want to do…

• Prepare another semantic dictionary for HEP (using whatever I can) and for English in general (UDC + existing Seman)

• Differentiate HEP core and non-core
• Search corrections (did you mean?)
• Search results categorization/facets
• Identify entities, data elements… make them available (this is mainly an IE task)
• Identification of topics (metrics of similarity between document and "known characteristics")
• Keywording – identification of statistically significant occurrences of concepts (not words)
• Come up with faster ways to enrich the taxonomy
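The topic-identification item above ("metrics of similarity between document and known characteristics") could be sketched with cosine similarity over concept counts. This is one possible metric chosen for illustration; the slides do not say which one is intended, and the topic profiles below are invented.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two concept -> weight mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_topic(doc_concepts, profiles):
    """Compare a document's concept counts with each known topic profile."""
    doc = Counter(doc_concepts)
    scores = {topic: cosine(doc, profile) for topic, profile in profiles.items()}
    return max(scores, key=scores.get), scores


# Hypothetical "known characteristics" of two topics (concepts, not raw words).
PROFILES = {
    "HEP": {"higgs": 3, "boson": 2, "detector": 1},
    "computing": {"grid": 3, "storage": 2, "detector": 1},
}

topic, scores = best_topic(["higgs", "boson", "higgs", "detector"], PROFILES)
```

The input is a list of concepts already resolved through the dictionary, so synonyms and translations of the same concept count together.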

• Semantic dictionary

• Did you mean?

• IE engine

• (Bibclassify)
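As a rough idea of what the "Did you mean?" component listed above might do, the following sketch corrects each query word against the dictionary's vocabulary using Python's standard `difflib.get_close_matches`. The vocabulary, the cutoff value and the function name `did_you_mean` are assumptions for illustration only.

```python
import difflib

# Hypothetical vocabulary taken from the semantic dictionary.
VOCABULARY = ["higgs", "boson", "quark", "lepton", "hadron"]


def did_you_mean(query, vocabulary, cutoff=0.75):
    """Suggest a corrected query, or None if nothing would change."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest known term by difflib's similarity ratio, if close enough.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    suggestion = " ".join(corrected)
    return suggestion if suggestion != query.lower() else None


suggestion = did_you_mean("higs bosn", VOCABULARY)
```

Words with no sufficiently close match are left untouched, so the function never replaces an unknown term with an unrelated one.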

Thank you for your attention.

Questions?