Watson & WMR2017 (slides mostly derived from Jim Hendler and Simon Ellis, Rensselaer Polytechnic Institute, or from IBM itself) R. BASILI A.A. 2016-17

Page 1: Watson & WMR2017 - ai-nlp.info.uniroma2.itai-nlp.info.uniroma2.it/basili/didattica/WmIR_16_17/Watson_WMR2017.pdf · Watson & WMR2017 (slides mostly derived from Jim Hendler and Simon

Watson & WMR2017

(slides mostly derived from Jim Hendler and Simon Ellis, Rensselaer Polytechnic Institute, or from IBM itself)

R. BASILI

A.A. 2016-17

Overview

Motivations

Watson Jeopardy

NLU in Watson

Machine Learning for NLU in Watson

Information Retrieval & Watson

◦ Question Answering

◦ Learning to Rank

Conclusions

The needs

Unstructured data processing services for:

◦ Information extraction (entities, domain named entities, relations)

◦ Classification (over texts, sentences, questions)

◦ Semantic role labeling

◦ Sentiment Analysis

Robust language processing services

◦ Morphological analysis, POS tagging, parsing

Knowledge Engineering:

◦ Domain adaptation (dictionaries, named entity catalogues, lexicons)

◦ Ontology population

◦ Integration of Structured and unstructured data through hybrid linguistic and logical reasoning

NLU in Watson

IBM Watson

Questions as Clues

Watson and Semantic Web

IBM

Inside Watson

Watson pipeline as published by IBM; see IBM J Res & Dev 56 (3/4), May/July 2012, p. 15:2

Watson Simplified (S. Ellis, 2013)

Question Analysis

Question analysis

What is the question asking for?

Which terms in the question refer to the answer?

Given any natural language question, how can Watson accurately discover this information?

Example question: "Who is the president of Rensselaer Polytechnic Institute?"

Output of Question Analysis:

◦ Focus Terms: "Who", "president of Rensselaer Polytechnic Institute"

◦ Answer Types: Person, President
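
The example above can be mimicked with a few hand-written rules. This is only a toy sketch: the `WH_TYPES` table, the `analyze_question` function and its single regular-expression pattern are assumptions made for illustration, whereas Watson relies on full parsing and trained classifiers.

```python
import re

# Hypothetical mapping from wh-words to coarse answer types (illustrative only).
WH_TYPES = {"who": "Person", "where": "Location", "when": "Date"}


def analyze_question(question: str) -> dict:
    """Return focus terms and candidate answer types for a simple question."""
    text = question.strip().rstrip("?")
    tokens = text.split()
    wh_word = tokens[0]
    focus_terms = [wh_word]
    answer_types = []
    if wh_word.lower() in WH_TYPES:
        answer_types.append(WH_TYPES[wh_word.lower()])
    # Pattern "<wh> is/was the <noun phrase>": the noun phrase describes the answer.
    match = re.match(r"^\w+\s+(?:is|was)\s+the\s+(.+)$", text, flags=re.IGNORECASE)
    if match:
        phrase = match.group(1)
        focus_terms.append(phrase)
        # The first word of the phrase ("president") gives a more specific type.
        answer_types.append(phrase.split()[0].capitalize())
    return {"focus_terms": focus_terms, "answer_types": answer_types}


result = analyze_question("Who is the president of Rensselaer Polytechnic Institute?")
print(result)
```

A single pattern like this breaks on almost any rephrasing, which is why the slide asks how the same information can be found for any natural language question.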

Context & Language Variability

Parsing and semantic analysis

What information about a previously unseen piece of English text can Watson determine?

How is this information useful?

Natural Language Parsing

- tokenization

- grammatical structure

- parts of speech

- relationships between words

- ...etc.

Semantic Analysis

- meanings of words, phrases, etc.

- synonyms

- entailment

- hypernyms, hyponyms

- ...etc.

Dependency Parsing: the tabular view

"Chandelier looks great but nowadays does not usually use these items from which their name is derived"
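
The table from the slide did not survive extraction. As a rough illustration of what a tabular dependency view looks like, the sketch below prints one row per token with its head and relation. The three-word sentence and its parse are hand-annotated assumptions (not parser output, and not the clue above), and the `Token` type and helper names are hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str    # surface word
    pos: str     # part-of-speech tag
    head: int    # index of the governing token (0 means root)
    deprel: str  # dependency relation to the head


# Hand-annotated parse of a toy sentence (an assumption, not parser output).
PARSE = [
    Token(1, "Watson", "PROPN", 2, "nsubj"),
    Token(2, "answers", "VERB", 0, "root"),
    Token(3, "questions", "NOUN", 2, "obj"),
]


def to_table(tokens):
    """Render the parse as fixed-width rows, one token per row."""
    header = f"{'ID':<3}{'FORM':<11}{'POS':<7}{'HEAD':<5}DEPREL"
    rows = [f"{t.idx:<3}{t.form:<11}{t.pos:<7}{t.head:<5}{t.deprel}" for t in tokens]
    return "\n".join([header] + rows)


def dependents(tokens, head_idx):
    """Return the words directly governed by the token at head_idx."""
    return [t.form for t in tokens if t.head == head_idx]


print(to_table(PARSE))
```

Each row points to its head by index, so relationships between words can be read off with simple lookups such as `dependents(PARSE, 2)`.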

Question analysis pipeline

Unstructured Question Text → Parsing & Semantic Analysis → Machine Learning Classifiers → Structured Annotations of Question: focus, answer types, useful search queries

Search Result Processing and Candidate Generation

Primary Search

Primary Search retrieves the body of text from which candidate answers, passages, and supporting evidence are drawn: essentially all textual input to the later stages of the system.

It formulates queries based on the results of Question Analysis

These queries are passed into a (cached) search engine which returns a set number of highly relevant documents and their ranks.

◦ on the open Web this could be a regular search engine (our extension)
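
A minimal sketch of this step, under stated assumptions: the three-document `CORPUS`, the TF-IDF style scoring, and the `primary_search` function are all hypothetical stand-ins for a real search engine, and `functools.lru_cache` plays the role of the cache.

```python
import math
from collections import Counter
from functools import lru_cache

# Hypothetical mini-corpus: document title -> text.
CORPUS = {
    "Rensselaer Polytechnic Institute":
        "Rensselaer Polytechnic Institute is a private university in Troy New York",
    "IBM Watson": "Watson is a question answering system built by IBM",
    "Jeopardy!": "Jeopardy is an American television quiz show",
}


def tokenize(text):
    return text.lower().split()


DOC_TOKENS = {title: Counter(tokenize(text)) for title, text in CORPUS.items()}


def idf(term):
    """Smoothed inverse document frequency over the toy corpus."""
    df = sum(1 for counts in DOC_TOKENS.values() if term in counts)
    return math.log((1 + len(DOC_TOKENS)) / (1 + df)) + 1.0


@lru_cache(maxsize=1024)
def primary_search(query, top_k=2):
    """Return up to top_k (rank, title, score) tuples for the query."""
    terms = set(tokenize(query))
    scores = {}
    for title, counts in DOC_TOKENS.items():
        score = sum(counts[t] * idf(t) for t in terms)
        if score > 0:
            scores[title] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return tuple(
        (rank, title, round(score, 3))
        for rank, (title, score) in enumerate(ranked[:top_k], start=1)
    )


hits = primary_search("president Rensselaer Polytechnic Institute")
print(hits)
```

The query string would come from Question Analysis; repeating the same query is served from the cache rather than recomputed.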

Candidate Generation

Candidate Generation casts a wide net, extracting many possible answers to the question from each retrieved document.

Using each document, and the passages created by Search Result Processing, we generate candidates using three techniques:

◦ Title of Document (T.O.D.): Adds the title of the document as a candidate.

◦ Wikipedia Title Candidate Generation: Adds any noun phrases within the document’s passage texts that are also the titles of Wikipedia articles.

◦ Anchor Text Candidate Generation: Adds candidates based on the hyperlinks and metadata within the document.
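
The three techniques can be sketched as follows. Everything here is an illustrative assumption: the `Document` structure, the tiny `WIKIPEDIA_TITLES` set standing in for a real title index, and the use of word n-grams in place of proper noun-phrase detection.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    passages: list
    anchor_texts: list = field(default_factory=list)


# Hypothetical set standing in for an index of Wikipedia article titles.
WIKIPEDIA_TITLES = {"shirley ann jackson", "troy", "rensselaer polytechnic institute"}


def title_of_document(doc):
    """Title of Document: the document title itself is a candidate."""
    return [doc.title]


def wikipedia_title_candidates(doc, max_len=3):
    """Word n-grams in the passages that match a known article title."""
    found = []
    for passage in doc.passages:
        words = passage.replace(",", " ").replace(".", " ").split()
        for n in range(max_len, 0, -1):
            for i in range(len(words) - n + 1):
                span = " ".join(words[i:i + n])
                if span.lower() in WIKIPEDIA_TITLES and span not in found:
                    found.append(span)
    return found


def anchor_text_candidates(doc):
    """Anchor text: candidates taken from the document's hyperlinks."""
    return list(doc.anchor_texts)


def generate_candidates(doc):
    """Union of the three techniques, keeping first-seen order."""
    candidates = []
    for technique in (title_of_document, wikipedia_title_candidates,
                      anchor_text_candidates):
        for candidate in technique(doc):
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


doc = Document(
    title="Rensselaer Polytechnic Institute",
    passages=["Shirley Ann Jackson became president of "
              "Rensselaer Polytechnic Institute in 1999."],
    anchor_texts=["Troy, New York"],
)
print(generate_candidates(doc))
```

Nothing is filtered at this stage: wrong candidates are expected and are left for the scorers to push down.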

Search Result Processing and Candidate Generation

Scoring & Ranking

Scoring

Analyzes how well a candidate answer relates to the question

Two basic types of scoring algorithm

◦ Context-independent scoring

◦ Context-dependent scoring

Types of scorers

Context-independent

◦ Question Analysis

◦ Ontologies (DBpedia, YAGO, etc)

◦ Type hierarchy reasoning

Context-dependent

◦ Analyzes features of the natural language environment where candidates were found

◦ Relies on “passages” found during search

◦ Many special purpose ones used in Jeopardy
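
As an illustration of a context-independent scorer, the sketch below checks a candidate's type against the expected answer type by walking up a type hierarchy. The `PARENT` and `CANDIDATE_TYPE` tables are tiny hypothetical stand-ins for an ontology such as DBpedia or YAGO, and the scoring formula is an assumption chosen for simplicity.

```python
# Hypothetical type hierarchy: child type -> parent type.
PARENT = {
    "President": "Person",
    "Physicist": "Person",
    "Person": "Agent",
    "City": "Place",
}

# Hypothetical lookup: candidate answer -> its most specific known type.
CANDIDATE_TYPE = {
    "Shirley Ann Jackson": "President",
    "Troy": "City",
}


def ancestors(type_name):
    """Return the type followed by all of its ancestors, nearest first."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def type_score(candidate, answer_type):
    """1.0 for an exact type match, less for each step up the hierarchy, else 0.0."""
    candidate_type = CANDIDATE_TYPE.get(candidate)
    if candidate_type is None:
        return 0.0
    chain = ancestors(candidate_type)
    if answer_type not in chain:
        return 0.0
    return 1.0 / (1 + chain.index(answer_type))


print(type_score("Shirley Ann Jackson", "Person"))  # match one level up
print(type_score("Troy", "Person"))                 # incompatible type
```

The score depends only on the candidate and the question's answer type, never on the passage where the candidate was found, which is what makes it context-independent.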

Evidences

Evidences

Scorers

Passage Term Match

Textual Alignment

Skip-Bigram

◦ Each of these scores supportive evidence

◦ These scores are then merged to produce a single candidate score
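
Surface-level approximations of the three scorers, and one possible way to merge them, are sketched below. These are simplifications made for illustration: the functions work on plain tokens rather than on parsed structures, `difflib.SequenceMatcher` is swapped in as a stand-in for textual alignment, and the fixed `WEIGHTS` are hypothetical (a learned model would normally set them).

```python
from difflib import SequenceMatcher
from itertools import combinations


def tokens(text):
    """Lowercased words with surrounding punctuation stripped."""
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]


def passage_term_match(question, passage):
    """Fraction of distinct question terms that also occur in the passage."""
    q, p = set(tokens(question)), set(tokens(passage))
    return len(q & p) / len(q) if q else 0.0


def skip_bigrams(words, max_skip=2):
    """Ordered word pairs separated by at most max_skip intervening words."""
    return {
        (words[i], words[j])
        for i, j in combinations(range(len(words)), 2)
        if j - i <= max_skip + 1
    }


def skip_bigram_score(question, passage):
    """Fraction of the question's skip-bigrams that also occur in the passage."""
    q, p = skip_bigrams(tokens(question)), skip_bigrams(tokens(passage))
    return len(q & p) / len(q) if q else 0.0


def textual_alignment(question, passage):
    """Similarity of the two token sequences (stand-in for a real aligner)."""
    return SequenceMatcher(None, tokens(question), tokens(passage)).ratio()


# Hypothetical hand-set weights for merging the three scores.
WEIGHTS = {"term_match": 0.5, "skip_bigram": 0.25, "alignment": 0.25}


def merged_score(question, passage):
    """Weighted sum of the individual scorer outputs."""
    return (
        WEIGHTS["term_match"] * passage_term_match(question, passage)
        + WEIGHTS["skip_bigram"] * skip_bigram_score(question, passage)
        + WEIGHTS["alignment"] * textual_alignment(question, passage)
    )


question = "Who is the president of Rensselaer Polytechnic Institute?"
passage = "Shirley Ann Jackson became president of Rensselaer Polytechnic Institute."
print(round(merged_score(question, passage), 3))
```

Each scorer captures a different kind of evidence: shared vocabulary, shared word order at short range, and overall sequence similarity.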

Learning & Retrieval in Watson

Learning to Read

◦ POS tagging

◦ Tokenization

◦ Parsing

◦ Named entity extraction

Question Management

◦ Intent recognition

◦ Focus detection

◦ LAT recognition

Unstructured Information Management Architecture in Watson: CAS
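
In UIMA, the CAS (Common Analysis System) is the shared object that carries the text under analysis together with the annotations that each component adds to it. The sketch below is a heavily simplified, hypothetical imitation of that idea in plain Python. It is not the UIMA API: `SimpleCAS`, `Annotation`, and `whitespace_tokenizer` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str    # annotation type, e.g. "Token" or "Focus"
    begin: int   # start character offset in the text
    end: int     # end character offset (exclusive)
    features: dict = field(default_factory=dict)


@dataclass
class SimpleCAS:
    """Toy container: the document text plus stand-off annotations."""
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, type_, begin, end, **features):
        annotation = Annotation(type_, begin, end, features)
        self.annotations.append(annotation)
        return annotation

    def covered_text(self, annotation):
        return self.text[annotation.begin:annotation.end]

    def select(self, type_):
        return [a for a in self.annotations if a.type == type_]


def whitespace_tokenizer(cas):
    """A toy analysis component: adds one Token annotation per word."""
    position = 0
    for word in cas.text.split():
        begin = cas.text.index(word, position)
        end = begin + len(word)
        cas.annotate("Token", begin, end)
        position = end


cas = SimpleCAS("Who is the president?")
whitespace_tokenizer(cas)
cas.annotate("Focus", 0, 3, answer_type="Person")
print([cas.covered_text(a) for a in cas.select("Token")])
```

Because annotations only store offsets into the unchanged text, later components can add their own layers without interfering with earlier ones.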

Learning & Retrieval in Watson

Query Expansion

◦ Lexical Embeddings

◦ Query completion

◦ Weighting
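
A minimal sketch of embedding-based query expansion with weighting, under loud assumptions: the three-dimensional vectors in `EMBEDDINGS` are invented for the example (real lexical embeddings have hundreds of dimensions and are learned from corpora), and the similarity threshold is arbitrary.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for this example.
EMBEDDINGS = {
    "president": (0.9, 0.1, 0.0),
    "chancellor": (0.8, 0.2, 0.1),
    "leader": (0.85, 0.15, 0.05),
    "banana": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(terms, threshold=0.95):
    """Return term -> weight: original terms get 1.0, neighbours their similarity."""
    weighted = {term: 1.0 for term in terms}
    for term in terms:
        if term not in EMBEDDINGS:
            continue  # no vector available, so the term cannot be expanded
        for other, vector in EMBEDDINGS.items():
            if other in terms:
                continue
            similarity = cosine(EMBEDDINGS[term], vector)
            if similarity >= threshold:
                weighted[other] = max(weighted.get(other, 0.0), similarity)
    return weighted


expanded = expand_query(["president", "rensselaer"])
print(expanded)
```

Added terms carry a weight below 1.0 so that a match on an original query term still counts for more than a match on an expansion.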

Supporting Evidence Search

◦ Context-independent evidence extraction

Learning to Rank

◦ Retrieved Candidates

◦ Supporting Evidences

◦ Candidate Answers
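
One simple form of learning to rank is the pointwise approach: train a classifier to predict whether a candidate is correct from its scorer outputs, then sort candidates by predicted confidence. The sketch below uses logistic regression trained by stochastic gradient descent. The two features (a passage term match score and a type score) and all training values are hypothetical, chosen only to make the example run.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, learning_rate=0.5):
    """Fit logistic regression weights and bias with plain SGD."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def confidence(model, features):
    """Predicted probability that a candidate with these features is correct."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def rank(model, candidates):
    """Sort candidate names by decreasing confidence."""
    return sorted(candidates, key=lambda name: confidence(model, candidates[name]),
                  reverse=True)


# Hypothetical training data: ([term_match, type_score], 1 if correct else 0).
TRAINING = [
    ([0.9, 1.0], 1),
    ([0.7, 0.5], 1),
    ([0.2, 0.0], 0),
    ([0.4, 0.0], 0),
    ([0.1, 0.5], 0),
]

model = train(TRAINING)
candidates = {"candidate A": [0.8, 1.0], "candidate B": [0.3, 0.0]}
ranking = rank(model, candidates)
print(ranking)
```

The same recipe applies at each of the three levels listed above; only the items being ranked and the features describing them change.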

Watson APIs: the cognitive architecture

NLU & Discovery

Natural Language Understanding. Analyzes text to extract metadata from content, such as concepts, entities, keywords, categories, sentiment, emotion, relations, and semantic roles, using NLP (Natural Language Processing). With custom annotation models developed in Watson Knowledge Studio, it identifies industry-specific entities and relations in unstructured text.

Functions: extraction of concepts, entities, keywords, categories, sentiment, emotion, and relations from free text

Discovery. Adds a cognitive search and content analytics engine to applications to identify patterns, trends, and actionable insights that drive better decision-making. Securely unifies structured and unstructured data with pre-enriched content, and uses a simplified query language to eliminate the need for manual filtering of results.
