watson & wmr2017 -...

Watson & WMR2017

(slides mostly derived from Jim Hendler and Simon Ellis, Rensselaer Polytechnic Institute, or from IBM itself)

R. BASILI

A.A. 2016-17

Overview

Motivations

Watson Jeopardy

NLU in Watson

Machine Learning for NLU in Watson

Information Retrieval & Watson

◦ Question Answering

◦ Learning to Rank

Conclusions

The needs

Unstructured data processing services for

◦ Information extraction (Entities, Domain Named Entities, Relations)

◦ Classification (over texts, sentences, questions)

◦ Semantic role labeling

◦ Sentiment Analysis

Robust language processing services

◦ Morphological analysis, POS tagging, parsing

Knowledge Engineering:

◦ Domain adaptation (Dictionaries, Named Entity catalogues, Lexicons)

◦ Ontology population

◦ Integration of Structured and unstructured data through hybrid linguistic and logical reasoning

NLU in Watson

IBM Watson

Questions as Clues

Watson and Semantic Web

IBM

Inside Watson

Watson pipeline as published by IBM; see IBM J Res & Dev 56 (3/4), May/July 2012, p. 15:2

Watson Simplified (S. Ellis, 2013)

Question Analysis

Question analysis

What is the question asking for?

Which terms in the question refer to the answer?

Given any natural language question, how can Watson accurately discover this information?

Who is the president of Rensselaer Polytechnic Institute?

Focus Terms: “Who”, “president of Rensselaer Polytechnic Institute”

Answer Types: Person, President
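The example above can be reproduced with a very small rule-based sketch. This is only an illustration of what "focus" and "answer type" mean, not how Watson does it (Watson relies on full parsing and trained classifiers). The `WH_TYPES` table, the function name `analyze_question` and the regular expressions are all assumptions made for this example.

```python
import re

# Hypothetical mapping from wh-words to coarse answer types (illustration only).
WH_TYPES = {"who": "Person", "where": "Location", "when": "Date"}

def analyze_question(question):
    """Return focus terms and answer types for a simple 'Wh- is (the) X' question."""
    text = question.strip().rstrip("?")
    match = re.match(r"(?i)^(who|where|when)\s+(?:is|was|are|were)\s+(?:the\s+)?(.+)$", text)
    if match is None:
        # The toy pattern does not cover this question shape.
        return {"focus": [], "answer_types": []}
    wh_word, rest = match.group(1), match.group(2)
    answer_types = [WH_TYPES[wh_word.lower()]]
    # In "<role> of <entity>", the role word gives a more specific answer type.
    role = re.match(r"^(\w+)\s+of\s+", rest)
    if role:
        answer_types.append(role.group(1).capitalize())
    return {"focus": [wh_word, rest], "answer_types": answer_types}

result = analyze_question("Who is the president of Rensselaer Polytechnic Institute?")
print(result["focus"])
print(result["answer_types"])
```

A pattern like this breaks as soon as the wording changes, which is why the next slides turn to parsing, semantic analysis and machine-learned classifiers.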

Context & Language Variability

Parsing and semantic analysis

What information about a previously unseen piece of English text can Watson determine?

How is this information useful?

Natural Language Parsing

- tokenization

- grammatical structure

- parts of speech

- relationships between words

- ...etc.

Semantic Analysis

- meanings of words, phrases, etc.

- synonyms

- entailment

- hypernyms, hyponyms

- ...etc.

Dependency Parsing: the tabular view

"Chandelier looks great but nowadays does not usually use these items from which their name is derived"

Question analysis pipeline

Unstructured Question Text → Parsing & Semantic Analysis → Machine Learning Classifiers → Structured Annotations of Question: focus, answer types, useful search queries

Search Result Processing and Candidate Generation

Primary Search

Primary Search retrieves the body of text from which candidate answers, passages, supporting evidence, and essentially all other textual input to the system are drawn.

It formulates queries based on the results of Question Analysis.

These queries are passed into a (cached) search engine which returns a set number of highly relevant documents and their ranks.

◦ on the open Web this could be a regular search engine (our extension)
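As a rough sketch of the flow described above (question terms become a query, the query goes to a search engine, and ranked documents come back), the following uses a tiny in-memory TF-IDF ranker. It stands in for the real search engine and is not Watson's implementation. The stopword list, the three toy documents and the names `build_query` and `primary_search` are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical stopword list, only large enough for this example.
STOPWORDS = {"who", "is", "the", "of", "a", "an", "in", "what"}

def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def build_query(question):
    """Keep the content words of the question as query terms."""
    return [t for t in tokenize(question) if t not in STOPWORDS]

def primary_search(query_terms, documents, top_k=2):
    """Return up to top_k (rank, doc_id, score) tuples, best first."""
    n_docs = len(documents)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in query_terms:
            doc_freq = sum(1 for c in term_counts.values() if term in c)
            if doc_freq:
                # Smoothed inverse document frequency: rarer terms weigh more.
                idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
                score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]
    return [(rank, doc_id, score) for rank, (doc_id, score) in enumerate(ranked, start=1)]

# Toy corpus (illustrative text, not taken from the slides).
documents = {
    "RPI": "Rensselaer Polytechnic Institute is a private university in Troy. "
           "The president leads the institute.",
    "Troy": "Troy is a city in Rensselaer County, New York.",
    "Jeopardy": "Jeopardy is a quiz show whose clues are phrased as answers.",
}
query = build_query("Who is the president of Rensselaer Polytechnic Institute?")
results = primary_search(query, documents)
for rank, doc_id, score in results:
    print(rank, doc_id, round(score, 2))
```

The function returns both the documents and their ranks, since later stages can use the rank itself as a feature.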

Candidate Generation

Candidate Generation casts a wide net: it extracts many possible answers to the question from each document.

From each document and the passages created by Search Result Processing, candidates are generated with three techniques:

◦ Title of Document (T.O.D.): Adds the title of the document as a candidate.

◦ Wikipedia Title Candidate Generation: Adds any noun phrases within the document’s passage texts that are also the titles of Wikipedia articles.

◦ Anchor Text Candidate Generation: Adds candidates based on the hyperlinks and metadata within the document.
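The three techniques can be sketched on a single toy document as follows. This is a simplified illustration under stated assumptions: the document is a plain dictionary with hypothetical `title`, `passages` and `anchors` fields, the Wikipedia title list is a small hand-made set, and word n-grams stand in for real noun-phrase detection, which would need a parser or chunker.

```python
def generate_candidates(document, wikipedia_titles, max_words=3):
    """Collect (candidate, source) pairs using the three techniques, without duplicates."""
    candidates = []

    # 1. Title of Document: the title itself is a candidate.
    candidates.append((document["title"], "title_of_document"))

    # 2. Wikipedia titles: any word n-gram in a passage that is also a known title.
    #    (Assumption: n-grams approximate noun phrases in this sketch.)
    for passage in document["passages"]:
        words = passage.replace(",", " ").replace(".", " ").split()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                phrase = " ".join(words[i:i + n])
                if phrase in wikipedia_titles:
                    candidates.append((phrase, "wikipedia_title"))

    # 3. Anchor text: the text of hyperlinks found in the document.
    for anchor in document["anchors"]:
        candidates.append((anchor, "anchor_text"))

    # Keep the first source recorded for each distinct candidate string.
    seen, unique = set(), []
    for text, source in candidates:
        if text not in seen:
            seen.add(text)
            unique.append((text, source))
    return unique

# Toy inputs (hypothetical structure, for illustration only).
document = {
    "title": "Rensselaer Polytechnic Institute",
    "passages": ["Shirley Ann Jackson became president of Rensselaer Polytechnic Institute in 1999."],
    "anchors": ["Troy, New York"],
}
wikipedia_titles = {"Shirley Ann Jackson", "Rensselaer Polytechnic Institute", "Troy, New York"}

candidates = generate_candidates(document, wikipedia_titles)
for text, source in candidates:
    print(source, "->", text)
```

Most candidates produced this way are wrong answers. That is intended: this stage favours recall, and the scoring and ranking stages are responsible for precision.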

Search Result Processing and Candidate Generation

Scoring & Ranking

Scoring

Analyzes how well a candidate answer relates to the question

Two basic types of scoring algorithm

◦ Context-independent scoring

◦ Context-dependent scoring

Types of scorers

Context-independent

◦ Question Analysis

◦ Ontologies (DBpedia, YAGO, etc.)

◦ Type hierarchy reasoning

Context-dependent

◦ Analyzes features of the natural language environment where candidates were found

◦ Relies on “passages” found during search

◦ Many special-purpose scorers were used in Jeopardy
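As an example of a context-independent scorer, type hierarchy reasoning can be sketched as a walk up a type tree: a candidate scores higher the closer its type is to the expected answer type. The tiny `PARENT` hierarchy and the `1 / (1 + distance)` formula are assumptions for illustration. A real system would take the types from an ontology such as DBpedia or YAGO, and the slides do not specify a scoring formula.

```python
# Hypothetical type hierarchy: each type maps to its direct supertype.
PARENT = {
    "President": "Person",
    "Person": "Agent",
    "University": "Organization",
    "Organization": "Agent",
}

def ancestors(type_name):
    """Return the type followed by all of its supertypes, nearest first."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def type_score(candidate_type, answer_type):
    """Score 1.0 for an exact match, less for a more distant supertype, 0.0 if unrelated."""
    chain = ancestors(candidate_type)
    if answer_type not in chain:
        return 0.0
    return 1.0 / (1 + chain.index(answer_type))

print(type_score("Person", "Person"))
print(type_score("President", "Person"))
print(type_score("University", "Person"))
```

The score depends only on the candidate and the question, not on the passage where the candidate was found, which is what makes it context-independent.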

Evidence

Scorers

Passage Term Match

Textual Alignment

Skip-Bigram

◦ Each of these scorers rates the supporting evidence

◦ These scores are then merged to produce a single candidate score
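Two of the scorers named above can be sketched on surface tokens, followed by a simple merge. This is a simplification: the versions here compare plain word sequences, the merge is a weighted sum, and the weights are invented for the example. Textual Alignment is left out of the sketch. All function names are assumptions.

```python
def tokens(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def passage_term_match(question, passage):
    """Fraction of distinct question terms that also occur in the passage."""
    q, p = set(tokens(question)), set(tokens(passage))
    return len(q & p) / len(q) if q else 0.0

def skip_bigrams(words, max_skip=2):
    """Ordered word pairs with at most max_skip words between them."""
    pairs = set()
    for i, left in enumerate(words):
        for j in range(i + 1, min(i + 2 + max_skip, len(words))):
            pairs.add((left, words[j]))
    return pairs

def skip_bigram_score(question, passage, max_skip=2):
    """Fraction of the question's skip-bigrams that also occur in the passage."""
    q = skip_bigrams(tokens(question), max_skip)
    p = skip_bigrams(tokens(passage), max_skip)
    return len(q & p) / len(q) if q else 0.0

def merge_scores(scores, weights):
    """Weighted sum of the individual scorer outputs (weights are hypothetical)."""
    return sum(weights[name] * value for name, value in scores.items())

question = "president of Rensselaer Polytechnic Institute"
passages = [
    "Shirley Ann Jackson became president of Rensselaer Polytechnic Institute in 1999",
    "The institute in Troy has a president",
]
weights = {"passage_term_match": 0.5, "skip_bigram": 0.5}

merged = []
for passage in passages:
    scores = {
        "passage_term_match": passage_term_match(question, passage),
        "skip_bigram": skip_bigram_score(question, passage),
    }
    merged.append(merge_scores(scores, weights))
    print(scores, merged[-1])
```

The second passage shares two words with the question but none of its word-order patterns, so term matching gives it partial credit while the skip-bigram score is zero. Combining several weak signals like this is the point of merging.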

Learning & Retrieval in Watson

Learning to Read

◦ POS tagging

◦ Tokenization

◦ Parsing

◦ Named entity extraction

Question Management

◦ Intent recognition

◦ Focus detection

◦ LAT recognition

Unstructured Information Management Architecture in Watson: CAS

Learning & Retrieval in Watson

Query Expansion

◦ Lexical Embeddings

◦ Query completion

◦ Weighting

Supporting Evidence Search

◦ Context-independent evidence extraction

Learning to Rank

◦ Retrieved Candidates

◦ Supporting Evidence

◦ Candidate Answers
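To make "learning to rank" concrete, here is a minimal pairwise sketch: each candidate is a vector of scorer outputs, the training data says which of two candidates should rank higher, and a linear weight vector is learned so that the better one gets the larger score. The slides do not say which learner is used, so the perceptron-style update, the three-feature layout and all numbers below are assumptions chosen for illustration.

```python
def train_pairwise_ranker(pairs, n_features, epochs=10, learning_rate=0.1):
    """Learn linear weights so that, in each (better, worse) pair, better scores higher."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - w for b, w in zip(better, worse)]
            margin = sum(wt * d for wt, d in zip(weights, diff))
            if margin <= 0:
                # The pair is misordered (or tied): move the weights toward the difference.
                weights = [wt + learning_rate * d for wt, d in zip(weights, diff)]
    return weights

def rank(candidates, weights):
    """Sort (name, features) pairs by their learned linear score, best first."""
    def score(candidate):
        return sum(w * f for w, f in zip(weights, candidate[1]))
    return sorted(candidates, key=lambda c: -score(c))

# Hypothetical features per candidate: [term match, skip-bigram, type score].
training_pairs = [
    ([1.0, 1.0, 0.5], [0.4, 0.0, 0.0]),
    ([0.8, 0.5, 1.0], [0.9, 0.1, 0.0]),
]
weights = train_pairwise_ranker(training_pairs, n_features=3)

candidates = [
    ("candidate_a", [0.9, 0.1, 0.0]),
    ("candidate_b", [0.8, 0.5, 1.0]),
    ("candidate_c", [0.4, 0.0, 0.0]),
]
ranking = [name for name, _ in rank(candidates, weights)]
print(ranking)
```

The same scheme applies at each level listed above: retrieved candidates, supporting evidence and final candidate answers can each be re-ordered by a model trained on their own features.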

Watson APIs: the cognitive architecture

NLU & Discovery

Natural Language Understanding. Analyzes text to extract metadata from content, such as concepts, entities, keywords, categories, opinions, sentiment, relations, and semantic roles, using natural language processing (NLP). With custom annotation models developed in Watson Knowledge Studio, it identifies industry-specific entities and relations in unstructured text.

Functions: extraction of concepts, entities, keywords, categories, opinions, sentiment, and relations from free text

Discovery. Adds a cognitive search and content analytics engine to applications to identify patterns, trends, and actionable insights that drive better decision-making. It securely unifies structured and unstructured data with pre-enriched content, and uses a simplified query language to eliminate the need for manual filtering of results.
