Watson & WMR2017
(slides mostly derived from Jim Hendler and Simon Ellis, Rensselaer Polytechnic Institute, or from IBM itself)
R. BASILI
A.A. 2016-17
Overview
Motivations
Watson Jeopardy
NLU in Watson
Machine Learning for NLU in Watson
Information Retrieval & Watson
◦ Question Answering
◦ Learning to Rank
Conclusions
The needs
Unstructured data processing services for
◦ Information extraction (Entities, Domain Named Entities, Relations)
◦ Classification (over texts, sentences, questions)
◦ Semantic role labeling
◦ Sentiment Analysis
Robust language processing services
◦ Morphological analysis, POS tagging, parsing
Knowledge Engineering:
◦ Domain adaptation (Dictionaries, Named Entity catalogues, Lexicons)
◦ Ontology population
◦ Integration of Structured and unstructured data through hybrid linguistic and logical reasoning
NLU in Watson
IBM Watson
Questions as Clues
Watson and Semantic Web
IBM
Inside Watson
Watson pipeline as published by IBM; see IBM J Res & Dev 56 (3/4), May/July 2012, p. 15:2
Watson Simplified (S. Ellis, 2013)
Question Analysis
Question analysis
What is the question asking for?
Which terms in the question refer to the answer?
Given any natural language question, how can Watson accurately discover this information?
Who is the president of Rensselaer Polytechnic Institute?
Focus Terms: “Who”, “president of Rensselaer Polytechnic Institute”
Answer Types: Person, President
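The example above can be sketched with a few hand-written rules. This is only an illustration: the `WH_TYPES` table, the `analyze_question` function and the "role of entity" pattern are assumptions made here, and Watson itself relies on full parsing and trained classifiers rather than regular expressions.

```python
import re

# Hypothetical mapping from wh-words to coarse answer types (illustrative only).
WH_TYPES = {"who": "Person", "where": "Place", "when": "Date"}

def analyze_question(question):
    """Return focus terms and answer types for a simple wh-question."""
    text = question.strip().rstrip("?")
    tokens = text.split()
    wh = tokens[0].lower()
    focus = [tokens[0]]
    answer_types = []
    if wh in WH_TYPES:
        answer_types.append(WH_TYPES[wh])
    # Pattern "<wh> is the <role> of <entity>": the role noun gives a second
    # answer type, and "<role> of <entity>" gives a second focus term.
    match = re.match(r"(?i)\w+\s+(?:is|was)\s+the\s+(\w+)\s+of\s+(.+)", text)
    if match:
        role, entity = match.group(1), match.group(2)
        focus.append(f"{role} of {entity}")
        answer_types.append(role.capitalize())
    return {"focus": focus, "answer_types": answer_types}

result = analyze_question("Who is the president of Rensselaer Polytechnic Institute?")
print(result["focus"])
print(result["answer_types"])
```

Questions that do not fit the single pattern fall back to the wh-word alone, which is one reason a rule table like this does not scale to open-domain clues.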
Context & Language Variability
Parsing and semantic analysis
What information about a previously unseen piece of English text can Watson determine?
How is this information useful?
Natural Language Parsing
- tokenization
- grammatical structure
- parts of speech
- relationships between words
- ...etc.
Semantic Analysis
- meanings of words, phrases, etc.
- synonyms
- entailment
- hypernyms, hyponyms
- ...etc.
Dependency Parsing: the tabular view
"Chandelier looks great but nowadays does not usually use these items from which their name is derived"
Question analysis pipeline
Unstructured Question Text
Parsing & Semantic Analysis
Machine Learning Classifiers
Structured Annotations of Question: focus, answer types, useful search queries
Search Result Processing and Candidate Generation
Primary Search
Primary Search generates the corpus of information from which candidate answers, passages and supporting evidence are drawn; it supplies essentially all textual input to the system.
It formulates queries based on the results of Question Analysis.
These queries are passed into a (cached) search engine which returns a set number of highly relevant documents and their ranks.
◦ on the open Web this could be a regular search engine (our extension)
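A minimal sketch of the search step described above: query terms from Question Analysis go to a cached search function that returns a fixed number of ranked documents. The three-document corpus, the TF-IDF weighting and all function names are assumptions chosen for illustration; the slides do not say which engine or ranking formula Watson uses.

```python
import math
from collections import Counter
from functools import lru_cache

# Tiny stand-in corpus; titles and texts are invented for illustration.
CORPUS = {
    "Rensselaer Polytechnic Institute":
        "rensselaer polytechnic institute is a university in troy led by its president",
    "Troy, New York": "troy is a city in new york on the hudson river",
    "Jeopardy!": "jeopardy is a television quiz show where clues are phrased as answers",
}

def tokenize(text):
    return text.lower().replace("?", " ").replace(",", " ").split()

DOC_TOKENS = {title: Counter(tokenize(body)) for title, body in CORPUS.items()}

def idf(term):
    """Smoothed inverse document frequency."""
    df = sum(1 for counts in DOC_TOKENS.values() if term in counts)
    return math.log((1 + len(DOC_TOKENS)) / (1 + df)) + 1.0

@lru_cache(maxsize=128)  # the "(cached)" part: repeated queries skip the scoring loop
def search(query_terms, top_k=2):
    """Return up to top_k (rank, title, score) triples; query_terms must be a tuple."""
    scored = []
    for title, counts in DOC_TOKENS.items():
        score = sum(counts[t] * idf(t) for t in query_terms)
        if score > 0:
            scored.append((score, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(rank, title, round(score, 3))
            for rank, (score, title) in enumerate(scored[:top_k], start=1)]

query = tuple(tokenize("president Rensselaer Polytechnic Institute"))
results = search(query)
print(results)
```

On the open Web, `search` would be replaced by a call to a regular search engine, with the same query terms as input.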
Candidate Generation
Candidate Generation generates a wide net of possible answers for the question from each document.
Using each document, and the passages created by Search Result Processing, we generate candidates using three techniques:
◦ Title of Document (T.O.D.): Adds the title of the document as a candidate.
◦ Wikipedia Title Candidate Generation: Adds any noun phrases within the document’s passage texts that are also the titles of Wikipedia articles.
◦ Anchor Text Candidate Generation: Adds candidates based on the hyperlinks and metadata within the document.
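The three techniques can be sketched as follows. The document record, its field names and the small set of known titles are hypothetical, and matching word n-grams against the title set stands in for the noun-phrase detection a real parser would provide.

```python
# Hypothetical document record; the field names are assumptions for illustration.
DOCUMENT = {
    "title": "Rensselaer Polytechnic Institute",
    "passages": ["Shirley Ann Jackson became president of Rensselaer in 1999."],
    "anchors": ["Shirley Ann Jackson", "Troy, New York"],
}

# Stand-in for the full list of Wikipedia article titles.
KNOWN_TITLES = {"Shirley Ann Jackson", "Troy, New York", "Rensselaer Polytechnic Institute"}

def ngrams(tokens, max_n=4):
    """All contiguous word sequences of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def generate_candidates(doc, known_titles):
    """Map each candidate string to the set of techniques that produced it."""
    candidates = {}

    def add(text, source):
        candidates.setdefault(text, set()).add(source)

    # 1. Title of Document
    add(doc["title"], "title_of_document")
    # 2. Wikipedia Title: passage spans that are also article titles
    for passage in doc["passages"]:
        tokens = passage.rstrip(".").split()
        for gram in ngrams(tokens):
            if gram in known_titles:
                add(gram, "wikipedia_title")
    # 3. Anchor Text: hyperlink texts attached to the document
    for anchor in doc["anchors"]:
        add(anchor, "anchor_text")
    return candidates

candidates = generate_candidates(DOCUMENT, KNOWN_TITLES)
for text, sources in candidates.items():
    print(text, sorted(sources))
```

Keeping the set of sources per candidate reflects the "wide net" idea: nothing is filtered at this stage, and later scorers decide which candidates survive.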
Search Result Processing and Candidate Generation
Scoring & Ranking
Scoring
Analyzes how well a candidate answer relates to the question
Two basic types of scoring algorithm
◦ Context-independent scoring
◦ Context-dependent scoring
Types of scorers
Context-independent
◦ Question Analysis
◦ Ontologies (DBpedia, YAGO, etc)
◦ Type hierarchy reasoning
Context-dependent
◦ Analyzes features of the natural language environment where candidates were found
◦ Relies on “passages” found during search
◦ Many special purpose ones used in Jeopardy
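A context-independent scorer based on type hierarchy reasoning might look like the sketch below. The tiny hierarchy, the candidate typing and the halving rule are all assumptions made for illustration; in practice the types would come from ontologies such as DBpedia or YAGO.

```python
# Toy type hierarchy (child -> parent), a stand-in for a real ontology.
PARENT = {"President": "Leader", "Leader": "Person", "Scientist": "Person", "City": "Place"}

# Hypothetical typing of candidate answers.
CANDIDATE_TYPES = {
    "Shirley Ann Jackson": {"President", "Scientist"},
    "Troy, New York": {"City"},
}

def ancestors(type_name):
    """The type itself followed by its chain of parents."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def type_score(candidate, answer_type):
    """1.0 for an exact type match, halved per hierarchy step, 0.0 if unrelated."""
    best = 0.0
    for candidate_type in CANDIDATE_TYPES.get(candidate, ()):
        chain = ancestors(candidate_type)
        if answer_type in chain:
            best = max(best, 0.5 ** chain.index(answer_type))
    return best

print(type_score("Shirley Ann Jackson", "President"))
print(type_score("Shirley Ann Jackson", "Person"))
print(type_score("Troy, New York", "Person"))
```

The score depends only on the candidate and the answer type, never on the passage where the candidate was found, which is what makes it context-independent.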
Evidence
Scorers
Passage Term Match
Textual Alignment
Skip-Bigram
◦ Each of these scores supportive evidence
◦ These scores are then merged to produce a single candidate score
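Two of these scorers and the merging step can be approximated on surface tokens, as sketched below. This is a simplification under stated assumptions: the tokenizer, the skip window of two words and the weighted average used for merging are choices made here, and Textual Alignment is left out.

```python
from itertools import combinations

def tokens(text):
    """Lowercased words with surrounding punctuation stripped."""
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]

def passage_term_match(question, passage):
    """Fraction of distinct question terms that also occur in the passage."""
    q, p = set(tokens(question)), set(tokens(passage))
    return len(q & p) / len(q) if q else 0.0

def skip_bigrams(words, max_skip=2):
    """Ordered word pairs separated by at most max_skip intervening words."""
    return {(words[i], words[j])
            for i, j in combinations(range(len(words)), 2)
            if j - i - 1 <= max_skip}

def skip_bigram_score(question, passage, max_skip=2):
    """Fraction of the question's skip-bigrams that also occur in the passage."""
    q = skip_bigrams(tokens(question), max_skip)
    p = skip_bigrams(tokens(passage), max_skip)
    return len(q & p) / len(q) if q else 0.0

def merge(scores, weights):
    """Weighted average of the individual scorer outputs."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total

question = "president of Rensselaer"
passage = "Rensselaer has a president"
scores = {
    "passage_term_match": passage_term_match(question, passage),
    "skip_bigram": skip_bigram_score(question, passage),
}
merged = merge(scores, {"passage_term_match": 1.0, "skip_bigram": 1.0})
print(scores, merged)
```

The passage shares two of the three question terms but in the opposite order, so Passage Term Match gives partial credit while the order-sensitive skip-bigram score is zero; merging combines both views into one number.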
Learning & Retrieval in Watson
Learning to Read
◦ POS tagging
◦ Tokenization
◦ Parsing
◦ Named entity extraction
Question Management
◦ Intent recognition
◦ Focus detection
◦ LAT recognition
Unstructured Information Management Architecture in Watson: CAS
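In UIMA, the CAS (Common Analysis Structure) holds the text under analysis together with typed annotations that refer to it by character offsets, and each annotator reads and extends that shared structure. The Python sketch below mirrors the idea only; the class and method names are hypothetical and are not the UIMA API.

```python
import re
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Annotation:
    type: str
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span

@dataclass
class CAS:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, type_, begin, end):
        annotation = Annotation(type_, begin, end)
        self.annotations.append(annotation)
        return annotation

    def select(self, type_):
        return [a for a in self.annotations if a.type == type_]

    def covered_text(self, annotation):
        return self.text[annotation.begin:annotation.end]

def tokenizer(cas):
    """First annotator: adds one Token annotation per word."""
    for match in re.finditer(r"\w+", cas.text):
        cas.annotate("Token", match.start(), match.end())

def focus_annotator(cas):
    """Second annotator: reuses Token annotations to mark a wh-word as Focus."""
    for token in cas.select("Token"):
        if cas.covered_text(token).lower() in {"who", "what", "where", "when"}:
            cas.annotate("Focus", token.begin, token.end)
            break

cas = CAS("Who is the president of Rensselaer Polytechnic Institute?")
for annotator in (tokenizer, focus_annotator):
    annotator(cas)
print(len(cas.select("Token")), cas.covered_text(cas.select("Focus")[0]))
```

Because annotations are stand-off (offsets into unchanged text), annotators can be chained in any order that respects their dependencies without rewriting each other's output.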
Learning & Retrieval in Watson
Query Expansion
◦ Lexical Embeddings
◦ Query completion
◦ Weighting
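Query expansion with lexical embeddings and weighting can be sketched as below. The three-dimensional vectors are invented for illustration (real embeddings are learned from corpora), and the similarity threshold and the use of cosine similarity as the expansion weight are assumptions.

```python
import math

# Toy vectors, invented for illustration only.
EMBEDDINGS = {
    "president": (0.9, 0.1, 0.0),
    "chancellor": (0.8, 0.2, 0.1),
    "leader": (0.7, 0.3, 0.0),
    "river": (0.0, 0.1, 0.9),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand_query(terms, threshold=0.9):
    """Return {term: weight}; original terms weigh 1.0, neighbours their similarity."""
    weighted = {term: 1.0 for term in terms}
    for term in terms:
        if term not in EMBEDDINGS:
            continue  # no vector available: keep the term, add no neighbours
        for other, vector in EMBEDDINGS.items():
            if other in terms:
                continue
            similarity = cosine(EMBEDDINGS[term], vector)
            if similarity >= threshold:
                weighted[other] = max(weighted.get(other, 0.0), similarity)
    return weighted

expanded = expand_query(["president", "rensselaer"])
for term, weight in expanded.items():
    print(term, round(weight, 3))
```

Added terms carry a weight below 1.0, so a document matching only "chancellor" ranks below one matching the original term "president".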
Supporting Evidence Search
◦ Context-independent evidence extraction
Learning to Rank
◦ Retrieved Candidates
◦ Supporting Evidence
◦ Candidate Answers
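Learning to rank can be illustrated with a pairwise perceptron over scorer outputs: the model learns weights so that correct candidates outscore incorrect ones for the same question. The algorithm, the two features (a type score and a passage score) and the training data are all assumptions made for this sketch and are not taken from Watson.

```python
def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))

def train_pairwise(groups, epochs=20, learning_rate=0.1):
    """groups: one list per question of (features, is_correct) candidates."""
    weights = [0.0] * len(groups[0][0][0])
    for _ in range(epochs):
        for candidates in groups:
            positives = [f for f, correct in candidates if correct]
            negatives = [f for f, correct in candidates if not correct]
            for pos in positives:
                for neg in negatives:
                    diff = [p - n for p, n in zip(pos, neg)]
                    # Update only when the correct answer fails to outscore the wrong one.
                    if dot(weights, diff) <= 0:
                        weights = [w + learning_rate * d for w, d in zip(weights, diff)]
    return weights

def rank(weights, candidates):
    """candidates: list of (name, features); best-scoring first."""
    return sorted(candidates, key=lambda c: -dot(weights, c[1]))

# Invented training data: features are (type_score, passage_score).
TRAINING = [
    [((1.0, 0.8), True), ((0.0, 0.6), False), ((0.5, 0.1), False)],
    [((1.0, 0.3), True), ((0.0, 0.2), False)],
]

weights = train_pairwise(TRAINING)
ranked = rank(weights, [("Troy, New York", (0.0, 0.7)),
                        ("Shirley Ann Jackson", (1.0, 0.5))])
print(weights, [name for name, _ in ranked])
```

The same scheme applies at each of the three levels listed above: the items being ranked change (retrieved documents, evidence passages, candidate answers), while the pairwise training signal stays the same.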
Watson APIs: the cognitive architecture
NLU & Discovery
Natural Language Understanding. Analyzes text to extract metadata from content, such as concepts, entities, keywords, categories, sentiment, emotions, relations and semantic roles, using natural language processing (NLP). With custom annotation models developed in Watson Knowledge Studio, it identifies industry-specific entities and relations in unstructured text.
Features: extraction of concepts, entities, keywords, categories, sentiment, emotions and relations from free text
Discovery. Adds a cognitive search and content analytics engine to applications to identify patterns, trends and actionable insights that drive better decision-making. It securely unifies structured and unstructured data with pre-enriched content, and uses a simplified query language to remove the need for manual filtering of results.